* [PATCH v5 0/19] Support early break/return auto-vectorization
@ 2023-06-28 13:40 Tamar Christina
  2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
From: Tamar Christina @ 2023-06-28 13:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 10897 bytes --]

Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using inclusive OR.  This is however only checked when the
comparison would produce multiple statements.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes.  N must also be known.  This limitation exists because our primary
  target for this optimization is SVE.  For VLA SVE we can't easily do
  cross-page iteration checks, and the result would likely not be beneficial
  anyway.  For that reason we punt on support for variable buffers until we
  have First-Faulting support in GCC.
- Any stores in <statements1> must not be to the same objects as in
  <condition>.  Loads are fine as long as they have no possibility of
  aliasing.  More concretely, we block RAW dependencies when the intermediate
  value can't be separated from the store, or the store itself can't be moved.
- The number of loop iterations must be known.  This is a temporary
  limitation that I intend to address in GCC 14 itself in follow-on patches.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported.
- Any number of loop early exits are supported.
- The early exit must be before the natural loop exit/latch.  The vectorizer
  is designed to propagate phi-nodes downwards, so supporting this inverted
  control flow is hard.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Epilogue vectorization would also not be profitable.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this code path is still hit quite often; during
bootstrap it triggered rather frequently.  Additionally TSVC s332, s481 and
s482 all pass now, since these are tests for early exit vectorization support.

This implementation does not handle the early break completely inside the
vector loop itself.  Instead it adds checks such that if we know we have to
exit in the current iteration, we branch to scalar code which performs the
final (at most VF) iterations and handles all the code in <action>.

niters analysis and the majority of the vectorizer that hardcoded single_exit
have been updated to use a new function, vec_loop_iv, which returns the exit
the vectorizer wants to use as the main IV exit.

For niters, this exit is what determines the overall iteration count, as that
is the O(iters) for the loop.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.

This new version of the patch does the majority of the work in a newly
rewritten loop peeling.  This new function maintains LCSSA all the way through
and no longer requires the touch-up functions the vectorizer used to
incrementally adjust them later on.  This means that aside from IV updates and
guard edge updates the early exit code is identical to the single exit cases.

When the loop is peeled during the copying I have to go to great lengths to
keep the dominators up to date.  All exits from the first loop are rewired to
the loop header of the second loop.  But this can change the immediate dominator.

The dominators can change again when we wire in the loop guard, as such peeling
now returns a list of dominators that need to be updated if a new guard edge is
added.

For the loop peeling we rewrite the loop from:


                     Header
                      ---
                      |x|
                       2
                       |
                       v
                -------3<------
     early exit |      |      |
                v      v      | latch
                7      4----->6
                |      |
                |      v
                |      8
                |      |
                |      v
                ------>5

into

                     Header
                      ---
                      |x|
                       2
                       |
                       v
                -------3<------
     early exit |      |      |
                v      v      | latch
                7      4----->6
                |      |
                |      v
                |      8
                |      |
                |      v
                |  New Header
                |     ---
                ----->|x|
                       9
                       |
                       v
                ------10<-----
     early exit |      |      |
                v      v      | latch
                14     11---->13
                |      |
                |      v
                |      12
                |      |
                |      v
                ------> 5

That is to say, the first vector loop executes as long as the early exit isn't
needed.  Once the exit is taken, the scalar code performs at most VF extra
iterations.  The exact number depends on peeling, the iteration start and
which exit was taken (natural or early).  For this scalar loop, all early
exits are treated the same.

When we vectorize we move any statement not related to the early break itself
and that would be incorrect to execute before the break (i.e. has side effects)
to after the break.  If this is not possible we decline to vectorize.

This means that we check at the start of each iteration whether we are going
to exit or not.  During the analysis phase we check whether we are allowed to
do this moving of statements.  Note that we only move the scalar statements,
and only do so after peeling, just before we start transforming statements.

Codegen:

for e.g.

#define N 803
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;

 }
 return ret;
}

We generate for Adv. SIMD:

test4:
        adrp    x2, .LC0
        adrp    x3, .LANCHOR0
        dup     v2.4s, w0
        add     x3, x3, :lo12:.LANCHOR0
        movi    v4.4s, 0x4
        add     x4, x3, 3216
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x1, 0
        mov     w2, 0
        .p2align 3,,7
.L3:
        ldr     q0, [x3, x1]
        add     v3.4s, v1.4s, v2.4s
        add     v1.4s, v1.4s, v4.4s
        cmhi    v0.4s, v0.4s, v2.4s
        umaxp   v0.4s, v0.4s, v0.4s
        fmov    x5, d0
        cbnz    x5, .L6
        add     w2, w2, 1
        str     q3, [x1, x4]
        str     q2, [x3, x1]
        add     x1, x1, 16
        cmp     w2, 200
        bne     .L3
        mov     w7, 3
.L2:
        lsl     w2, w2, 2
        add     x5, x3, 3216
        add     w6, w2, w0
        sxtw    x4, w2
        ldr     w1, [x3, x4, lsl 2]
        str     w6, [x5, x4, lsl 2]
        cmp     w0, w1
        bcc     .L4
        add     w1, w2, 1
        str     w0, [x3, x4, lsl 2]
        add     w6, w1, w0
        sxtw    x1, w1
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        add     w4, w2, 2
        str     w0, [x3, x1, lsl 2]
        sxtw    x1, w4
        add     w6, w1, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
        add     w2, w2, 3
        cmp     w7, 3
        beq     .L4
        sxtw    x1, w2
        add     w2, w2, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w2, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
.L4:
        mov     w0, 0
        ret
        .p2align 2,,3
.L6:
        mov     w7, 4
        b       .L2

and for SVE:

test4:
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        add     x5, x2, 3216
        mov     x3, 0
        mov     w1, 0
        cntw    x4
        mov     z1.s, w0
        index   z0.s, #0, #1
        ptrue   p1.b, all
        ptrue   p0.s, all
        .p2align 3,,7
.L3:
        ld1w    z2.s, p1/z, [x2, x3, lsl 2]
        add     z3.s, z0.s, z1.s
        cmplo   p2.s, p0/z, z1.s, z2.s
        b.any   .L2
        st1w    z3.s, p1, [x5, x3, lsl 2]
        add     w1, w1, 1
        st1w    z1.s, p1, [x2, x3, lsl 2]
        add     x3, x3, x4
        incw    z0.s
        cmp     w3, 803
        bls     .L3
.L5:
        mov     w0, 0
        ret
        .p2align 2,,3
.L2:
        cntw    x5
        mul     w1, w1, w5
        cbz     w5, .L5
        sxtw    x1, w1
        sub     w5, w5, #1
        add     x5, x5, x1
        add     x6, x2, 3216
        b       .L6
        .p2align 2,,3
.L14:
        str     w0, [x2, x1, lsl 2]
        cmp     x1, x5
        beq     .L5
        mov     x1, x4
.L6:
        ldr     w3, [x2, x1, lsl 2]
        add     w4, w0, w1
        str     w4, [x6, x1, lsl 2]
        add     x4, x1, 1
        cmp     w0, w3
        bcs     .L14
        mov     w0, 0
        ret

On the workloads this work is based on we see a 2-3x performance uplift
using this patch.

Follow up plan:
 - Boolean vectorization has several shortcomings.  I've filed PR110223 with the
   bigger ones that cause vectorization to fail with this patch.
 - SLP support.  This is planned for GCC 15, as for the majority of cases
   building SLP itself fails.  This means I'll need to spend time making it
   more robust first.  Additionally it requires:
     * Adding support for vectorizing CFG (gconds)
     * Support for CFG to differ between vector and scalar loops.
   Both of which would be disruptive to the tree and I suspect I'll be handling
   fallouts from this patch for a while.  So I plan to work on the surrounding
   building blocks first for the remainder of the year.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Also ran across various workloads and no issues.

When closer to acceptance I will run on other targets as well and clean up
related testsuite fallouts there.

--- inline copy of patch -- 

-- 

[-- Attachment #2: rb17494.patch --]
[-- Type: text/plain, Size: 0 bytes --]




* [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
@ 2023-06-28 13:41 ` Tamar Christina
  2023-07-04 11:29   ` Richard Biener
  2023-06-28 13:41 ` [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector Tamar Christina
From: Tamar Christina @ 2023-06-28 13:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 12284 bytes --]

Hi,

With the patch enabling the vectorization of early breaks, we'd like to allow
bitfield lowering in such loops, which requires relaxing the restriction to
allow multiple exits when doing so.  In order to avoid an issue similar to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.

This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't, as it will never come across them.  I made sure to add a comment to
make clear that there is a direct connection between the two, and that if we
were to enable vectorization of any other gimple statement we should make sure
both handle it.

NOTE: This patch was accepted before but never committed because it is a no-op
without the early break patch.  This is a respun version of Andre's patch,
rebased onto changes in ifcvt and updated to handle multiple exits.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues.

gcc/ChangeLog:

	* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
	(get_loop_body_in_if_conv_order): ... to here.
	(if_convertible_loop_p): Remove single_exit check.
	(tree_if_conversion): Move single_exit check to if-conversion part and
	support multiple exits.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
	* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
	* gcc.dg/vect/vect-bitfield-read-8.c: New test.
	* gcc.dg/vect/vect-bitfield-read-9.c: New test.

Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
new file mode 100644
index 0000000000000000000000000000000000000000..0d91067ebb27b1db2b2352975c43bce8b4171e3f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -0,0 +1,60 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 56
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      {
+	switch (ptr[i].a)
+	  {
+	  case 0:
+	    res += ptr[i].a + 1;
+	    break;
+	  case 1:
+	  case 2:
+	  case 3:
+	    res += ptr[i].a;
+	    break;
+	  default:
+	    return 0;
+	  }
+      }
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ac7b3fc0dfd1c9d0b5e94a2ba6a745545577ec1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      {
+	asm volatile ("" ::: "memory");
+	res += ptr[i].a;
+      }
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..52cfd33d937ae90f3fe9556716c90e098b768ac8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s { int i : 31; };
+
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define ELT4 {4}
+#define N 32
+#define RES 25
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT4, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      {
+	if (ptr[i].i == 4)
+	  return res;
+	res += ptr[i].i;
+      }
+
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..ab814698131a5905def181eeed85d8a3c62b924b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+    unsigned i : 31;
+    char a : 4;
+};
+
+#define N 32
+#define ELT0 {0x7FFFFFFFUL, 0}
+#define ELT1 {0x7FFFFFFFUL, 1}
+#define ELT2 {0x7FFFFFFFUL, 2}
+#define ELT3 {0x7FFFFFFFUL, 3}
+#define ELT4 {0x7FFFFFFFUL, 4}
+#define RES 9
+struct s A[N]
+  = { ELT0, ELT4, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+    int res = 0;
+    for (int i = 0; i < n; ++i)
+      {
+	if (ptr[i].a)
+	  return 9;
+	res += ptr[i].a;
+      }
+    return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */
+
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index e342532a343a3c066142adeec5fdfaf736a653e5..cdb0fe4c29dfa531e3277925022d127b13ffcc16 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -586,7 +586,7 @@ add_to_dst_predicate_list (class loop *loop, edge e,
 /* Return true if one of the successor edges of BB exits LOOP.  */
 
 static bool
-bb_with_exit_edge_p (class loop *loop, basic_block bb)
+bb_with_exit_edge_p (const class loop *loop, basic_block bb)
 {
   edge e;
   edge_iterator ei;
@@ -1268,6 +1268,44 @@ get_loop_body_in_if_conv_order (const class loop *loop)
     }
   free (blocks_in_bfs_order);
   BITMAP_FREE (visited);
+
+  /* Go through loop and reject if-conversion or lowering of bitfields if we
+     encounter statements we do not believe the vectorizer will be able to
+     handle.  If adding a new type of statement here, make sure
+     'ifcvt_local_dce' is also able to handle it propertly.  */
+  for (index = 0; index < loop->num_nodes; index++)
+    {
+      basic_block bb = blocks[index];
+      gimple_stmt_iterator gsi;
+
+      bool may_have_nonlocal_labels
+	= bb_with_exit_edge_p (loop, bb) || bb == loop->latch;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	switch (gimple_code (gsi_stmt (gsi)))
+	  {
+	  case GIMPLE_LABEL:
+	    if (!may_have_nonlocal_labels)
+	      {
+		tree label
+		  = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi)));
+		if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
+		  {
+		    free (blocks);
+		    return NULL;
+		  }
+	      }
+	    /* Fallthru.  */
+	  case GIMPLE_ASSIGN:
+	  case GIMPLE_CALL:
+	  case GIMPLE_DEBUG:
+	  case GIMPLE_COND:
+	    gimple_set_uid (gsi_stmt (gsi), 0);
+	    break;
+	  default:
+	    free (blocks);
+	    return NULL;
+	  }
+    }
   return blocks;
 }
 
@@ -1438,36 +1476,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
 	exit_bb = bb;
     }
 
-  for (i = 0; i < loop->num_nodes; i++)
-    {
-      basic_block bb = ifc_bbs[i];
-      gimple_stmt_iterator gsi;
-
-      bool may_have_nonlocal_labels
-	= bb_with_exit_edge_p (loop, bb) || bb == loop->latch;
-      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	switch (gimple_code (gsi_stmt (gsi)))
-	  {
-	  case GIMPLE_LABEL:
-	    if (!may_have_nonlocal_labels)
-	      {
-		tree label
-		  = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi)));
-		if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
-		  return false;
-	      }
-	    /* Fallthru.  */
-	  case GIMPLE_ASSIGN:
-	  case GIMPLE_CALL:
-	  case GIMPLE_DEBUG:
-	  case GIMPLE_COND:
-	    gimple_set_uid (gsi_stmt (gsi), 0);
-	    break;
-	  default:
-	    return false;
-	  }
-    }
-
   data_reference_p dr;
 
   innermost_DR_map
@@ -1579,14 +1587,6 @@ if_convertible_loop_p (class loop *loop, vec<data_reference_p> *refs)
       return false;
     }
 
-  /* More than one loop exit is too much to handle.  */
-  if (!single_exit (loop))
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "multiple exits\n");
-      return false;
-    }
-
   /* If one of the loop header's edge is an exit edge then do not
      apply if-conversion.  */
   FOR_EACH_EDGE (e, ei, loop->header->succs)
@@ -3566,9 +3566,6 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 	aggressive_if_conv = true;
     }
 
-  if (!single_exit (loop))
-    goto cleanup;
-
   /* If there are more than two BBs in the loop then there is at least one if
      to convert.  */
   if (loop->num_nodes > 2
@@ -3588,15 +3585,25 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
 
   if (loop->num_nodes > 2)
     {
-      need_to_ifcvt = true;
+      /* More than one loop exit is too much to handle.  */
+      if (!single_exit (loop))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Can not ifcvt due to multiple exits\n");
+	}
+      else
+	{
+	  need_to_ifcvt = true;
 
-      if (!if_convertible_loop_p (loop, &refs) || !dbg_cnt (if_conversion_tree))
-	goto cleanup;
+	  if (!if_convertible_loop_p (loop, &refs)
+	      || !dbg_cnt (if_conversion_tree))
+	    goto cleanup;
 
-      if ((need_to_predicate || any_complicated_phi)
-	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	      || loop->dont_vectorize))
-	goto cleanup;
+	  if ((need_to_predicate || any_complicated_phi)
+	      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+		  || loop->dont_vectorize))
+	    goto cleanup;
+	}
     }
 
   if ((flag_tree_loop_vectorize || loop->force_vectorize)
@@ -3687,7 +3694,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      PHIs, those are to be kept in sync with the non-if-converted copy.
      ???  We'll still keep dead stores though.  */
   exit_bbs = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+  for (edge exit : get_loop_exit_edges (loop))
+    bitmap_set_bit (exit_bbs, exit->dest->index);
   bitmap_set_bit (exit_bbs, loop->latch->index);
 
   std::pair <tree, tree> *name_pair;




-- 

 
-      if (!if_convertible_loop_p (loop, &refs) || !dbg_cnt (if_conversion_tree))
-	goto cleanup;
+	  if (!if_convertible_loop_p (loop, &refs)
+	      || !dbg_cnt (if_conversion_tree))
+	    goto cleanup;
 
-      if ((need_to_predicate || any_complicated_phi)
-	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
-	      || loop->dont_vectorize))
-	goto cleanup;
+	  if ((need_to_predicate || any_complicated_phi)
+	      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
+		  || loop->dont_vectorize))
+	    goto cleanup;
+	}
     }
 
   if ((flag_tree_loop_vectorize || loop->force_vectorize)
@@ -3687,7 +3694,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
      PHIs, those are to be kept in sync with the non-if-converted copy.
      ???  We'll still keep dead stores though.  */
   exit_bbs = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+  for (edge exit : get_loop_exit_edges (loop))
+    bitmap_set_bit (exit_bbs, exit->dest->index);
   bitmap_set_bit (exit_bbs, loop->latch->index);
 
   std::pair <tree, tree> *name_pair;





* [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
  2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
@ 2023-06-28 13:41 ` Tamar Christina
  2023-06-29 22:17   ` Jason Merrill
  2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, joseph, rguenther, nathan, jason

[-- Attachment #1: Type: text/plain, Size: 36404 bytes --]

Hi All,

Fortran currently has a NOVECTOR pragma for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C and C++
as gfortran does for Fortran and as ICC/ICX do for C and C++.

I added only some basic tests here, but the next patch in the series uses this
in roughly 800 testsuite tests.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

	* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
	* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
	c_parser_for_statement, c_parser_statement_after_labels,
	c_parse_pragma_novector, c_parser_pragma): Wire through novector and
	default to false.

gcc/cp/ChangeLog:

	* cp-tree.def (RANGE_FOR_STMT): Update comment.
	* cp-tree.h (RANGE_FOR_NOVECTOR): New.
	(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Add novector param.
	* init.cc (build_vec_init): Default novector to false.
	* method.cc (build_comparison_op): Likewise.
	* parser.cc (cp_parser_statement): Likewise.
	(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
	cp_convert_range_for, cp_parser_iteration_statement,
	cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
	(cp_parser_pragma_novector): New.
	* pt.cc (tsubst_expr): Likewise.
	* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Likewise.

gcc/ChangeLog:

	* doc/extend.texi: Document it.
	* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
	* tree.h (TREE_LANG_FLAG_7): New.

gcc/testsuite/ChangeLog:

	* g++.dg/vect/vect-novector-pragma.cc: New test.
	* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
 				  false, false);
 
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+				  false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..9d35fe68704c8aca197bcd4805a146c655959621 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
 					  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+				      bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+				    bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
 	  c_parser_switch_statement (parser, if_p);
 	  break;
 	case RID_WHILE:
-	  c_parser_while_statement (parser, false, 0, if_p);
+	  c_parser_while_statement (parser, false, 0, false, if_p);
 	  break;
 	case RID_DO:
-	  c_parser_do_statement (parser, false, 0);
+	  c_parser_do_statement (parser, false, 0, false);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser, false, 0, if_p);
+	  c_parser_for_statement (parser, false, 0, false, if_p);
 	  break;
 	case RID_GOTO:
 	  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
-			  bool *if_p)
+			  bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 		   build_int_cst (integer_type_node,
 				  annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_no_vector_kind),
+		   integer_zero_node);
   save_in_statement = in_statement;
   in_statement = IN_ITERATION_STMT;
 
@@ -7199,7 +7206,8 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+		       bool novector)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7228,6 +7236,11 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 		   build_int_cst (integer_type_node,
 				  annot_expr_unroll_kind),
  		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_no_vector_kind),
+		   integer_zero_node);
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
 
@@ -7296,7 +7309,7 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 
 static void
 c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
-			bool *if_p)
+			bool novector, bool *if_p)
 {
   tree block, cond, incr, body;
   unsigned char save_in_statement;
@@ -7430,6 +7443,12 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 					  "with %<GCC unroll%> pragma");
 		  cond = error_mark_node;
 		}
+	      else if (novector)
+		{
+		  c_parser_error (parser, "missing loop condition in loop "
+					  "with %<GCC novector%> pragma");
+		  cond = error_mark_node;
+		}
 	      else
 		{
 		  c_parser_consume_token (parser);
@@ -7452,6 +7471,11 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
  			   build_int_cst (integer_type_node,
 					  annot_expr_unroll_kind),
 			   build_int_cst (integer_type_node, unroll));
+	  if (novector && cond != error_mark_node)
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+			   build_int_cst (integer_type_node,
+					  annot_expr_no_vector_kind),
+			   integer_zero_node);
 	}
       /* Parse the increment expression (the third expression in a
 	 for-statement).  In the case of a foreach-statement, this is
@@ -13037,6 +13061,16 @@ c_parse_pragma_ivdep (c_parser *parser)
   return true;
 }
 
+/* Parse a pragma GCC novector.  */
+
+static bool
+c_parse_pragma_novector (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
 /* Parse a pragma GCC unroll.  */
 
 static unsigned short
@@ -13264,11 +13298,12 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
     case PRAGMA_IVDEP:
       {
 	const bool ivdep = c_parse_pragma_ivdep (parser);
-	unsigned short unroll;
+	unsigned short unroll = 0;
+	bool novector = false;
 	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
 	  unroll = c_parser_pragma_unroll (parser);
-	else
-	  unroll = 0;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR)
+	  novector = c_parse_pragma_novector (parser);
 	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
 	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
 	    && !c_parser_next_token_is_keyword (parser, RID_DO))
@@ -13277,22 +13312,48 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 	    return false;
 	  }
 	if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
 	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  c_parser_do_statement (parser, ivdep, unroll);
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
       }
       return true;
 
     case PRAGMA_UNROLL:
       {
 	unsigned short unroll = c_parser_pragma_unroll (parser);
-	bool ivdep;
+	bool ivdep = false;
+	bool novector = false;
 	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
 	  ivdep = c_parse_pragma_ivdep (parser);
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR)
+	  novector = c_parse_pragma_novector (parser);
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  ivdep = false;
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
+      }
+      return true;
+
+    case PRAGMA_NOVECTOR:
+      {
+	bool novector = c_parse_pragma_novector (parser);
+	unsigned short unroll = 0;
+	bool ivdep = false;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+	  ivdep = c_parse_pragma_ivdep (parser);
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+	  unroll = c_parser_pragma_unroll (parser);
 	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
 	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
 	    && !c_parser_next_token_is_keyword (parser, RID_DO))
@@ -13301,11 +13362,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 	    return false;
 	  }
 	if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
 	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  c_parser_do_statement (parser, ivdep, unroll);
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
       }
       return true;
 
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4)
 
 /* Used to represent a range-based `for' statement. The operands are
    RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE,
-   RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively.  Only used in
-   templates.  */
+   RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT,
+   respectively.  Only used in templates.  */
 DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6)
 
 /* Used to represent an expression statement.  Use `EXPR_STMT_EXPR' to
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8398223311194837441107cb335d497ff5f5ec1c..50b0f20817a168b5e9ac58db59ad44233f079e11 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE)	TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)	TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_7 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)	TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body			(tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
-				  unsigned short);
+				  unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec<tree, va_gc> *, tree &,
 				      tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause			(tree);
 extern void finish_else_clause			(tree);
 extern void finish_if_stmt			(tree);
 extern tree begin_while_stmt			(void);
-extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short,
+					 bool);
 extern void finish_while_stmt			(tree);
 extern tree begin_do_stmt			(void);
 extern void finish_do_body			(tree);
-extern void finish_do_stmt		(tree, tree, bool, unsigned short);
+extern void finish_do_stmt		(tree, tree, bool, unsigned short,
+					 bool);
 extern tree finish_return_stmt			(tree);
 extern tree begin_for_scope			(tree *);
 extern tree begin_for_stmt			(tree, tree);
 extern void finish_init_stmt			(tree);
-extern void finish_for_cond		(tree, tree, bool, unsigned short);
+extern void finish_for_cond		(tree, tree, bool, unsigned short,
+					 bool);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
 extern tree begin_range_for_stmt		(tree, tree);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index af6e30f511e142c7a594e742d128b2bf0aa8fb8d..5b735b27e6f5bc6b439ae64665902f4f1ca76f95 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4846,7 +4846,7 @@ build_vec_init (tree base, tree maxindex, tree init,
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
 			       build_int_cst (TREE_TYPE (iterator), -1)),
-		       for_stmt, false, 0);
+		       for_stmt, false, 0, false);
       /* We used to pass this decrement to finish_for_expr; now we add it to
 	 elt_init below so it's part of the same full-expression as the
 	 initialization, and thus happens before any potentially throwing
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 91cf943f11089c0e6bcbe8377daa4e016f956d56..fce49c796199c2c65cd70684e2942fea1b6b2ebd 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1645,7 +1645,8 @@ build_comparison_op (tree fndecl, bool defining, tsubst_flags_t complain)
 		      add_stmt (idx);
 		      finish_init_stmt (for_stmt);
 		      finish_for_cond (build2 (LE_EXPR, boolean_type_node, idx,
-					       maxval), for_stmt, false, 0);
+					       maxval), for_stmt, false, 0,
+					       false);
 		      finish_for_expr (cp_build_unary_op (PREINCREMENT_EXPR,
 							  TARGET_EXPR_SLOT (idx),
 							  false, complain),
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd3665c8ccf48a8a0b1ba2c06400fe50999ea240..0bc110121d51ee13258b7ff0e4ad7851b4eae78e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2324,15 +2324,15 @@ static tree cp_parser_selection_statement
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool, unsigned short);
+  (cp_parser *, bool *, bool, unsigned short, bool);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool, unsigned short);
+  (cp_parser *, bool, unsigned short, bool);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool, unsigned short);
+  (cp_parser *, tree, tree, bool, unsigned short, bool);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool, unsigned short, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short, bool, bool);
 static void do_range_for_auto_deduction
   (tree, tree, tree, unsigned int);
 static tree cp_parser_perform_range_for_lookup
@@ -12414,7 +12414,8 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	case RID_DO:
 	case RID_FOR:
 	  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
-	  statement = cp_parser_iteration_statement (parser, if_p, false, 0);
+	  statement = cp_parser_iteration_statement (parser, if_p, false, 0,
+						     false);
 	  break;
 
 	case RID_BREAK:
@@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
    not included. */
 
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
+	       bool novector)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -13624,14 +13626,14 @@ cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 
   if (is_range_for)
     return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll,
-				false);
+				novector, false);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll, novector);
 }
 
 static tree
 cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
-		 unsigned short unroll)
+		 unsigned short unroll, bool novector)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -13658,7 +13660,13 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
 		       "%<GCC unroll%> pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep, unroll);
+  else if (novector)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+		       "%<GCC novector%> pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll, novector);
   /* Look for the `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
@@ -13682,7 +13690,8 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
 
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-		     bool ivdep, unsigned short unroll, bool is_omp)
+		     bool ivdep, unsigned short unroll, bool novector,
+		     bool is_omp)
 {
   tree stmt, range_expr;
   auto_vec <cxx_binding *, 16> bindings;
@@ -13758,6 +13767,8 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
 	RANGE_FOR_IVDEP (stmt) = 1;
       if (unroll)
 	RANGE_FOR_UNROLL (stmt) = build_int_cst (integer_type_node, unroll);
+      if (novector)
+	RANGE_FOR_NOVECTOR (stmt) = 1;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
 	  /* do_auto_deduction doesn't mess with template init-lists.  */
@@ -13770,7 +13781,7 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
 				   decomp_first_name, decomp_cnt, ivdep,
-				   unroll);
+				   unroll, novector);
     }
   return stmt;
 }
@@ -13948,7 +13959,7 @@ warn_for_range_copy (tree decl, tree expr)
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 		      tree decomp_first_name, unsigned int decomp_cnt,
-		      bool ivdep, unsigned short unroll)
+		      bool ivdep, unsigned short unroll, bool novector)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -14008,7 +14019,7 @@ cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 				 begin, ERROR_MARK,
 				 end, ERROR_MARK,
 				 NULL_TREE, NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep, unroll);
+  finish_for_cond (condition, statement, ivdep, unroll, novector);
 
   /* The new increment expression.  */
   expression = finish_unary_op_expr (input_location,
@@ -14175,7 +14186,7 @@ cp_parser_range_for_member_function (tree range, tree identifier)
 
 static tree
 cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
-			       unsigned short unroll)
+			       unsigned short unroll, bool novector)
 {
   cp_token *token;
   enum rid keyword;
@@ -14209,7 +14220,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	parens.require_open (parser);
 	/* Parse the condition.  */
 	condition = cp_parser_condition (parser);
-	finish_while_stmt_cond (condition, statement, ivdep, unroll);
+	finish_while_stmt_cond (condition, statement, ivdep, unroll, novector);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Parse the dependent statement.  */
@@ -14244,7 +14255,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	/* Parse the expression.  */
 	expression = cp_parser_expression (parser);
 	/* We're done with the do-statement.  */
-	finish_do_stmt (expression, statement, ivdep, unroll);
+	finish_do_stmt (expression, statement, ivdep, unroll, novector);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Look for the `;'.  */
@@ -14258,7 +14269,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	matching_parens parens;
 	parens.require_open (parser);
 
-	statement = cp_parser_for (parser, ivdep, unroll);
+	statement = cp_parser_for (parser, ivdep, unroll, novector);
 
 	/* Look for the `)'.  */
 	parens.require_close (parser);
@@ -43815,7 +43826,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 	      cp_parser_require (parser, CPP_COLON, RT_COLON);
 
 	      init = cp_parser_range_for (parser, NULL_TREE, NULL_TREE, decl,
-					  false, 0, true);
+					  false, 0, true, false);
 
 	      cp_convert_omp_range_for (this_pre_body, for_block, decl,
 					orig_decl, init, orig_init,
@@ -49300,6 +49311,15 @@ cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
   return unroll;
 }
 
+/* Parse a pragma GCC novector.  */
+
+static bool
+cp_parser_pragma_novector (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    break;
 	  }
 	const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
-	unsigned short unroll;
+	unsigned short unroll = 0;
+	bool novector = false;
 	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-	if (tok->type == CPP_PRAGMA
-	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+
+	while (tok->type == CPP_PRAGMA)
 	  {
-	    tok = cp_lexer_consume_token (parser->lexer);
-	    unroll = cp_parser_pragma_unroll (parser, tok);
-	    tok = cp_lexer_peek_token (the_parser->lexer);
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_UNROLL:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    unroll = cp_parser_pragma_unroll (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_NOVECTOR:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    novector = cp_parser_pragma_novector (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
 	  }
-	else
-	  unroll = 0;
+
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR
 		&& tok->keyword != RID_WHILE
@@ -49632,7 +49668,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
 	return true;
       }
 
@@ -49646,17 +49682,82 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	  }
 	const unsigned short unroll
 	  = cp_parser_pragma_unroll (parser, pragma_tok);
-	bool ivdep;
+	bool ivdep = false;
+	bool novector = false;
 	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-	if (tok->type == CPP_PRAGMA
-	    && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+
+	while (tok->type == CPP_PRAGMA)
 	  {
-	    tok = cp_lexer_consume_token (parser->lexer);
-	    ivdep = cp_parser_pragma_ivdep (parser, tok);
-	    tok = cp_lexer_peek_token (the_parser->lexer);
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_IVDEP:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    ivdep = cp_parser_pragma_ivdep (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_NOVECTOR:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    novector = cp_parser_pragma_novector (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
 	  }
-	else
-	  ivdep = false;
+
+	if (tok->type != CPP_KEYWORD
+	    || (tok->keyword != RID_FOR
+		&& tok->keyword != RID_WHILE
+		&& tok->keyword != RID_DO))
+	  {
+	    cp_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
+	return true;
+      }
+
+    case PRAGMA_NOVECTOR:
+      {
+	if (context == pragma_external)
+	  {
+	    error_at (pragma_tok->location,
+		      "%<#pragma GCC novector%> must be inside a function");
+	    break;
+	  }
+	const bool novector
+	  = cp_parser_pragma_novector (parser, pragma_tok);
+	bool ivdep = false;
+	unsigned short unroll = 0;
+	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+
+	while (tok->type == CPP_PRAGMA)
+	  {
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_IVDEP:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    ivdep = cp_parser_pragma_ivdep (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_UNROLL:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    unroll = cp_parser_pragma_unroll (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
+	  }
+
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR
 		&& tok->keyword != RID_WHILE
@@ -49665,7 +49766,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
 	return true;
       }
 
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2345a18becc1160b9d12f3d88cccb66c8917373c..7b0d01a90e3c4012ec603ebe04cbbb31a7dd1570 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19036,7 +19036,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false, 0);
+      finish_for_cond (tmp, stmt, false, 0, false);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       {
@@ -19073,6 +19073,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	  {
 	    RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
 	    RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+	    RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t);
 	    finish_range_for_decl (stmt, decl, expr);
 	    if (decomp_first && decl != error_mark_node)
 	      cp_finish_decomp (decl, decomp_first, decomp_cnt);
@@ -19083,7 +19084,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 				     ? tree_to_uhwi (RANGE_FOR_UNROLL (t)) : 0);
 	    stmt = cp_convert_range_for (stmt, decl, expr,
 					 decomp_first, decomp_cnt,
-					 RANGE_FOR_IVDEP (t), unroll);
+					 RANGE_FOR_IVDEP (t), unroll,
+					 RANGE_FOR_NOVECTOR (t));
 	  }
 
 	bool prev = note_iteration_stmt_body_start ();
@@ -19096,7 +19098,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false, 0);
+      finish_while_stmt_cond (tmp, stmt, false, 0, false);
       {
 	bool prev = note_iteration_stmt_body_start ();
 	RECUR (WHILE_BODY (t));
@@ -19114,7 +19116,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
       }
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
-      finish_do_stmt (tmp, stmt, false, 0);
+      finish_do_stmt (tmp, stmt, false, 0, false);
       break;
 
     case IF_STMT:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179eb2af2e82bf31d188023e9b9d41de9..b79975109c22ebcfcb060b4f20f32f69f3c3c444 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1148,7 +1148,7 @@ begin_while_stmt (void)
 
 void
 finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
-			unsigned short unroll)
+			unsigned short unroll, bool novector)
 {
   cond = maybe_convert_cond (cond);
   finish_cond (&WHILE_COND (while_stmt), cond);
@@ -1168,6 +1168,13 @@ finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
 						     annot_expr_unroll_kind),
 				      build_int_cst (integer_type_node,
 						     unroll));
+  if (novector && cond != error_mark_node)
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
+				      TREE_TYPE (WHILE_COND (while_stmt)),
+				      WHILE_COND (while_stmt),
+				      build_int_cst (integer_type_node,
+						     annot_expr_no_vector_kind),
+				      integer_zero_node);
   simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt));
 }
 
@@ -1212,7 +1219,8 @@ finish_do_body (tree do_stmt)
    COND is as indicated.  */
 
 void
-finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
+finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll,
+		bool novector)
 {
   cond = maybe_convert_cond (cond);
   end_maybe_infinite_loop (cond);
@@ -1229,6 +1237,10 @@ finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
     cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node, annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_no_vector_kind),
+		   integer_zero_node);
   DO_COND (do_stmt) = cond;
 }
 
@@ -1325,7 +1337,7 @@ finish_init_stmt (tree for_stmt)
    FOR_STMT.  */
 
 void
-finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
+finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll, bool novector)
 {
   cond = maybe_convert_cond (cond);
   finish_cond (&FOR_COND (for_stmt), cond);
@@ -1345,6 +1357,13 @@ finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
 						 annot_expr_unroll_kind),
 				  build_int_cst (integer_type_node,
 						 unroll));
+  if (novector && cond != error_mark_node)
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
+				  TREE_TYPE (FOR_COND (for_stmt)),
+				  FOR_COND (for_stmt),
+				  build_int_cst (integer_type_node,
+						 annot_expr_no_vector_kind),
+				  integer_zero_node);
   simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt));
 }
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3040a9bdea65d27f8d20572b4ed37375f5fe949b..baac6643d1abbf33d592e68aca49ac83e3c29188 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -24349,6 +24349,25 @@ void ignore_vec_dep (int *a, int k, int c, int m)
 @}
 @end smallexample
 
+@cindex pragma GCC novector
+@item #pragma GCC novector
+
+With this pragma, the programmer asserts that the following loop should not
+be vectorized, i.e.@: it should not be transformed to use SIMD (single
+instruction multiple data) instructions.
+
+For example, the pragma prevents the compiler from vectorizing the loop below:
+
+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i, j;
+#pragma GCC novector
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample
+
 @cindex pragma GCC unroll @var{n}
 @item #pragma GCC unroll @var{n}
 
diff --git a/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc
new file mode 100644
index 0000000000000000000000000000000000000000..4667935b641a06e3004904dc86c4513a78736f04
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <vector>
+
+void f4 (std::vector<int> a, std::vector<int> b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    while (i < (n & -8))
+      {
+        a[i] += b[i];
+        i++;
+      }
+}
+
+
+void f5 (std::vector<int> a, std::vector<int> b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    for (auto x : b)
+      {
+        a[i] += x;
+        i++;
+      }
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4b3957711db8f78d26a32634e9bbfdc11a33302
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void f1 (int * restrict a, int * restrict b, int n)
+{
+#pragma GCC novector
+    for (int i = 0; i < (n & -8); i++)
+      a[i] += b[i];
+}
+
+void f2 (int * restrict a, int * restrict b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    do
+      {
+        a[i] += b[i];
+        i++;
+      }
+    while (i < (n & -8));
+}
+
+void f3 (int * restrict a, int * restrict b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    while (i < (n & -8))
+      {
+        a[i] += b[i];
+        i++;
+      }
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index c48a12b378f0b3086747bee43b38e2da3f90b24d..9268a0668390192caac9efaade0a53d9359cf9a7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1063,6 +1063,7 @@ struct GTY(()) tree_base {
       unsigned lang_flag_4 : 1;
       unsigned lang_flag_5 : 1;
       unsigned lang_flag_6 : 1;
+      unsigned lang_flag_7 : 1;
       unsigned saturating_flag : 1;
 
       unsigned unsigned_flag : 1;
@@ -1071,7 +1072,7 @@ struct GTY(()) tree_base {
       unsigned nameless_flag : 1;
       unsigned atomic_flag : 1;
       unsigned unavailable_flag : 1;
-      unsigned spare0 : 2;
+      unsigned spare0 : 1;
 
       unsigned spare1 : 8;
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4d25b0cb55ee70402d5721f8b629ba..e96e8884bf68de77d19c95a87ae1c147460c23df 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1112,6 +1112,8 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_5)
 #define TREE_LANG_FLAG_6(NODE) \
   (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_6)
+#define TREE_LANG_FLAG_7(NODE) \
+  (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_7)
 
 /* Define additional fields and accessors for nodes representing constants.  */
 




-- 

[-- Attachment #2: rb17496.patch --]
[-- Type: text/plain, Size: 34365 bytes --]

diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
 				  false, false);
 
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+				  false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..9d35fe68704c8aca197bcd4805a146c655959621 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
 					  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+				      bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+				    bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
 	  c_parser_switch_statement (parser, if_p);
 	  break;
 	case RID_WHILE:
-	  c_parser_while_statement (parser, false, 0, if_p);
+	  c_parser_while_statement (parser, false, 0, false, if_p);
 	  break;
 	case RID_DO:
-	  c_parser_do_statement (parser, false, 0);
+	  c_parser_do_statement (parser, false, 0, false);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser, false, 0, if_p);
+	  c_parser_for_statement (parser, false, 0, false, if_p);
 	  break;
 	case RID_GOTO:
 	  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
-			  bool *if_p)
+			  bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 		   build_int_cst (integer_type_node,
 				  annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_no_vector_kind),
+		   integer_zero_node);
   save_in_statement = in_statement;
   in_statement = IN_ITERATION_STMT;
 
@@ -7199,7 +7206,8 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+		       bool novector)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7228,6 +7236,11 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 		   build_int_cst (integer_type_node,
 				  annot_expr_unroll_kind),
  		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node,
+				  annot_expr_no_vector_kind),
+		   integer_zero_node);
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
 
@@ -7296,7 +7309,7 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 
 static void
 c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
-			bool *if_p)
+			bool novector, bool *if_p)
 {
   tree block, cond, incr, body;
   unsigned char save_in_statement;
@@ -7430,6 +7443,12 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
 					  "with %<GCC unroll%> pragma");
 		  cond = error_mark_node;
 		}
+	      else if (novector)
+		{
+		  c_parser_error (parser, "missing loop condition in loop "
+					  "with %<GCC novector%> pragma");
+		  cond = error_mark_node;
+		}
 	      else
 		{
 		  c_parser_consume_token (parser);
@@ -7452,6 +7471,11 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
  			   build_int_cst (integer_type_node,
 					  annot_expr_unroll_kind),
 			   build_int_cst (integer_type_node, unroll));
+	  if (novector && cond != error_mark_node)
+	    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+			   build_int_cst (integer_type_node,
+					  annot_expr_no_vector_kind),
+			   integer_zero_node);
 	}
       /* Parse the increment expression (the third expression in a
 	 for-statement).  In the case of a foreach-statement, this is
@@ -13037,6 +13061,16 @@ c_parse_pragma_ivdep (c_parser *parser)
   return true;
 }
 
+/* Parse a pragma GCC novector.  */
+
+static bool
+c_parse_pragma_novector (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
 /* Parse a pragma GCC unroll.  */
 
 static unsigned short
@@ -13264,11 +13298,12 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
     case PRAGMA_IVDEP:
       {
 	const bool ivdep = c_parse_pragma_ivdep (parser);
-	unsigned short unroll;
+	unsigned short unroll = 0;
+	bool novector = false;
 	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
 	  unroll = c_parser_pragma_unroll (parser);
-	else
-	  unroll = 0;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR)
+	  novector = c_parse_pragma_novector (parser);
 	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
 	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
 	    && !c_parser_next_token_is_keyword (parser, RID_DO))
@@ -13277,22 +13312,48 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 	    return false;
 	  }
 	if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
 	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  c_parser_do_statement (parser, ivdep, unroll);
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
       }
       return true;
 
     case PRAGMA_UNROLL:
       {
 	unsigned short unroll = c_parser_pragma_unroll (parser);
-	bool ivdep;
+	bool ivdep = false;
+	bool novector = false;
 	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
 	  ivdep = c_parse_pragma_ivdep (parser);
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR)
+	  novector = c_parse_pragma_novector (parser);
+	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+	    && !c_parser_next_token_is_keyword (parser, RID_DO))
+	  {
+	    c_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	if (c_parser_next_token_is_keyword (parser, RID_FOR))
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
+	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  ivdep = false;
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
+      }
+      return true;
+
+    case PRAGMA_NOVECTOR:
+      {
+	bool novector = c_parse_pragma_novector (parser);
+	unsigned short unroll = 0;
+	bool ivdep = false;
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+	  ivdep = c_parse_pragma_ivdep (parser);
+	if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+	  unroll = c_parser_pragma_unroll (parser);
 	if (!c_parser_next_token_is_keyword (parser, RID_FOR)
 	    && !c_parser_next_token_is_keyword (parser, RID_WHILE)
 	    && !c_parser_next_token_is_keyword (parser, RID_DO))
@@ -13301,11 +13362,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 	    return false;
 	  }
 	if (c_parser_next_token_is_keyword (parser, RID_FOR))
-	  c_parser_for_statement (parser, ivdep, unroll, if_p);
+	  c_parser_for_statement (parser, ivdep, unroll, novector, if_p);
 	else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-	  c_parser_while_statement (parser, ivdep, unroll, if_p);
+	  c_parser_while_statement (parser, ivdep, unroll, novector, if_p);
 	else
-	  c_parser_do_statement (parser, ivdep, unroll);
+	  c_parser_do_statement (parser, ivdep, unroll, novector);
       }
       return true;
 
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4)
 
 /* Used to represent a range-based `for' statement. The operands are
    RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE,
-   RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively.  Only used in
-   templates.  */
+   RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT,
+   respectively.  Only used in templates.  */
 DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6)
 
 /* Used to represent an expression statement.  Use `EXPR_STMT_EXPR' to
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8398223311194837441107cb335d497ff5f5ec1c..50b0f20817a168b5e9ac58db59ad44233f079e11 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE)	TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)	TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_7 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)	TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body			(tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
-				  unsigned short);
+				  unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec<tree, va_gc> *, tree &,
 				      tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause			(tree);
 extern void finish_else_clause			(tree);
 extern void finish_if_stmt			(tree);
 extern tree begin_while_stmt			(void);
-extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond	(tree, tree, bool, unsigned short,
+					 bool);
 extern void finish_while_stmt			(tree);
 extern tree begin_do_stmt			(void);
 extern void finish_do_body			(tree);
-extern void finish_do_stmt		(tree, tree, bool, unsigned short);
+extern void finish_do_stmt		(tree, tree, bool, unsigned short,
+					 bool);
 extern tree finish_return_stmt			(tree);
 extern tree begin_for_scope			(tree *);
 extern tree begin_for_stmt			(tree, tree);
 extern void finish_init_stmt			(tree);
-extern void finish_for_cond		(tree, tree, bool, unsigned short);
+extern void finish_for_cond		(tree, tree, bool, unsigned short,
+					 bool);
 extern void finish_for_expr			(tree, tree);
 extern void finish_for_stmt			(tree);
 extern tree begin_range_for_stmt		(tree, tree);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index af6e30f511e142c7a594e742d128b2bf0aa8fb8d..5b735b27e6f5bc6b439ae64665902f4f1ca76f95 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4846,7 +4846,7 @@ build_vec_init (tree base, tree maxindex, tree init,
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
 			       build_int_cst (TREE_TYPE (iterator), -1)),
-		       for_stmt, false, 0);
+		       for_stmt, false, 0, false);
       /* We used to pass this decrement to finish_for_expr; now we add it to
 	 elt_init below so it's part of the same full-expression as the
 	 initialization, and thus happens before any potentially throwing
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 91cf943f11089c0e6bcbe8377daa4e016f956d56..fce49c796199c2c65cd70684e2942fea1b6b2ebd 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -1645,7 +1645,8 @@ build_comparison_op (tree fndecl, bool defining, tsubst_flags_t complain)
 		      add_stmt (idx);
 		      finish_init_stmt (for_stmt);
 		      finish_for_cond (build2 (LE_EXPR, boolean_type_node, idx,
-					       maxval), for_stmt, false, 0);
+					       maxval), for_stmt, false, 0,
+					       false);
 		      finish_for_expr (cp_build_unary_op (PREINCREMENT_EXPR,
 							  TARGET_EXPR_SLOT (idx),
 							  false, complain),
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd3665c8ccf48a8a0b1ba2c06400fe50999ea240..0bc110121d51ee13258b7ff0e4ad7851b4eae78e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2324,15 +2324,15 @@ static tree cp_parser_selection_statement
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool, unsigned short);
+  (cp_parser *, bool *, bool, unsigned short, bool);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool, unsigned short);
+  (cp_parser *, bool, unsigned short, bool);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool, unsigned short);
+  (cp_parser *, tree, tree, bool, unsigned short, bool);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool, unsigned short, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short, bool, bool);
 static void do_range_for_auto_deduction
   (tree, tree, tree, unsigned int);
 static tree cp_parser_perform_range_for_lookup
@@ -12414,7 +12414,8 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr,
 	case RID_DO:
 	case RID_FOR:
 	  std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc);
-	  statement = cp_parser_iteration_statement (parser, if_p, false, 0);
+	  statement = cp_parser_iteration_statement (parser, if_p, false, 0,
+						     false);
 	  break;
 
 	case RID_BREAK:
@@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
    not included. */
 
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
+	       bool novector)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -13624,14 +13626,14 @@ cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 
   if (is_range_for)
     return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll,
-				false);
+				novector, false);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll, novector);
 }
 
 static tree
 cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
-		 unsigned short unroll)
+		 unsigned short unroll, bool novector)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -13658,7 +13660,13 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
 		       "%<GCC unroll%> pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep, unroll);
+  else if (novector)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+		       "%<GCC novector%> pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll, novector);
   /* Look for the `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
@@ -13682,7 +13690,8 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
 
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-		     bool ivdep, unsigned short unroll, bool is_omp)
+		     bool ivdep, unsigned short unroll, bool novector,
+		     bool is_omp)
 {
   tree stmt, range_expr;
   auto_vec <cxx_binding *, 16> bindings;
@@ -13758,6 +13767,8 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
 	RANGE_FOR_IVDEP (stmt) = 1;
       if (unroll)
 	RANGE_FOR_UNROLL (stmt) = build_int_cst (integer_type_node, unroll);
+      if (novector)
+	RANGE_FOR_NOVECTOR (stmt) = 1;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
 	  /* do_auto_deduction doesn't mess with template init-lists.  */
@@ -13770,7 +13781,7 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
 				   decomp_first_name, decomp_cnt, ivdep,
-				   unroll);
+				   unroll, novector);
     }
   return stmt;
 }
@@ -13948,7 +13959,7 @@ warn_for_range_copy (tree decl, tree expr)
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 		      tree decomp_first_name, unsigned int decomp_cnt,
-		      bool ivdep, unsigned short unroll)
+		      bool ivdep, unsigned short unroll, bool novector)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -14008,7 +14019,7 @@ cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
 				 begin, ERROR_MARK,
 				 end, ERROR_MARK,
 				 NULL_TREE, NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep, unroll);
+  finish_for_cond (condition, statement, ivdep, unroll, novector);
 
   /* The new increment expression.  */
   expression = finish_unary_op_expr (input_location,
@@ -14175,7 +14186,7 @@ cp_parser_range_for_member_function (tree range, tree identifier)
 
 static tree
 cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
-			       unsigned short unroll)
+			       unsigned short unroll, bool novector)
 {
   cp_token *token;
   enum rid keyword;
@@ -14209,7 +14220,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	parens.require_open (parser);
 	/* Parse the condition.  */
 	condition = cp_parser_condition (parser);
-	finish_while_stmt_cond (condition, statement, ivdep, unroll);
+	finish_while_stmt_cond (condition, statement, ivdep, unroll, novector);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Parse the dependent statement.  */
@@ -14244,7 +14255,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	/* Parse the expression.  */
 	expression = cp_parser_expression (parser);
 	/* We're done with the do-statement.  */
-	finish_do_stmt (expression, statement, ivdep, unroll);
+	finish_do_stmt (expression, statement, ivdep, unroll, novector);
 	/* Look for the `)'.  */
 	parens.require_close (parser);
 	/* Look for the `;'.  */
@@ -14258,7 +14269,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
 	matching_parens parens;
 	parens.require_open (parser);
 
-	statement = cp_parser_for (parser, ivdep, unroll);
+	statement = cp_parser_for (parser, ivdep, unroll, novector);
 
 	/* Look for the `)'.  */
 	parens.require_close (parser);
@@ -43815,7 +43826,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 	      cp_parser_require (parser, CPP_COLON, RT_COLON);
 
 	      init = cp_parser_range_for (parser, NULL_TREE, NULL_TREE, decl,
-					  false, 0, true);
+					  false, 0, true, false);
 
 	      cp_convert_omp_range_for (this_pre_body, for_block, decl,
 					orig_decl, init, orig_init,
@@ -49300,6 +49311,15 @@ cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
   return unroll;
 }
 
+/* Parse a pragma GCC novector.  */
+
+static bool
+cp_parser_pragma_novector (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    break;
 	  }
 	const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
-	unsigned short unroll;
+	unsigned short unroll = 0;
+	bool novector = false;
 	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-	if (tok->type == CPP_PRAGMA
-	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
+
+	while (tok->type == CPP_PRAGMA)
 	  {
-	    tok = cp_lexer_consume_token (parser->lexer);
-	    unroll = cp_parser_pragma_unroll (parser, tok);
-	    tok = cp_lexer_peek_token (the_parser->lexer);
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_UNROLL:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    unroll = cp_parser_pragma_unroll (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_NOVECTOR:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    novector = cp_parser_pragma_novector (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
 	  }
-	else
-	  unroll = 0;
+
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR
 		&& tok->keyword != RID_WHILE
@@ -49632,7 +49668,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
 	return true;
       }
 
@@ -49646,17 +49682,82 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	  }
 	const unsigned short unroll
 	  = cp_parser_pragma_unroll (parser, pragma_tok);
-	bool ivdep;
+	bool ivdep = false;
+	bool novector = false;
 	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
-	if (tok->type == CPP_PRAGMA
-	    && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+
+	while (tok->type == CPP_PRAGMA)
 	  {
-	    tok = cp_lexer_consume_token (parser->lexer);
-	    ivdep = cp_parser_pragma_ivdep (parser, tok);
-	    tok = cp_lexer_peek_token (the_parser->lexer);
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_IVDEP:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    ivdep = cp_parser_pragma_ivdep (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_NOVECTOR:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    novector = cp_parser_pragma_novector (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
 	  }
-	else
-	  ivdep = false;
+
+	if (tok->type != CPP_KEYWORD
+	    || (tok->keyword != RID_FOR
+		&& tok->keyword != RID_WHILE
+		&& tok->keyword != RID_DO))
+	  {
+	    cp_parser_error (parser, "for, while or do statement expected");
+	    return false;
+	  }
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
+	return true;
+      }
+
+    case PRAGMA_NOVECTOR:
+      {
+	if (context == pragma_external)
+	  {
+	    error_at (pragma_tok->location,
+		      "%<#pragma GCC novector%> must be inside a function");
+	    break;
+	  }
+	const bool novector
+	  = cp_parser_pragma_novector (parser, pragma_tok);
+	bool ivdep = false;
+	unsigned short unroll;
+	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+
+	while (tok->type == CPP_PRAGMA)
+	  {
+	    switch (cp_parser_pragma_kind (tok))
+	      {
+		case PRAGMA_IVDEP:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    ivdep = cp_parser_pragma_ivdep (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		case PRAGMA_UNROLL:
+		  {
+		    tok = cp_lexer_consume_token (parser->lexer);
+		    unroll = cp_parser_pragma_unroll (parser, tok);
+		    tok = cp_lexer_peek_token (the_parser->lexer);
+		    break;
+		  }
+		default:
+		  gcc_unreachable ();
+	      }
+	  }
+
 	if (tok->type != CPP_KEYWORD
 	    || (tok->keyword != RID_FOR
 		&& tok->keyword != RID_WHILE
@@ -49665,7 +49766,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
 	    cp_parser_error (parser, "for, while or do statement expected");
 	    return false;
 	  }
-	cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+	cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector);
 	return true;
       }
 
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2345a18becc1160b9d12f3d88cccb66c8917373c..7b0d01a90e3c4012ec603ebe04cbbb31a7dd1570 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19036,7 +19036,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false, 0);
+      finish_for_cond (tmp, stmt, false, 0, false);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       {
@@ -19073,6 +19073,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	  {
 	    RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
 	    RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+	    RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t);
 	    finish_range_for_decl (stmt, decl, expr);
 	    if (decomp_first && decl != error_mark_node)
 	      cp_finish_decomp (decl, decomp_first, decomp_cnt);
@@ -19083,7 +19084,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 				     ? tree_to_uhwi (RANGE_FOR_UNROLL (t)) : 0);
 	    stmt = cp_convert_range_for (stmt, decl, expr,
 					 decomp_first, decomp_cnt,
-					 RANGE_FOR_IVDEP (t), unroll);
+					 RANGE_FOR_IVDEP (t), unroll,
+					 RANGE_FOR_NOVECTOR (t));
 	  }
 
 	bool prev = note_iteration_stmt_body_start ();
@@ -19096,7 +19098,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false, 0);
+      finish_while_stmt_cond (tmp, stmt, false, 0, false);
       {
 	bool prev = note_iteration_stmt_body_start ();
 	RECUR (WHILE_BODY (t));
@@ -19114,7 +19116,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
       }
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
-      finish_do_stmt (tmp, stmt, false, 0);
+      finish_do_stmt (tmp, stmt, false, 0, false);
       break;
 
     case IF_STMT:
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179eb2af2e82bf31d188023e9b9d41de9..b79975109c22ebcfcb060b4f20f32f69f3c3c444 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1148,7 +1148,7 @@ begin_while_stmt (void)
 
 void
 finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
-			unsigned short unroll)
+			unsigned short unroll, bool novector)
 {
   cond = maybe_convert_cond (cond);
   finish_cond (&WHILE_COND (while_stmt), cond);
@@ -1168,6 +1168,13 @@ finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep,
 						     annot_expr_unroll_kind),
 				      build_int_cst (integer_type_node,
 						     unroll));
+  if (novector && cond != error_mark_node)
+    WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR,
+				      TREE_TYPE (WHILE_COND (while_stmt)),
+				      WHILE_COND (while_stmt),
+				      build_int_cst (integer_type_node,
+						     annot_expr_no_vector_kind),
+				      integer_zero_node);
   simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt));
 }
 
@@ -1212,7 +1219,8 @@ finish_do_body (tree do_stmt)
    COND is as indicated.  */
 
 void
-finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
+finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll,
+		bool novector)
 {
   cond = maybe_convert_cond (cond);
   end_maybe_infinite_loop (cond);
@@ -1229,6 +1237,10 @@ finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll)
     cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		   build_int_cst (integer_type_node, annot_expr_unroll_kind),
 		   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+		   build_int_cst (integer_type_node, annot_expr_no_vector_kind),
+		   integer_zero_node);
   DO_COND (do_stmt) = cond;
 }
 
@@ -1325,7 +1337,7 @@ finish_init_stmt (tree for_stmt)
    FOR_STMT.  */
 
 void
-finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
+finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll, bool novector)
 {
   cond = maybe_convert_cond (cond);
   finish_cond (&FOR_COND (for_stmt), cond);
@@ -1345,6 +1357,13 @@ finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll)
 						 annot_expr_unroll_kind),
 				  build_int_cst (integer_type_node,
 						 unroll));
+  if (novector && cond != error_mark_node)
+    FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
+				  TREE_TYPE (FOR_COND (for_stmt)),
+				  FOR_COND (for_stmt),
+				  build_int_cst (integer_type_node,
+						 annot_expr_no_vector_kind),
+				  integer_zero_node);
   simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt));
 }
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3040a9bdea65d27f8d20572b4ed37375f5fe949b..baac6643d1abbf33d592e68aca49ac83e3c29188 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -24349,6 +24349,25 @@ void ignore_vec_dep (int *a, int k, int c, int m)
 @}
 @end smallexample
 
+@cindex pragma GCC novector
+@item #pragma GCC novector
+
+With this pragma, the programmer asserts that the loop immediately following
+must not be vectorized, i.e. transformed to use SIMD (single instruction
+multiple data) instructions.
+
+For example, the pragma prevents the compiler from vectorizing the following loop:
+
+@smallexample
+void foo (int n, int *a, int *b, int *c)
+@{
+  int i, j;
+#pragma GCC novector
+  for (i = 0; i < n; ++i)
+    a[i] = b[i] + c[i];
+@}
+@end smallexample
+
 @cindex pragma GCC unroll @var{n}
 @item #pragma GCC unroll @var{n}
 
diff --git a/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc
new file mode 100644
index 0000000000000000000000000000000000000000..4667935b641a06e3004904dc86c4513a78736f04
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <vector>
+
+void f4 (std::vector<int> a, std::vector<int> b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    while (i < (n & -8))
+      {
+        a[i] += b[i];
+        i++;
+      }
+}
+
+
+void f5 (std::vector<int> a, std::vector<int> b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    for (auto x : b)
+      {
+        a[i] += x;
+        i++;
+      }
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c
new file mode 100644
index 0000000000000000000000000000000000000000..c4b3957711db8f78d26a32634e9bbfdc11a33302
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+void f1 (int * restrict a, int * restrict b, int n)
+{
+#pragma GCC novector
+    for (int i = 0; i < (n & -8); i++)
+      a[i] += b[i];
+}
+
+void f2 (int * restrict a, int * restrict b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    do
+      {
+        a[i] += b[i];
+        i++;
+      }
+    while (i < (n & -8));
+}
+
+void f3 (int * restrict a, int * restrict b, int n)
+{
+    int i = 0;
+#pragma GCC novector
+    while (i < (n & -8))
+      {
+        a[i] += b[i];
+        i++;
+      }
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index c48a12b378f0b3086747bee43b38e2da3f90b24d..9268a0668390192caac9efaade0a53d9359cf9a7 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1063,6 +1063,7 @@ struct GTY(()) tree_base {
       unsigned lang_flag_4 : 1;
       unsigned lang_flag_5 : 1;
       unsigned lang_flag_6 : 1;
+      unsigned lang_flag_7 : 1;
       unsigned saturating_flag : 1;
 
       unsigned unsigned_flag : 1;
@@ -1071,7 +1072,7 @@ struct GTY(()) tree_base {
       unsigned nameless_flag : 1;
       unsigned atomic_flag : 1;
       unsigned unavailable_flag : 1;
-      unsigned spare0 : 2;
+      unsigned spare0 : 1;
 
       unsigned spare1 : 8;
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4d25b0cb55ee70402d5721f8b629ba..e96e8884bf68de77d19c95a87ae1c147460c23df 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1112,6 +1112,8 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_5)
 #define TREE_LANG_FLAG_6(NODE) \
   (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_6)
+#define TREE_LANG_FLAG_7(NODE) \
+  (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_7)
 
 /* Define additional fields and accessors for nodes representing constants.  */
 





* [PATCH 3/19]middle-end clean up vect testsuite using pragma novector
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
  2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
  2023-06-28 13:41 ` [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector Tamar Christina
@ 2023-06-28 13:42 ` Tamar Christina
  2023-06-28 13:54   ` Tamar Christina
  2023-07-04 11:31   ` Richard Biener
  2023-06-28 13:43 ` [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Tamar Christina
                   ` (39 subsequent siblings)
  42 siblings, 2 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 406562 bytes --]

Hi All,

The support for early break vectorization breaks lots of scan vect and slp
testcases because they assume that loops containing abort () cannot be
vectorized.  Additionally, it defeats the purpose of having a scalar loop to
check the output of the vectorizer if that loop is also vectorized.

For that reason this patch adds #pragma GCC novector to all tests which have
a scalar loop that this patch series would otherwise have vectorized.

FWIW, none of these tests failed to vectorize or run before the pragma.  The
tests that did point to some issues were copied to the early break test suite
as well.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	* g++.dg/vect/pr84556.cc: Add novector pragma.
	* g++.dg/vect/simd-1.cc: Add novector pragma.
	* g++.dg/vect/simd-2.cc: Add novector pragma.
	* g++.dg/vect/simd-3.cc: Add novector pragma.
	* g++.dg/vect/simd-4.cc: Add novector pragma.
	* g++.dg/vect/simd-5.cc: Add novector pragma.
	* g++.dg/vect/simd-6.cc: Add novector pragma.
	* g++.dg/vect/simd-7.cc: Add novector pragma.
	* g++.dg/vect/simd-8.cc: Add novector pragma.
	* g++.dg/vect/simd-9.cc: Add novector pragma.
	* g++.dg/vect/simd-clone-6.cc: Add novector pragma.
	* gcc.dg/vect/O3-pr70130.c: Add novector pragma.
	* gcc.dg/vect/Os-vect-95.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-1.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-16.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-2.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-24.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-25.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-26.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-27.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-28.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-29.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-42.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-cond-1.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-over-widen-1.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-over-widen-2.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-pattern-1.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-pattern-2.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-pow-1.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-pr101615-2.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-pr65935.c: Add novector pragma.
	* gcc.dg/vect/bb-slp-subgroups-1.c: Add novector pragma.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Add novector pragma.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Add novector pragma.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c: Add novector pragma.
	* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Add novector pragma.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Add novector pragma.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Add novector pragma.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: Add novector pragma.
	* gcc.dg/vect/fast-math-bb-slp-call-1.c: Add novector pragma.
	* gcc.dg/vect/fast-math-bb-slp-call-2.c: Add novector pragma.
	* gcc.dg/vect/fast-math-vect-call-1.c: Add novector pragma.
	* gcc.dg/vect/fast-math-vect-call-2.c: Add novector pragma.
	* gcc.dg/vect/fast-math-vect-complex-3.c: Add novector pragma.
	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-10.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-10a.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-10b.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-11.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-12.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-15.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-16.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-17.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-18.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-19.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-20.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-21.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-22.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-3.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-4.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-5.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-6-global.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-6.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-7.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-8.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-9.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-9a.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-outer-9b.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-slp-30.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-slp-31.c: Add novector pragma.
	* gcc.dg/vect/no-scevccp-vect-iv-2.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-31.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-34.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-36.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-64.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-65.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-66.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-68.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-69.c: Add novector pragma.
	* gcc.dg/vect/no-section-anchors-vect-outer-4h.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-2.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-111.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c: Add novector pragma.
	* gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c: Add novector pragma.
	* gcc.dg/vect/no-tree-dom-vect-bug.c: Add novector pragma.
	* gcc.dg/vect/no-tree-pre-slp-29.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-pr29145.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-101.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-102.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-102a.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-37.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-43.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-45.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-49.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-51.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-53.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-57.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-61.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-79.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-depend-1.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Add novector pragma.
	* gcc.dg/vect/no-vfa-vect-dv-2.c: Add novector pragma.
	* gcc.dg/vect/pr101445.c: Add novector pragma.
	* gcc.dg/vect/pr103581.c: Add novector pragma.
	* gcc.dg/vect/pr105219.c: Add novector pragma.
	* gcc.dg/vect/pr108608.c: Add novector pragma.
	* gcc.dg/vect/pr18400.c: Add novector pragma.
	* gcc.dg/vect/pr18536.c: Add novector pragma.
	* gcc.dg/vect/pr20122.c: Add novector pragma.
	* gcc.dg/vect/pr25413.c: Add novector pragma.
	* gcc.dg/vect/pr30784.c: Add novector pragma.
	* gcc.dg/vect/pr37539.c: Add novector pragma.
	* gcc.dg/vect/pr40074.c: Add novector pragma.
	* gcc.dg/vect/pr45752.c: Add novector pragma.
	* gcc.dg/vect/pr45902.c: Add novector pragma.
	* gcc.dg/vect/pr46009.c: Add novector pragma.
	* gcc.dg/vect/pr48172.c: Add novector pragma.
	* gcc.dg/vect/pr51074.c: Add novector pragma.
	* gcc.dg/vect/pr51581-3.c: Add novector pragma.
	* gcc.dg/vect/pr51581-4.c: Add novector pragma.
	* gcc.dg/vect/pr53185-2.c: Add novector pragma.
	* gcc.dg/vect/pr56826.c: Add novector pragma.
	* gcc.dg/vect/pr56918.c: Add novector pragma.
	* gcc.dg/vect/pr56920.c: Add novector pragma.
	* gcc.dg/vect/pr56933.c: Add novector pragma.
	* gcc.dg/vect/pr57705.c: Add novector pragma.
	* gcc.dg/vect/pr57741-2.c: Add novector pragma.
	* gcc.dg/vect/pr57741-3.c: Add novector pragma.
	* gcc.dg/vect/pr59591-1.c: Add novector pragma.
	* gcc.dg/vect/pr59591-2.c: Add novector pragma.
	* gcc.dg/vect/pr59594.c: Add novector pragma.
	* gcc.dg/vect/pr59984.c: Add novector pragma.
	* gcc.dg/vect/pr60276.c: Add novector pragma.
	* gcc.dg/vect/pr61194.c: Add novector pragma.
	* gcc.dg/vect/pr61680.c: Add novector pragma.
	* gcc.dg/vect/pr62021.c: Add novector pragma.
	* gcc.dg/vect/pr63341-2.c: Add novector pragma.
	* gcc.dg/vect/pr64252.c: Add novector pragma.
	* gcc.dg/vect/pr64404.c: Add novector pragma.
	* gcc.dg/vect/pr64421.c: Add novector pragma.
	* gcc.dg/vect/pr64493.c: Add novector pragma.
	* gcc.dg/vect/pr64495.c: Add novector pragma.
	* gcc.dg/vect/pr66251.c: Add novector pragma.
	* gcc.dg/vect/pr66253.c: Add novector pragma.
	* gcc.dg/vect/pr68502-1.c: Add novector pragma.
	* gcc.dg/vect/pr68502-2.c: Add novector pragma.
	* gcc.dg/vect/pr69820.c: Add novector pragma.
	* gcc.dg/vect/pr70021.c: Add novector pragma.
	* gcc.dg/vect/pr70354-1.c: Add novector pragma.
	* gcc.dg/vect/pr70354-2.c: Add novector pragma.
	* gcc.dg/vect/pr71259.c: Add novector pragma.
	* gcc.dg/vect/pr78005.c: Add novector pragma.
	* gcc.dg/vect/pr78558.c: Add novector pragma.
	* gcc.dg/vect/pr80815-2.c: Add novector pragma.
	* gcc.dg/vect/pr80815-3.c: Add novector pragma.
	* gcc.dg/vect/pr80928.c: Add novector pragma.
	* gcc.dg/vect/pr81410.c: Add novector pragma.
	* gcc.dg/vect/pr81633.c: Add novector pragma.
	* gcc.dg/vect/pr81740-1.c: Add novector pragma.
	* gcc.dg/vect/pr81740-2.c: Add novector pragma.
	* gcc.dg/vect/pr85586.c: Add novector pragma.
	* gcc.dg/vect/pr87288-1.c: Add novector pragma.
	* gcc.dg/vect/pr87288-2.c: Add novector pragma.
	* gcc.dg/vect/pr87288-3.c: Add novector pragma.
	* gcc.dg/vect/pr88903-1.c: Add novector pragma.
	* gcc.dg/vect/pr88903-2.c: Add novector pragma.
	* gcc.dg/vect/pr90018.c: Add novector pragma.
	* gcc.dg/vect/pr92420.c: Add novector pragma.
	* gcc.dg/vect/pr94994.c: Add novector pragma.
	* gcc.dg/vect/pr96783-1.c: Add novector pragma.
	* gcc.dg/vect/pr96783-2.c: Add novector pragma.
	* gcc.dg/vect/pr97081-2.c: Add novector pragma.
	* gcc.dg/vect/pr97558-2.c: Add novector pragma.
	* gcc.dg/vect/pr97678.c: Add novector pragma.
	* gcc.dg/vect/section-anchors-pr27770.c: Add novector pragma.
	* gcc.dg/vect/section-anchors-vect-69.c: Add novector pragma.
	* gcc.dg/vect/slp-1.c: Add novector pragma.
	* gcc.dg/vect/slp-10.c: Add novector pragma.
	* gcc.dg/vect/slp-11a.c: Add novector pragma.
	* gcc.dg/vect/slp-11b.c: Add novector pragma.
	* gcc.dg/vect/slp-11c.c: Add novector pragma.
	* gcc.dg/vect/slp-12a.c: Add novector pragma.
	* gcc.dg/vect/slp-12b.c: Add novector pragma.
	* gcc.dg/vect/slp-12c.c: Add novector pragma.
	* gcc.dg/vect/slp-13-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-13.c: Add novector pragma.
	* gcc.dg/vect/slp-14.c: Add novector pragma.
	* gcc.dg/vect/slp-15.c: Add novector pragma.
	* gcc.dg/vect/slp-16.c: Add novector pragma.
	* gcc.dg/vect/slp-17.c: Add novector pragma.
	* gcc.dg/vect/slp-18.c: Add novector pragma.
	* gcc.dg/vect/slp-19a.c: Add novector pragma.
	* gcc.dg/vect/slp-19b.c: Add novector pragma.
	* gcc.dg/vect/slp-19c.c: Add novector pragma.
	* gcc.dg/vect/slp-2.c: Add novector pragma.
	* gcc.dg/vect/slp-20.c: Add novector pragma.
	* gcc.dg/vect/slp-21.c: Add novector pragma.
	* gcc.dg/vect/slp-22.c: Add novector pragma.
	* gcc.dg/vect/slp-23.c: Add novector pragma.
	* gcc.dg/vect/slp-24-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-24.c: Add novector pragma.
	* gcc.dg/vect/slp-25.c: Add novector pragma.
	* gcc.dg/vect/slp-26.c: Add novector pragma.
	* gcc.dg/vect/slp-28.c: Add novector pragma.
	* gcc.dg/vect/slp-3-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-3.c: Add novector pragma.
	* gcc.dg/vect/slp-33.c: Add novector pragma.
	* gcc.dg/vect/slp-34-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-34.c: Add novector pragma.
	* gcc.dg/vect/slp-35.c: Add novector pragma.
	* gcc.dg/vect/slp-37.c: Add novector pragma.
	* gcc.dg/vect/slp-4-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-4.c: Add novector pragma.
	* gcc.dg/vect/slp-41.c: Add novector pragma.
	* gcc.dg/vect/slp-43.c: Add novector pragma.
	* gcc.dg/vect/slp-45.c: Add novector pragma.
	* gcc.dg/vect/slp-46.c: Add novector pragma.
	* gcc.dg/vect/slp-47.c: Add novector pragma.
	* gcc.dg/vect/slp-48.c: Add novector pragma.
	* gcc.dg/vect/slp-49.c: Add novector pragma.
	* gcc.dg/vect/slp-5.c: Add novector pragma.
	* gcc.dg/vect/slp-6.c: Add novector pragma.
	* gcc.dg/vect/slp-7.c: Add novector pragma.
	* gcc.dg/vect/slp-8.c: Add novector pragma.
	* gcc.dg/vect/slp-9.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-1.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-2-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-2.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-3.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-4.c: Add novector pragma.
	* gcc.dg/vect/slp-cond-5.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-1.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-10.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-11-big-array.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-11.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-12.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-2.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-3.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-4.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-5.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-6.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-7.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-8.c: Add novector pragma.
	* gcc.dg/vect/slp-multitypes-9.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-1.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-10.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-11.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-12.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-2.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-3.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-4.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-5.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-6.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-7.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-8.c: Add novector pragma.
	* gcc.dg/vect/slp-perm-9.c: Add novector pragma.
	* gcc.dg/vect/slp-widen-mult-half.c: Add novector pragma.
	* gcc.dg/vect/slp-widen-mult-s16.c: Add novector pragma.
	* gcc.dg/vect/slp-widen-mult-u8.c: Add novector pragma.
	* gcc.dg/vect/vect-100.c: Add novector pragma.
	* gcc.dg/vect/vect-103.c: Add novector pragma.
	* gcc.dg/vect/vect-104.c: Add novector pragma.
	* gcc.dg/vect/vect-105-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-105.c: Add novector pragma.
	* gcc.dg/vect/vect-106.c: Add novector pragma.
	* gcc.dg/vect/vect-107.c: Add novector pragma.
	* gcc.dg/vect/vect-108.c: Add novector pragma.
	* gcc.dg/vect/vect-109.c: Add novector pragma.
	* gcc.dg/vect/vect-11.c: Add novector pragma.
	* gcc.dg/vect/vect-110.c: Add novector pragma.
	* gcc.dg/vect/vect-113.c: Add novector pragma.
	* gcc.dg/vect/vect-114.c: Add novector pragma.
	* gcc.dg/vect/vect-115.c: Add novector pragma.
	* gcc.dg/vect/vect-116.c: Add novector pragma.
	* gcc.dg/vect/vect-117.c: Add novector pragma.
	* gcc.dg/vect/vect-11a.c: Add novector pragma.
	* gcc.dg/vect/vect-12.c: Add novector pragma.
	* gcc.dg/vect/vect-122.c: Add novector pragma.
	* gcc.dg/vect/vect-124.c: Add novector pragma.
	* gcc.dg/vect/vect-13.c: Add novector pragma.
	* gcc.dg/vect/vect-14.c: Add novector pragma.
	* gcc.dg/vect/vect-15-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-15.c: Add novector pragma.
	* gcc.dg/vect/vect-17.c: Add novector pragma.
	* gcc.dg/vect/vect-18.c: Add novector pragma.
	* gcc.dg/vect/vect-19.c: Add novector pragma.
	* gcc.dg/vect/vect-2-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-2.c: Add novector pragma.
	* gcc.dg/vect/vect-20.c: Add novector pragma.
	* gcc.dg/vect/vect-21.c: Add novector pragma.
	* gcc.dg/vect/vect-22.c: Add novector pragma.
	* gcc.dg/vect/vect-23.c: Add novector pragma.
	* gcc.dg/vect/vect-24.c: Add novector pragma.
	* gcc.dg/vect/vect-25.c: Add novector pragma.
	* gcc.dg/vect/vect-26.c: Add novector pragma.
	* gcc.dg/vect/vect-27.c: Add novector pragma.
	* gcc.dg/vect/vect-28.c: Add novector pragma.
	* gcc.dg/vect/vect-29.c: Add novector pragma.
	* gcc.dg/vect/vect-3.c: Add novector pragma.
	* gcc.dg/vect/vect-30.c: Add novector pragma.
	* gcc.dg/vect/vect-31-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-31.c: Add novector pragma.
	* gcc.dg/vect/vect-32-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-32.c: Add novector pragma.
	* gcc.dg/vect/vect-33-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-33.c: Add novector pragma.
	* gcc.dg/vect/vect-34-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-34.c: Add novector pragma.
	* gcc.dg/vect/vect-35-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-35.c: Add novector pragma.
	* gcc.dg/vect/vect-36-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-36.c: Add novector pragma.
	* gcc.dg/vect/vect-38.c: Add novector pragma.
	* gcc.dg/vect/vect-4.c: Add novector pragma.
	* gcc.dg/vect/vect-40.c: Add novector pragma.
	* gcc.dg/vect/vect-42.c: Add novector pragma.
	* gcc.dg/vect/vect-44.c: Add novector pragma.
	* gcc.dg/vect/vect-46.c: Add novector pragma.
	* gcc.dg/vect/vect-48.c: Add novector pragma.
	* gcc.dg/vect/vect-5.c: Add novector pragma.
	* gcc.dg/vect/vect-50.c: Add novector pragma.
	* gcc.dg/vect/vect-52.c: Add novector pragma.
	* gcc.dg/vect/vect-54.c: Add novector pragma.
	* gcc.dg/vect/vect-56.c: Add novector pragma.
	* gcc.dg/vect/vect-58.c: Add novector pragma.
	* gcc.dg/vect/vect-6-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-6.c: Add novector pragma.
	* gcc.dg/vect/vect-60.c: Add novector pragma.
	* gcc.dg/vect/vect-62.c: Add novector pragma.
	* gcc.dg/vect/vect-63.c: Add novector pragma.
	* gcc.dg/vect/vect-64.c: Add novector pragma.
	* gcc.dg/vect/vect-65.c: Add novector pragma.
	* gcc.dg/vect/vect-66.c: Add novector pragma.
	* gcc.dg/vect/vect-67.c: Add novector pragma.
	* gcc.dg/vect/vect-68.c: Add novector pragma.
	* gcc.dg/vect/vect-7.c: Add novector pragma.
	* gcc.dg/vect/vect-70.c: Add novector pragma.
	* gcc.dg/vect/vect-71.c: Add novector pragma.
	* gcc.dg/vect/vect-72.c: Add novector pragma.
	* gcc.dg/vect/vect-73-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-73.c: Add novector pragma.
	* gcc.dg/vect/vect-74-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-74.c: Add novector pragma.
	* gcc.dg/vect/vect-75-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-75.c: Add novector pragma.
	* gcc.dg/vect/vect-76-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-76.c: Add novector pragma.
	* gcc.dg/vect/vect-77-alignchecks.c: Add novector pragma.
	* gcc.dg/vect/vect-77-global.c: Add novector pragma.
	* gcc.dg/vect/vect-77.c: Add novector pragma.
	* gcc.dg/vect/vect-78-alignchecks.c: Add novector pragma.
	* gcc.dg/vect/vect-78-global.c: Add novector pragma.
	* gcc.dg/vect/vect-78.c: Add novector pragma.
	* gcc.dg/vect/vect-8.c: Add novector pragma.
	* gcc.dg/vect/vect-80-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-80.c: Add novector pragma.
	* gcc.dg/vect/vect-82.c: Add novector pragma.
	* gcc.dg/vect/vect-82_64.c: Add novector pragma.
	* gcc.dg/vect/vect-83.c: Add novector pragma.
	* gcc.dg/vect/vect-83_64.c: Add novector pragma.
	* gcc.dg/vect/vect-85-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-85.c: Add novector pragma.
	* gcc.dg/vect/vect-86.c: Add novector pragma.
	* gcc.dg/vect/vect-87.c: Add novector pragma.
	* gcc.dg/vect/vect-88.c: Add novector pragma.
	* gcc.dg/vect/vect-89-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-89.c: Add novector pragma.
	* gcc.dg/vect/vect-9.c: Add novector pragma.
	* gcc.dg/vect/vect-92.c: Add novector pragma.
	* gcc.dg/vect/vect-93.c: Add novector pragma.
	* gcc.dg/vect/vect-95.c: Add novector pragma.
	* gcc.dg/vect/vect-96.c: Add novector pragma.
	* gcc.dg/vect/vect-97-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-97.c: Add novector pragma.
	* gcc.dg/vect/vect-98-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-98.c: Add novector pragma.
	* gcc.dg/vect/vect-99.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-10.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-11.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-12.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-14.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-15.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-16.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-18.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-19.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-20.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-8.c: Add novector pragma.
	* gcc.dg/vect/vect-alias-check-9.c: Add novector pragma.
	* gcc.dg/vect/vect-align-1.c: Add novector pragma.
	* gcc.dg/vect/vect-align-2.c: Add novector pragma.
	* gcc.dg/vect/vect-all-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-all.c: Add novector pragma.
	* gcc.dg/vect/vect-avg-1.c: Add novector pragma.
	* gcc.dg/vect/vect-avg-11.c: Add novector pragma.
	* gcc.dg/vect/vect-avg-15.c: Add novector pragma.
	* gcc.dg/vect/vect-avg-16.c: Add novector pragma.
	* gcc.dg/vect/vect-avg-5.c: Add novector pragma.
	* gcc.dg/vect/vect-bitfield-write-1.c: Add novector pragma.
	* gcc.dg/vect/vect-bitfield-write-2.c: Add novector pragma.
	* gcc.dg/vect/vect-bitfield-write-3.c: Add novector pragma.
	* gcc.dg/vect/vect-bitfield-write-4.c: Add novector pragma.
	* gcc.dg/vect/vect-bitfield-write-5.c: Add novector pragma.
	* gcc.dg/vect/vect-bool-cmp.c: Add novector pragma.
	* gcc.dg/vect/vect-bswap16.c: Add novector pragma.
	* gcc.dg/vect/vect-bswap32.c: Add novector pragma.
	* gcc.dg/vect/vect-bswap64.c: Add novector pragma.
	* gcc.dg/vect/vect-complex-1.c: Add novector pragma.
	* gcc.dg/vect/vect-complex-2.c: Add novector pragma.
	* gcc.dg/vect/vect-complex-4.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-1.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-10.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-11.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-3.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-4.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-5.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-6.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-7.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-8.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-9.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-1.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-3.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-4.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-5.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-6.c: Add novector pragma.
	* gcc.dg/vect/vect-cond-arith-7.c: Add novector pragma.
	* gcc.dg/vect/vect-cselim-1.c: Add novector pragma.
	* gcc.dg/vect/vect-cselim-2.c: Add novector pragma.
	* gcc.dg/vect/vect-div-bitmask-4.c: Add novector pragma.
	* gcc.dg/vect/vect-div-bitmask-5.c: Add novector pragma.
	* gcc.dg/vect/vect-div-bitmask.h: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-1.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-2.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-3.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-4.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-5.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-6-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-6.c: Add novector pragma.
	* gcc.dg/vect/vect-double-reduc-7.c: Add novector pragma.
	* gcc.dg/vect/vect-float-extend-1.c: Add novector pragma.
	* gcc.dg/vect/vect-float-truncate-1.c: Add novector pragma.
	* gcc.dg/vect/vect-floatint-conversion-1.c: Add novector pragma.
	* gcc.dg/vect/vect-floatint-conversion-2.c: Add novector pragma.
	* gcc.dg/vect/vect-fma-1.c: Add novector pragma.
	* gcc.dg/vect/vect-gather-1.c: Add novector pragma.
	* gcc.dg/vect/vect-gather-3.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-11.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-16.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-17.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-2.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-3.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-4.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-5.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-6.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-7.c: Add novector pragma.
	* gcc.dg/vect/vect-ifcvt-9.c: Add novector pragma.
	* gcc.dg/vect/vect-intfloat-conversion-1.c: Add novector pragma.
	* gcc.dg/vect/vect-intfloat-conversion-2.c: Add novector pragma.
	* gcc.dg/vect/vect-intfloat-conversion-3.c: Add novector pragma.
	* gcc.dg/vect/vect-intfloat-conversion-4a.c: Add novector pragma.
	* gcc.dg/vect/vect-intfloat-conversion-4b.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-1.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-10.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-2.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-3.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-4.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-5.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-6.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-7.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-8-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-8.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-8a-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-iv-8a.c: Add novector pragma.
	* gcc.dg/vect/vect-live-1.c: Add novector pragma.
	* gcc.dg/vect/vect-live-2.c: Add novector pragma.
	* gcc.dg/vect/vect-live-3.c: Add novector pragma.
	* gcc.dg/vect/vect-live-4.c: Add novector pragma.
	* gcc.dg/vect/vect-live-5.c: Add novector pragma.
	* gcc.dg/vect/vect-live-slp-1.c: Add novector pragma.
	* gcc.dg/vect/vect-live-slp-2.c: Add novector pragma.
	* gcc.dg/vect/vect-live-slp-3.c: Add novector pragma.
	* gcc.dg/vect/vect-mask-load-1.c: Add novector pragma.
	* gcc.dg/vect/vect-mask-loadstore-1.c: Add novector pragma.
	* gcc.dg/vect/vect-mulhrs-1.c: Add novector pragma.
	* gcc.dg/vect/vect-mult-const-pattern-1.c: Add novector pragma.
	* gcc.dg/vect/vect-mult-const-pattern-2.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-1.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-10.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-11.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-12.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-13.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-14.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-16.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-17.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-2.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-3.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-4.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-5.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-6.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-8.c: Add novector pragma.
	* gcc.dg/vect/vect-multitypes-9.c: Add novector pragma.
	* gcc.dg/vect/vect-nb-iter-ub-1.c: Add novector pragma.
	* gcc.dg/vect/vect-nb-iter-ub-2.c: Add novector pragma.
	* gcc.dg/vect/vect-nb-iter-ub-3.c: Add novector pragma.
	* gcc.dg/vect/vect-neg-store-1.c: Add novector pragma.
	* gcc.dg/vect/vect-neg-store-2.c: Add novector pragma.
	* gcc.dg/vect/vect-nest-cycle-1.c: Add novector pragma.
	* gcc.dg/vect/vect-nest-cycle-2.c: Add novector pragma.
	* gcc.dg/vect/vect-nest-cycle-3.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2a-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2a.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2b.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2c-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2c.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-2d.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3a-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3a.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3b.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-3c.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-4.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-4d-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-4d.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-5.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-6.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-fir-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-fir-lb-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-fir-lb.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-fir.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-simd-1.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-simd-2.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-simd-3.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-slp-2.c: Add novector pragma.
	* gcc.dg/vect/vect-outer-slp-3.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-1-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-1.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-11.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-13.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-15.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-17.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-18.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-19.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-2-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-2.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-20.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-21.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-22.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-3-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-3.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-4-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-4.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-5.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-7.c: Add novector pragma.
	* gcc.dg/vect/vect-over-widen-9.c: Add novector pragma.
	* gcc.dg/vect/vect-peel-1-src.c: Add novector pragma.
	* gcc.dg/vect/vect-peel-2-src.c: Add novector pragma.
	* gcc.dg/vect/vect-peel-4-src.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-1.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-2.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-3.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-4.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-5.c: Add novector pragma.
	* gcc.dg/vect/vect-recurr-6.c: Add novector pragma.
	* gcc.dg/vect/vect-sdiv-pow2-1.c: Add novector pragma.
	* gcc.dg/vect/vect-sdivmod-1.c: Add novector pragma.
	* gcc.dg/vect/vect-shift-1.c: Add novector pragma.
	* gcc.dg/vect/vect-shift-3.c: Add novector pragma.
	* gcc.dg/vect/vect-shift-4.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-1.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-10.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-11.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-12.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-13.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-14.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-15.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-16.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-17.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-18.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-19.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-20.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-8.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-9.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-1.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-10.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-11.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-15.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-2.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-3.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-4.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-5.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-6.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-7.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-8.c: Add novector pragma.
	* gcc.dg/vect/vect-simd-clone-9.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-mult.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u16-i2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u16-i4.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u16-mult.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u32-mult.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-float.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-mult-char-ls.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-mult.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-same-dr.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-shift-1.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-store-a-u8-i2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-store-u16-i4.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-store-u32-i2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-store.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u16-i2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u16-i3.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u16-i4.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u32-i4.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u32-i8.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u32-mult.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8-gap7.c: Add novector pragma.
	* gcc.dg/vect/vect-strided-u8-i8.c: Add novector pragma.
	* gcc.dg/vect/vect-vfa-01.c: Add novector pragma.
	* gcc.dg/vect/vect-vfa-02.c: Add novector pragma.
	* gcc.dg/vect/vect-vfa-03.c: Add novector pragma.
	* gcc.dg/vect/vect-vfa-04.c: Add novector pragma.
	* gcc.dg/vect/vect-vfa-slp.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-1.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-const-s16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-const-u16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-half-u8.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-half.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-s16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-s8.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-u16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-mult-u8.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-shift-s16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-shift-s8.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-shift-u16.c: Add novector pragma.
	* gcc.dg/vect/vect-widen-shift-u8.c: Add novector pragma.
	* gcc.dg/vect/wrapv-vect-7.c: Add novector pragma.
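
Every testsuite change listed above applies the same pattern, visible in the diffs below: the scalar result-checking loop is annotated with `#pragma GCC novector` so that the vectorizer leaves it alone and the dump scans only match the loop actually under test. A minimal hypothetical sketch of the pattern (the array, bounds, and function name here are illustrative, not taken from any of the listed tests):

```c
#include <assert.h>

#define N 16

static int a[N], b[N];

/* Returns 0 if the doubled results are correct, 1 otherwise.  */
int
check_doubled (void)
{
  /* Loop we want the vectorizer to transform.  */
  for (int i = 0; i < N; i++)
    b[i] = 2 * a[i];

  /* Checking loop: annotated so it stays scalar and does not
     pollute "vectorized N loops" dump scans.  */
#pragma GCC novector
  for (int i = 0; i < N; i++)
    if (b[i] != 2 * a[i])
      return 1;
  return 0;
}
```

Note the pragma is only understood by GCC 14 and later; older compilers emit an unknown-pragma warning and simply ignore it, so the tests remain functionally unchanged either way.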

--- inline copy of patch -- 
diff --git a/gcc/testsuite/g++.dg/vect/pr84556.cc b/gcc/testsuite/g++.dg/vect/pr84556.cc
index e0655536f7a0a1c32a918f4b112604a7e6b5e389..e2c97e917bed3e7c5e709f61384d75588f522308 100644
--- a/gcc/testsuite/g++.dg/vect/pr84556.cc
+++ b/gcc/testsuite/g++.dg/vect/pr84556.cc
@@ -15,6 +15,7 @@ main ()
   };
   x ();
   x ();
+#pragma GCC novector
   for (int i = 0; i < 8; ++i)
     if (y[i] != i + 3)
       __builtin_abort ();
diff --git a/gcc/testsuite/g++.dg/vect/simd-1.cc b/gcc/testsuite/g++.dg/vect/simd-1.cc
index 76ce45d939dca8ddbc4953885ac71cf9f6ad298b..991db1d5dfee2a8d89de4aeae659b797629406c1 100644
--- a/gcc/testsuite/g++.dg/vect/simd-1.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-1.cc
@@ -88,12 +88,14 @@ main ()
   s.foo (x, y);
   if (x != 1024 || s.s != 2051 || s.t != 2054)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 2 * i)
       abort ();
   s.bar (x, y);
   if (x != 2049 || s.s != 4101 || s.t != 4104)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 4 * i)
       abort ();
@@ -102,12 +104,14 @@ main ()
   s.baz (x, y);
   if (x != 1024 || s.s != 2051 || s.t != 2054)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 2 * i)
       abort ();
   s.qux (x, y);
   if (x != 2049 || s.s != 4101 || s.t != 4104)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 4 * i)
       abort ();
diff --git a/gcc/testsuite/g++.dg/vect/simd-2.cc b/gcc/testsuite/g++.dg/vect/simd-2.cc
index 6f5737b7e40b5c2889f26cb4e4c3445e1c3822dd..0ff57e3178d1d79393120529ceea282498015d09 100644
--- a/gcc/testsuite/g++.dg/vect/simd-2.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-2.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-3.cc b/gcc/testsuite/g++.dg/vect/simd-3.cc
index d9981719f58ced487c4ffbbecb7c8a5564165bc7..47148f050ed056a2b3340f1e60604606f6cc1311 100644
--- a/gcc/testsuite/g++.dg/vect/simd-3.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-3.cc
@@ -75,6 +75,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -86,6 +87,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -99,6 +101,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -110,6 +113,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-4.cc b/gcc/testsuite/g++.dg/vect/simd-4.cc
index 8f3198943a7427ae3d4800bfbc5575c5849627ff..15b1bc1c99d5d42ecca330e063fed19a50fb3276 100644
--- a/gcc/testsuite/g++.dg/vect/simd-4.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-4.cc
@@ -77,6 +77,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-5.cc b/gcc/testsuite/g++.dg/vect/simd-5.cc
index dd817b8888b1b17d822f576d6d6b123f338e984f..31c2ce8e7129983e02237cdd32e41ef0a8f25f90 100644
--- a/gcc/testsuite/g++.dg/vect/simd-5.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-5.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b, r);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-6.cc b/gcc/testsuite/g++.dg/vect/simd-6.cc
index 883b769a9b854bd8c1915648d15ea8996d461f05..7de41a90cae3d80c0ccafad8a9b041bee89764d3 100644
--- a/gcc/testsuite/g++.dg/vect/simd-6.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-6.cc
@@ -118,6 +118,7 @@ main ()
   foo (a, b);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -129,6 +130,7 @@ main ()
   if (bar<int> ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -140,6 +142,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -151,6 +154,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-7.cc b/gcc/testsuite/g++.dg/vect/simd-7.cc
index 1467849e0c6baa791016b039ca21cfa2cc63ce7f..b543efb191cfbf9c561b243996cdd3a4b66b7533 100644
--- a/gcc/testsuite/g++.dg/vect/simd-7.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-7.cc
@@ -79,6 +79,7 @@ main ()
   foo<int *, int &> (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -90,6 +91,7 @@ main ()
   if (bar<int> () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -114,6 +117,7 @@ main ()
   if (qux<int &> () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-8.cc b/gcc/testsuite/g++.dg/vect/simd-8.cc
index 8e297e246bd41a2f63469260f4fdcfcb5a68a62e..4d76a97a97233cecd4d35797a4cc52f70a4c5e3b 100644
--- a/gcc/testsuite/g++.dg/vect/simd-8.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-8.cc
@@ -77,6 +77,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-9.cc b/gcc/testsuite/g++.dg/vect/simd-9.cc
index 4c5b0508fbd79f0e6aa311072062725536d8e2a3..5d1a174e0fc5425f33769fd017b4fd6a51a2fb14 100644
--- a/gcc/testsuite/g++.dg/vect/simd-9.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-9.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b, r);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
index fb00e8816a5fc157b780edd1d7064804a67d6373..2d9bb62555ff6c9473db2d1b754aed0123f2cb62 100644
--- a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
@@ -30,6 +30,7 @@ do_main ()
   #pragma omp simd
   for (i = 0; i < N; i++)
     e[i] = foo (c[i], d[i], f[i]);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != 6 * i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
index f8b84405140e87a2244ae9f5db6136af2fe9cf57..17ce6c392546f7e46a6db9f30f76dcaedb96d08c 100644
--- a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
+++ b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
@@ -90,6 +90,7 @@ main (void)
   for (i = 0; i < 8; i++)
     Loop_err (images + i, s, -1);
 
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (__builtin_memcmp (&expected, images + i, sizeof (expected)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
index 97e516ed68e6166eb5f0631004d89f8eedde1cc4..8039be89febdb150226b513ffe267f6065613ccb 100644
--- a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
+++ b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
@@ -10,6 +10,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
index 793c41f6b724d2b6f5ecca6511ea8504e1731a8c..3dc5e746cd0d5c99dcb0c88a05b94c73b44b0e65 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
@@ -29,6 +29,7 @@ main1 (int dummy)
     }
 
   /* check results: */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 82fae06e3244a9bbb4a471faecdc5f1174970229..76430e0915e2d6ad342dae602fd22337f4559b63 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -37,6 +37,7 @@ main1 (int dummy)
 
   a = 0;
   /* check results: */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + a
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
index fcf1cd327e0b20582e3512faacfebfe6b7db7278..cb1b38dda14785c6755d311683fbe9703355b39a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
@@ -28,6 +28,7 @@ main1 (int dummy)
     }
 
   /* check results:  */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
index ca049c81ba05482813dbab50ab3f4c6df94570e4..6de8dd8affce8e6f6ad40a36d6a163fc25b3fcf9 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (dst[i] != A * i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
index 7a9cf955e3e540e08b42cd80872bb99b53cabcb2..d44d585ff25aed7394945cff64f20923b5600061 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * i + i + 8)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
index df529673f6c817620a8423ab14724fe4e72bca49..fde062e86c7a01ca29d6e7eb8367414bd734500b 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * src[i] + src[i+8])
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
index bc27f2fca04de8f837ce51090657c8f2cc250c24..3647dd97c69df8a36fc66ca8e9988e215dad71eb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (A);
 
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     {
       if (dst[i] != A * i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
index 8749a1f22a6cc1e62a15bd988c50f6f63f26a0a2..c92b687aa44705118f21421a817ac3067e2023c6 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
@@ -56,6 +56,7 @@ int main (void)
 
   foo (A);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (dst[i] != A * i
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
index b531350ff3073b7f54b9c03609d6c8279e0374db..9272f02b2aa14f52b04e3d6bb08f15be17ce6a2f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * src[i] + B * src[i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
index 1dfa301184aad4c8edf00af80fb861562c941049..69fd0968491544f98d1406ff8a166b723714dd23 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
@@ -36,6 +36,7 @@ main ()
   foo (a, b);
 
   for (int i = 0; i < 4; ++i)
+#pragma GCC novector
     for (int j = 0; j < ARR_SIZE; ++j)
       if (a[i][j] != (i + 1) * ARR_SIZE - j + 20 * i)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
index ccb4ef659e47e3524d0dd602fa9d1291847dee3c..c8024429e9c44d924f5bb2af2fcc6b5eaa1b7db7 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
@@ -35,6 +35,7 @@ int main ()
 
   foo (a, 4);
 
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (a[i] != i%4 + 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
index 5a9fe423691e549ea877c42e46e9ba70d6ab5b00..b556a1d627865f5425e644df11f98661e6a85c29 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
@@ -45,6 +45,7 @@ DEF_LOOP (unsigned)
 	asm volatile ("" ::: "memory");			\
       }							\
     f_##SIGNEDNESS (a, b, c);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
 	__builtin_abort ();				\
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
index 15a94e680be4568232e31956732d7416549a18ff..d1aa161c3adcfad1d916de486a04c075f0aaf958 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
@@ -44,6 +44,7 @@ DEF_LOOP (unsigned)
 	asm volatile ("" ::: "memory");			\
       }							\
     f_##SIGNEDNESS (a, b, C);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       if (a[i] != (BASE_B + C + i * 15) >> 1)		\
 	__builtin_abort ();				\
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
index 47b1a43665130e11f902f5aea11b01faf307101b..a3ff0f5b3da2f25ce62a5e9fabe5b38e9b952fa9 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
@@ -37,6 +37,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
index c50560b53696c340b0c071296f002f65bcb91631..05fde3a7feba81caf54acff82870079b87b7cf53 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
@@ -39,6 +39,7 @@ int main ()
 
   foo (a, b, 8);
 
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (a[i] != i%8 + 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
index fc76700ced3d4f439b0f12eaf9dbc2b1fec72c20..c186c7b66c65e5f62edee25a924fdcfb25b252ab 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
@@ -16,6 +16,7 @@ int
 main (void)
 {
   f (a);
+#pragma GCC novector
   for (int i = 0; i < 4; ++i)
     {
       if (a[i] != (i + 1) * (i + 1))
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
index ac89883de22c9f647041fb373618dae5b7c036f3..dda74ebe03c35811ee991a181379e688430d8412 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
@@ -16,6 +16,8 @@ int main()
 	for (int e = 0; e <= 4; e++)
 	  a[e + 1] |= 3;
     }
+
+#pragma GCC novector
   for (int d = 0; d < 6; d++)
     if (a[d] != res[d])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index ee12136491071c6bfd7678c164df7a1c0a71818f..77d3ae7d424e208409c5baf18c6f39f294f7e351 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -51,6 +51,7 @@ int main()
   rephase ();
   for (i = 0; i < 32; ++i)
     for (j = 0; j < 3; ++j)
+#pragma GCC novector
       for (k = 0; k < 3; ++k)
 	if (lattice->link[i].e[j][k].real != i
 	    || lattice->link[i].e[j][k].imag != i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
index 40a02ed1309e2b6b4dc44cf56018a4bb71cc519f..bea3b92ba775a4e8b547d4edccf3ae4a4aa50b40 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
@@ -31,9 +31,11 @@ main (int argc, char **argv)
   __asm__ volatile ("" : : : "memory");
   test (a, b);
   __asm__ volatile ("" : : : "memory");
+#pragma GCC novector
   for (int i = 0; i < 4; i++)
     if (a[i] != i+4)
       abort ();
+#pragma GCC novector
   for (int i = 4; i < 8; i++)
     if (a[i] != 0)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -45,6 +46,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -58,6 +60,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -71,6 +74,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
index b82b8916be125b194a02aa74cef74f821796de7f..f07893458b658fc728703ffc8897a7f7aeafdbb3 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -42,6 +43,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -55,6 +57,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -68,6 +71,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
index c00a5bec6d5f9c325beb7e79a4520b76843f0a43..9e57cae9751d7231a2156acbb4c63c49dc0e8b95 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
@@ -48,6 +48,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
@@ -73,6 +74,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
@@ -92,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
index e27152eb91ef2feb6e547e5a00b0fc8fe40e2cee..4afbeea9927676b7dbdf78480671056e8777b183 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/4; i++)
     {
       if (tmp.b[2*i] != 5
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
index c092848dfc9d093e6fc78ce171bb4c1f59a0cf85..9cfae91534f38248a06fb60ebbe05c84a4baccd2 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
@@ -58,6 +58,7 @@ main (void)
   foo ();
 
   /* Check resiults. */ 
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     {
       if (cf[i].f1 != res1[i])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
index c57f065cccdd6cba4f96efe777318310415863c9..454a714a309163a39128bf20ef7e8426bd26da15 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
index 9bb81e371725ea0714f91eee1f5683c7c014e64c..f69e5c2ee5383abb0a242938426ef09621e54043 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
index d062d659ffb0138859333f3d7e375bd83fc1c99a..cab6842f72d150b83d525abf7a23971817b9082e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
index dc170a0530c564c884bc739e6d82874ccddad12c..05c28fe75e6dc67acba59e73d2b8d3363cd47c9b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
@@ -22,6 +22,7 @@ __attribute__((noipa)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
index ce27e4f082151b630376bd9cfbbabb78e80e4387..648e19f1071f844cc9f968414952897c12897688 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
index dae5a78808f1d6a5754adb8e7ff4b22608ea33b4..badf5dff70225104207b65a6fe4a2a79223ff1ff 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
index 8221f9e49f8875f453dbc12ca0da4a226e7cf62d..d71a202d8d2b6edaee8b71a485fa68ff56e983ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
index 2fc751ce70da35b055c64d9e8bec222a4b4feb8b..f18da3fc1f0c0df27c5bd9dd7995deae19352620 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index 5da4343198c10e3d35c9f446bc96f1b97d123f84..cbbfbb24658f8a11d4695fe5e16de4e4cfbdbc7e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -28,6 +28,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (pib[i - OFF] != ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
index 1fc14666f286c1f1170d66120d734647db7686cf..2a672122bcc549029c95563745b56d74f41d9a82 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
index 1a1a86878405bd3bf240e1417ad68970a585c562..9c659f83928046df2b40c2dcc20cdc12fad6c4fe 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
@@ -59,6 +59,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -45,6 +46,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -58,6 +60,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -71,6 +74,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
index 5e4affae7db61a0a07568603f1c80aefaf819adb..2f48955caa19f61c12e4c178f60f564c2e277bee 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -42,6 +43,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -55,6 +57,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -68,6 +71,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
index cfea8723ba2731c334c1fffd749dc157d8f68e36..d9f19d90431ab1e458de738411d7d903445cd04d 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
@@ -32,6 +32,7 @@ main1 ()
       d[i] = i * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i + i - a[i]) >= 0.0001f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
index 6d67d12f9961f5cbc53d6f7df5240ac2178a08ac..76bb044914f462cf6d76b559b751f1338a3fc0f8 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
@@ -44,12 +44,14 @@ main1 ()
       b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + i)
       abort ();
     else
       a[i] = 131.25;
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
index 495c0319c9dabd65436b5f6180114dfa8967f071..ad22f6e82b3c3312c9f10522377c4749e87ce3aa 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
@@ -65,24 +65,28 @@ main1 ()
       d[i] = i * i;
     }
   f1 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 3) + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f2 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 1) + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f3 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f4 (10);
+#pragma GCC novector
   for (i = 0; i < 60; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i % 3) + i - a[i]) >= 0.0001f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
index 274ff0f9942c5aff6c6aaca5243ef21bd8708856..d51e17ff656b7cc7ef3d87d207f78aae8eec9373 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
@@ -82,36 +82,42 @@ main1 ()
       b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
     }
   f1 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 3))
       abort ();
     else
       a[i] = 131.25;
   f2 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 1))
       abort ();
     else
       a[i] = 131.25;
   f3 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1)
       abort ();
     else
       a[i] = 131.25;
   f4 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
     else
       a[i] = 131.25;
   f5 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
     else
       a[i] = 131.25;
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
index 893e521ed8b83768699bc9b70f7d33b91dd89c9b..07992cf72dcfa4da5211a7a160fb146cf0b7ba5c 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
@@ -47,6 +47,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (c[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
index 71f2db3e0f281c4cdb1bf89315cc959382459e83..fc710637ac8142778b18810cefadf00dda3f39a6 100644
--- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
+++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
@@ -56,6 +56,7 @@ main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
index 82b37d4ca71344d15e00e0453dae6470c8d5ba9b..aeaf8146b1a817379a09dc3bf09f542524522f99 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
@@ -32,6 +32,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
index cafb96f52849c2a9b51591753898207beac9bdd0..635df4573c7cc0d4005421ce12d87b0c6511a228 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
@@ -31,6 +31,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<200*N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
index b376fb1d13ec533eebdbcb8092f03b4790de379a..494ff0b6f8f14f3d3b6aba1ada60d6442ce10811 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
@@ -31,6 +31,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
index 64c8dfc470ab02f3ea323f13b6477d6370210937..ba766a3f157db3f1a3d174ca6062fe7ddc60812c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
@@ -38,6 +38,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
index 277b73b4934a7bd689f8b2856b7813567dd762bc..d2eee349a42cd1061917c828895e45af5f730eb1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
index 325e201e2dac7aff88f4cb7aff53a7ee25b18631..cf7d605f23ba94b7a0a71526db02b59b517cbacc 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
@@ -42,6 +42,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
index d9cf28b22d9712f4e7f16ed18b89b0875d94daee..cfb837dced894ad8a885dcb392f489be381a3065 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
@@ -41,6 +41,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
index f5aeac981870d0e58679d5574dd12e2c4b40d23a..d650a9d1cdc7af778a2dac8e3e251527b825487d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
@@ -34,6 +34,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
index b5f8c3c88e4d0a6562ba867ae83c1ab120077111..e9ec4ca0da316be7d4d02138b0313a9ab087a601 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
@@ -33,6 +33,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
index 9d642415a133eefb420ced6642ac6d32a0e7e33f..13aac4a939846f05826f2b8628258c0fbd2e413a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
@@ -32,6 +32,7 @@ int main (void)
   foo (3);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
index f00132ede83134eb10639b77f5617487356e2ef1..c7c2fa8a5041fbc67747b4d4b98571f71f9599b6 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
@@ -41,6 +41,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += i;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
index 2dfdd59e14ef9685d22b2b8c34d55052ee747e7e..ba904a6c03e5a94f4a2b225f180bfe6a384f21d1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
@@ -47,6 +47,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += (b[i] - c[i]);
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
index 49dd5207d862fd1de81b59013a07ea74ee9b5beb..464fcb1fc310a7366ef6a55c5ed491a7410720f8 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
@@ -35,6 +35,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
index 934eadd3e326b950bf33eac03136868089fa1371..5cd4049d08c84ab9f3503a3f1577d170df8ce6c3 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
@@ -36,6 +36,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
index 42e218924fdbeb7c21830537a55364ad5ca822ac..a9ef1c04c70510797006d8782dcc6abf2908e4f4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
@@ -38,6 +38,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
index 75b7f066abae1496caa98080cdf4355ca1383091..72e53c2bfb0338a48def620159e384d423399d0b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
@@ -41,6 +41,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += i;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
index ec04bc28f6279c0cd6a6c174698aedc4312c7ab5..b41b2c322b91ab0a9a06ab93acd335b53f654a6d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
index ee39891efea2231362dc776efc4193898f06a02c..91e57a5963ac81964fb0c98a28f7586bf98df059 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
@@ -35,6 +35,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
index f8ef02d25190a29315e6909b9d89642f699b6c6a..a6c29956f3b84ee0def117bdc886219bf07ec2d0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
@@ -39,6 +39,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
index 2ef43cd146bdbbe6e7a8b8f0a66a11a1b8b7ec08..f01fcfb5c34906dbb96d050068b528192aa0f79a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
@@ -37,6 +37,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
index 7ac4e1ed949caecd6d2aaa7bf6d33d459ff74f8c..cf529efa31d6a10d3aaad69570f3f3ae102d327c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
@@ -39,6 +39,7 @@ int main (void)
     a[i] = foo (b,i);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = b[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
index ad2f472f1ed100912386d51ef999353baf50dd93..9c1e251f6a79fd34a820d64393696722c508e671 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
@@ -38,6 +38,7 @@ int main (void)
     a[i] = foo (b,i);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = b[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index f56bd2e50af42f20f57791b2e3f0227dac13ee82..543ee98b5a44c91c2c249df0ece304dd3282cc1a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -63,6 +63,7 @@ int main (void)
   res = foo (2);
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != bar ())
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
index 7c9113bc0f0425139c6723105c78cc8306d82f8c..0ed589b47e6bc722386a9db83e6397377f0e2069 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
@@ -34,6 +34,7 @@ int main (void)
   foo (a);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
index cea495c44808373543242d8998cdbfb9691499ca..62fa559e6ce064065b3191f673962a63e874055f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
@@ -34,6 +34,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
index 9e1f7890dd1ebc14b4a9a88488625347dcabd38a..96ffb4ce7b4a8a06cb6966acc15924512ad00f31 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
index ee65ceb6f92a185ca476afcc0b82295ab0034ba5..d76752c0dba3bbedb2913f87ed4b95f7d48ed2cf 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
@@ -37,6 +37,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index fe9e7e7ab4038acfe02d3e6ea9c4fc37ba207043..00d0eca56eeca6aee6f11567629dc955c0924c74 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,6 +24,7 @@ main1 ()
    }
 
   /* check results:  */
+#pragma GCC novector
    for (j = 0; j < N; j++)
    {
     for (i = 0; i < N; i++)
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index dc5f16fcfd2c269a719f7dcc5d2d0d4f9dbbf556..48b6a9b0681cf1fe410755c3e639b825b27895b0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -24,6 +24,7 @@ main1 ()
    }
 
   /* check results:  */
+#pragma GCC novector
  for (i = 0; i < N; i++)
    {
     for (j = 0; j < N; j++) 
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
index 131d2d9e03f44ed680cb49c71673908511c9236f..57ebd5c92a4297940bbdfc051c8a08d99a3b184e 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
@@ -22,6 +22,7 @@ int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
index d2ae7976781e20c6e4257e0ad4141ceb21ed711b..a1311504d2f8e67c275e8738b3c201187cd02bc0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
@@ -39,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -52,6 +53,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -65,6 +67,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -78,6 +81,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
index 1edad1ca30eeca0a224a61b5035546615a360fef..604d4b1bc6772f7bf9466b204ebf43e639642a02 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
index 7663ca7281aacc0ba3e685887e3c20be97322148..3eada6057dd91995709f313d706b6d94b8fb99eb 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
index 243e01e6dadf48d976fdd72bedd9547746cf73b5..19fbe331b57fde1412bfdaf7024e8c108f913da5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
@@ -54,6 +54,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[i])
@@ -64,6 +65,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[i][1][1][j] != ib[i])
@@ -74,6 +76,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (id[i][1][j+1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
index 581554064b572b7eb26d5f9852d4d13622317c7e..d51ef31aeac0d910a69d0959cc0da46d92bd7af9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
@@ -44,6 +44,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[2][i][j])
@@ -64,6 +65,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[j] != ib[2][i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
index e339590bacb494569558bfe9536c43f0d6339b8e..23cd3d5c11157f6735ed219c16075007f26034e5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
@@ -29,6 +29,7 @@ int main1 ()
     {
       for (j = 0; j < N; j++)
         {
+#pragma GCC novector
            if (ia[2][6][j] != 5)
                 abort();
         }
@@ -45,6 +46,7 @@ int main1 ()
     {
       for (j = 2; j < N+2; j++)
         {
+#pragma GCC novector
            if (ia[3][6][j] != 5)
                 abort();
         }
@@ -62,6 +64,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[2][1][6][j+1] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
index c403a8302d842a8eda96d2ee0fb25a94e8323254..36b79c2907cc1b41664cdca5074d458e36bdee98 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
@@ -35,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -48,6 +49,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -61,6 +63,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -74,6 +77,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index 34317ccb624a4ca75c612c70a5b5105bb85e272b..a0e53d5fef91868dfdbd542dd0a98dff92bd265b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -52,6 +52,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 5)
@@ -65,6 +66,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = NINTS; i < N - 1; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 6)
@@ -81,6 +83,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       for (j = 0; j < N; j++)
@@ -100,6 +103,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
     {
       for (j = 0; j < N - NINTS; j++)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
index 2199d11e2faee58663484a4d4e6ed06be508188b..f79b74d15700ccd86fc268e039efc8d7b8d245c2 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
@@ -31,7 +31,9 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < M; j++) {
       if (a[j][i] != 4)
         abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
index d0e4ec2373b66b76235f53522c50ac1067ece4d2..8358b6e54328336f1bd0f6c618c58e96b19401d5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
@@ -21,6 +21,7 @@ main1 (void)
     a[i] = (b[i] > 0 ? b[i] : 0);
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
index d718b5923b11aaee4d259c62cab1a82c714cc934..ae5d23fab86a4dd363e3df7310571ac93fc93f81 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
@@ -20,6 +20,7 @@ main1 (void)
     a[i] = (b[i] > 0 ? b[i] : 0);
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
index 7316985829f589dbbbe782b037096b2c5bd2be3c..4aaff3430a4cb110d586da83e2db410ae88bc977 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] >= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
index e87bcb8b43d3b82d30f8d3c2340b4968c8dd8da4..c644523a0047a6dfaa0ec8f3d74db79f71b82ec7 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
@@ -21,6 +21,7 @@ int main ()
     A[i] = ( A[i] > MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
index dcb09b7e7c7a3c983763fb3e57ea036e26d2d1ba..7f436a69e99bff6cebbc19a35c2dbbe5dce94c5a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] < MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
index ebde13167c863d91376d7c17d65191c047a7c9e7..d31157713bf3d0f0fadf305053dfae0612712b8d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
@@ -21,6 +21,7 @@ int main ()
   check_vect ();
   main1 (32);
 
+#pragma GCC novector
   for (si = 0; si < 32; ++si)
     if (stack_vars_sorted[si] != si)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
index e965910d66d06434a367f08553fde8a733a53e41..8491d5f0070233af5c0baf64f9123d270fe1d51c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
@@ -22,6 +22,7 @@ main1 (unsigned short *in)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -48,6 +49,7 @@ main2 (unsigned short * __restrict__ in, unsigned short * __restrict__ out)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
index a92ec9c1656275e1b0e31cfe1dcde3be78dfac7e..45cca1d1991c126fdef29bb129c443aae249a295 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
@@ -41,6 +41,7 @@ int main(void)
   with_restrict(a + 1);
   without_restrict(b + 1);
 
+#pragma GCC novector
   for (i = 0; i < 1002; ++i) {
     if (a[i] != b[i])
       abort();
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
index ce934279ddfe073a96ef8cd7e0d383ca979bda7a..73b92177dabf5193d9d158a92e0383d389b67c82 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
@@ -30,6 +30,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != a[i] || p->b[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
index d9e0529e73f0a566220020ad671f432f3e72299f..9a3fdab128a3bf2609018f92a38a7a6de8b7270b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
@@ -35,6 +35,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != 1) 
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
index 581438823fd2d1fa83ae4cb770995ff30c18abf8..439347c3bb10711911485a9c1f3bc6abf1c7798c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
@@ -34,6 +34,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != 1)
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
index 6f4c84b4cd2b928c5df21a44e910620c1937e863..f59eb69d99fbe2794f3f6c6822cc87b209e8295f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
@@ -24,6 +24,7 @@ int main1 (char *y)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -38,6 +39,7 @@ int main1 (char *y)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
index 18d4d1bbe6d0fdd357a95ab997437ab6b9a46ded..6b4542f5948bc32ca736ad92328a0fd37e44334c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
@@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
@@ -66,6 +67,7 @@ main2 (float *pa, float *pb, float *pc)
     }   
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (b[i] * c[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
index cad507a708f3079f36e2c85c594513514a1e172b..5db05288c81bf5c4c158efbc50f6d4862bf3f335 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
index a364c7b0d6f1f19292b937eedf0854163c1f549a..a33375f94fec55183493f96c84099224b7f4af6f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
@@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
index 69e921b95031b9275e6f4edeb120f247e93646a3..5ebb8fea0b7cb101f73fa2b079f4a37092eb6f2d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
index b1c1d86587e5bd9b1dcd364ad495ee7a52ccfb2b..b6d251ec48950dacdecc4d141ebceb4cedaa0755 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
index 83dc628f0b0803eab9489101c6f3c26f87cf429c..6291dd9d53c33160a0aacf05aeb6febb79fdadf0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
index 9524454d367db2a45ab744d55a9d32a32e773140..d0334e3ba90f511fd6c0bc5faa72d78c07510cd9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
index 6e9ddcfa5ce61f7a53829e81cab277165ecd1d91..37e474f8a06f1f7df7e9a83290e865d1baa12fce 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
@@ -23,6 +23,7 @@ main1 (float *pa, float *pb, float *pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
index da3506a4cecdce11bf929a98c533026d31fc5f96..e808c87158076d3430eac124df9fdd55192821a8 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++)
     {
       if (ia[i] != 0)
@@ -34,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++)
     {
       if (ib[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
index 89958378fca009fba6b59509c2ea7f96fa53805b..25a3409ae5e2ebdb6f7ebabc7974cd49ac7b7d47 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != 0)
@@ -34,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ib[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
index e5914d970e3596a082e015725ba99369670db4e7..d1d70dda2eb9b3d7b462ebe0c30536a1f2744af4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
@@ -130,6 +130,7 @@ main1 (void)
 	case 7: f8 (); break;
 	}
 
+#pragma GCC novector
       for (i = 0; i <= N; i++)
 	{
 	  int ea = i + 3;
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
index 8cc69ab22c5ab7cc193eeba1aa50365db640b254..407b683961ff0f5caaa1f168913fb7011b7fd2a3 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
@@ -37,6 +37,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-20; i++)
     {
       if (A[i] != D[i+20])
@@ -50,6 +51,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     {
       if (B[i] != C[i] + 5)
@@ -63,6 +65,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 4; i++)
     {
       if (C[i] != E[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr101445.c b/gcc/testsuite/gcc.dg/vect/pr101445.c
index f8a6e9ce6f7fa514cacd8b58d9263636d1d28eff..143156f2464e84e392c04231e4717ef9ec7d8a6e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr101445.c
+++ b/gcc/testsuite/gcc.dg/vect/pr101445.c
@@ -21,6 +21,7 @@ int main()
 {
   check_vect ();
   foo ();
+#pragma GCC novector
   for (int d = 0; d < 25; d++)
     if (a[d] != 0)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr103581.c b/gcc/testsuite/gcc.dg/vect/pr103581.c
index d072748de31d2c6beb5d6dd86bf762ee1f4d0182..92695c83d99bf048b52c8978634027bcfd71c13d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr103581.c
+++ b/gcc/testsuite/gcc.dg/vect/pr103581.c
@@ -39,6 +39,7 @@ main()
   unsigned int *resusiusi = maskgatherusiusi (16, idx4, data4);
   unsigned long long *resudiudi = maskgatherudiudi (16, idx8, data8);
   unsigned int *resusiudi = maskgatherusiudi (16, idx8, data4);
+#pragma GCC novector
   for (int i = 0; i < 16; ++i)
     {
       unsigned int d = idx4[i];
diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
index 4bca5bbba30a9740a54e6205bc0d0c8011070977..2289f5e1a633b56218d089d81528599d4f1f282b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr105219.c
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -22,6 +22,7 @@ int main()
       {
         __builtin_memset (data, 0, sizeof (data));
         foo (&data[start], n);
+#pragma GCC novector
         for (int j = 0; j < n; ++j)
           if (data[start + j] != j)
             __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr108608.c b/gcc/testsuite/gcc.dg/vect/pr108608.c
index e968141ba03639ab86ccf77e5e9ad5dd56a66e0d..fff5c1a89365665edc3478263ee909b2b260e178 100644
--- a/gcc/testsuite/gcc.dg/vect/pr108608.c
+++ b/gcc/testsuite/gcc.dg/vect/pr108608.c
@@ -13,6 +13,7 @@ main (void)
 {
   check_vect ();
   float ptr[256];
+#pragma GCC novector
   for (int j = 0; j < 16; ++j)
     {
       for (int i = 0; i < 256; ++i)
diff --git a/gcc/testsuite/gcc.dg/vect/pr18400.c b/gcc/testsuite/gcc.dg/vect/pr18400.c
index 012086138f7199fdf2b4b40666795f7df03a89d2..dd96d87be99287da19df4634578e2e073ab42455 100644
--- a/gcc/testsuite/gcc.dg/vect/pr18400.c
+++ b/gcc/testsuite/gcc.dg/vect/pr18400.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr18536.c b/gcc/testsuite/gcc.dg/vect/pr18536.c
index 6d02675913b68c811f4e3bc1f71df830d7f4e2aa..33ee3a5ddcfa296672924678b40474bea947b9ea 100644
--- a/gcc/testsuite/gcc.dg/vect/pr18536.c
+++ b/gcc/testsuite/gcc.dg/vect/pr18536.c
@@ -22,6 +22,7 @@ int main (void)
   main1 (0, x);
 
   /* check results:  */
+#pragma GCC novector
   while (++i < 4)
     {
       if (x[i-1] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/pr20122.c b/gcc/testsuite/gcc.dg/vect/pr20122.c
index 4f1b7bd6c1e723405b6625f7c7c890a46d3272bc..3a0387e7728fedc9872cb385dd7817f7f5cf07ac 100644
--- a/gcc/testsuite/gcc.dg/vect/pr20122.c
+++ b/gcc/testsuite/gcc.dg/vect/pr20122.c
@@ -27,6 +27,7 @@ static void VecBug2(short Kernel[8][24])
             Kernshort2[i] = Kernel[k][i];
 
     for (k = 0; k<8; k++)
+#pragma GCC novector
         for (i = 0; i<24; i++)
             if (Kernshort2[i] != Kernel[k][i])
                 abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr25413.c b/gcc/testsuite/gcc.dg/vect/pr25413.c
index e80d6970933e675b6056e5d119c6eb0e817a40f9..266ef3109f20df7615e85079a5d2330f26cf540d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr25413.c
+++ b/gcc/testsuite/gcc.dg/vect/pr25413.c
@@ -26,6 +26,7 @@ int main (void)
   check_vect ();
   
   main1 ();
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (a.d[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr30784.c b/gcc/testsuite/gcc.dg/vect/pr30784.c
index 840dbc5f1f139aafe012904a774c1e5b9739b653..ad1fa05d8edae5e28a3308f39ff304de3b1d60c1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr30784.c
+++ b/gcc/testsuite/gcc.dg/vect/pr30784.c
@@ -21,6 +21,7 @@ int main ()
   check_vect ();
   main1 (32);
 
+#pragma GCC novector
   for (si = 0; si < 32; ++si)
     if (stack_vars_sorted[si] != si)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr37539.c b/gcc/testsuite/gcc.dg/vect/pr37539.c
index dfbfc20c5cbca0cfa7158423ee4a42e5976b56fe..c7934eb384739778a841271841fd8b7777ee19be 100644
--- a/gcc/testsuite/gcc.dg/vect/pr37539.c
+++ b/gcc/testsuite/gcc.dg/vect/pr37539.c
@@ -17,6 +17,7 @@ ayuv2yuyv_ref (int *d, int *src, int n)
   }
 
   /* Check results.  */
+#pragma GCC novector
   for(i=0;i<n/2;i++){
    if (dest[i*4 + 0] != (src[i*2 + 0])>>16
        || dest[i*4 + 1] != (src[i*2 + 1])>>8
diff --git a/gcc/testsuite/gcc.dg/vect/pr40074.c b/gcc/testsuite/gcc.dg/vect/pr40074.c
index 143ee05b1fda4b0f858e31cad2ecd4211530e7b6..b75061a8116c34f609eb9ed59256b6eea87976a4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr40074.c
+++ b/gcc/testsuite/gcc.dg/vect/pr40074.c
@@ -30,6 +30,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-1; i++)
     {
       if (res[i] != arr[i].b + arr[i].d + arr[i+1].b)
diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c b/gcc/testsuite/gcc.dg/vect/pr45752.c
index 4ddac7ad5097c72f08b948f64caa54421d4f55d0..e8b364f29eb0c4b20bb2b2be5d49db3aab5ac39b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45752.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
@@ -146,6 +146,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (output[i] != check_results[i]
         || output2[i] != check_results2[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr45902.c b/gcc/testsuite/gcc.dg/vect/pr45902.c
index ac8e1ca6d38159d3c26497a414b638f49846381e..74510bf94b82850b6492c6d1ed0abacb73f65a16 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45902.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45902.c
@@ -34,6 +34,7 @@ main ()
 
   main1 ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != a[i] >> 8)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr46009.c b/gcc/testsuite/gcc.dg/vect/pr46009.c
index 9649e2fb4bbfd74e134a9ef3d068d50b9bcb86c0..fe73dbf5db08732cc74115281dcf6a020f893cb6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr46009.c
+++ b/gcc/testsuite/gcc.dg/vect/pr46009.c
@@ -49,6 +49,7 @@ main (void)
       e[i] = -1;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       int g;
@@ -59,6 +60,7 @@ main (void)
       e[i] = -1;
     }
   bar ();
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       int g;
diff --git a/gcc/testsuite/gcc.dg/vect/pr48172.c b/gcc/testsuite/gcc.dg/vect/pr48172.c
index a7fc05cae9119076efad4ca13a0f6fd0aff004b7..850e9b92bc15ac5f51fee8ac7fd2c9122def66b6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr48172.c
+++ b/gcc/testsuite/gcc.dg/vect/pr48172.c
@@ -25,6 +25,7 @@ int main() {
     array[HALF+i] = array[2*i] + array[2*i + 1];
 
   /* see if we have any failures */
+#pragma GCC novector
   for (i = 0; i < HALF - 1; i++)
     if (array[HALF+i] != array[2*i] + array[2*i + 1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr51074.c b/gcc/testsuite/gcc.dg/vect/pr51074.c
index 4144572126e9de36f5b2e85bb56ff9fdff372bce..d6c8cea1f842e08436a3d04af513307d3e980d27 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51074.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51074.c
@@ -15,6 +15,7 @@ main ()
       s[i].a = i;
     }
   asm volatile ("" : : : "memory");
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (s[i].b != 0 || s[i].a != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-3.c b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
index 76c156adf9d0dc083b7eb5fb2e6f056398e2b845..25acceef0e5ca6f8c180a41131cd190b9c84b533 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51581-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
@@ -97,17 +97,20 @@ main ()
     }
   f1 ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
       abort ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < 8; i+= 2)
     if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
 	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
       abort ();
   f5 ();
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 8; i+= 2)
     if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
 	|| c[i] != d[i] / (i == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-4.c b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
index 632c96e7481339a6dfac92913a519ad5501d34c4..f6234f3e7c09194dba54af08832171798c7d9c09 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51581-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
@@ -145,17 +145,20 @@ main ()
     }
   f1 ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
       abort ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < 16; i+= 2)
     if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
 	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
       abort ();
   f5 ();
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 16; i+= 2)
     if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
 	|| c[i] != d[i] / ((i & 7) == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
diff --git a/gcc/testsuite/gcc.dg/vect/pr53185-2.c b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
index 6057c69a24a81be20ecc5582685fb4516f47803d..51614e70d8feac0004644b2e6bb7deb52eeeefea 100644
--- a/gcc/testsuite/gcc.dg/vect/pr53185-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
@@ -20,6 +20,7 @@ int main ()
   for (off = 0; off < 8; ++off)
     {
       fn1 (&a[off], &b[off], 32 - off, 3);
+#pragma GCC novector
       for (i = 0; i < 32 - off; ++i)
 	if (a[off+i] != b[off+i*3])
 	  abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56826.c b/gcc/testsuite/gcc.dg/vect/pr56826.c
index e8223808184e6b7b37a6d458bdb440566314e959..2f2da458b89ac04634cb809873d7a60e55484499 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56826.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56826.c
@@ -35,6 +35,7 @@ int main()
       __asm__ volatile ("");
     }
   bar (&A[0], &B[0], 100);
+#pragma GCC novector
   for (i=0; i<300; i++)
     if (A[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56918.c b/gcc/testsuite/gcc.dg/vect/pr56918.c
index 1c88d324b902e9389afe4c5c729f20b2ad790dbf..4941453bbe9940b4e775239c4c2c9606435ea20a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56918.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56918.c
@@ -22,6 +22,7 @@ main ()
   foo ();
   if (data[0] != 3 || data[7] != 1)
     abort ();
+#pragma GCC novector
   for (i = 1; i < 4; ++i)
     if (data[i] != i || data[i + 3] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56920.c b/gcc/testsuite/gcc.dg/vect/pr56920.c
index 865cfda760d1978eb1f3f063c75e2bac558254bd..ef73471468392b573e999a59e282b4d796556b8d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56920.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56920.c
@@ -12,6 +12,7 @@ main ()
   check_vect ();
   for (i = 0; i < 15; ++i)
     a[i] = (i * 2) % 15;
+#pragma GCC novector
   for (i = 0; i < 15; ++i)
     if (a[i] != (i * 2) % 15)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56933.c b/gcc/testsuite/gcc.dg/vect/pr56933.c
index 7206682d7935a0436aaf502537bb56642d5e4648..2f2afe6df134163d2e7761be4906d778dbd6b670 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56933.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56933.c
@@ -25,6 +25,7 @@ int main()
   for (i = 0; i < 2*1024; i++)
     d[i] = 1.;
   foo (b, d, f);
+#pragma GCC novector
   for (i = 0; i < 1024; i+= 2)
     {
       if (d[2*i] != 2.)
@@ -32,6 +33,7 @@ int main()
       if (d[2*i+1] != 4.)
 	abort ();
     }
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       if (b[i] != 1.)
diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c b/gcc/testsuite/gcc.dg/vect/pr57705.c
index e17ae09beb68051637c3ece69ac2f29e1433008d..39c32946d74ef01efce6fbc2f23c72dd0b33091d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57705.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
@@ -47,14 +47,17 @@ main ()
   int i;
   check_vect ();
   foo (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 5 + 4 * i)
       abort ();
   bar (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 9 + 4 * i)
       abort ();
   baz (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 5 + 4 * i || b[i] != (unsigned char) i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-2.c b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
index df63a49927d38badb2503787bcd828b796116199..6addd76b422614a2e28272f4d696e3cba4bb0376 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57741-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
@@ -34,6 +34,7 @@ main ()
   int i;
   check_vect ();
   foo (p, q, 1.5f);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-3.c b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
index 2e4954ac7f14b21463b0ef0ca97e05c4eb96e8fd..916fa131513b88321d36cdbe46f101361b4f8244 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57741-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
@@ -33,6 +33,7 @@ main ()
   check_vect ();
   r[0] = 0;
   foo (1.5f);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f || r[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-1.c b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
index 892fce58e36b37e5412cc6c100f82b6077ace77e..e768fb3e1de48cf43b389cf83b4f7f1f030c4f91 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59591-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
@@ -31,6 +31,7 @@ bar (void)
       t[i] = i * 13;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 256; i++)
     if ((i >> 2) & (1 << (i & 3)))
       {
diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-2.c b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
index bd82d765794a32af6509ffd60d1f552ce10570a3..3bdf4252cffe63830b5b47cd17fa29a3c65afc73 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59591-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
@@ -32,6 +32,7 @@ bar (void)
       t[i] = i * 13;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 256; i++)
     if ((i >> 2) & (1 << (i & 3)))
       {
diff --git a/gcc/testsuite/gcc.dg/vect/pr59594.c b/gcc/testsuite/gcc.dg/vect/pr59594.c
index 947fa4c0c301d98cbdfeb5da541482858b69180f..e3ece8abf7131aa4ed0a2d5af79d4bdea90bd8c1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59594.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59594.c
@@ -22,6 +22,7 @@ main ()
     }
   if (b[0] != 1)
     __builtin_abort ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (b[i + 1] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr59984.c b/gcc/testsuite/gcc.dg/vect/pr59984.c
index d6977f0020878c043376b7e7bfdc6a0e85ac2663..c00c2267158667784fb084b0ade19e2ab763c6a3 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59984.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59984.c
@@ -37,6 +37,7 @@ test (void)
       foo (a[i], &v1, &v2);
       a[i] = v1 * v2;
     }
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * i * i * i - 1)
       __builtin_abort ();
@@ -49,6 +50,7 @@ test (void)
       bar (a[i], &v1, &v2);
       a[i] = v1 * v2;
     }
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * i * i * i - 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr60276.c b/gcc/testsuite/gcc.dg/vect/pr60276.c
index 9fc18ac7428cf71903b6ebb04b90eb21b2e8b3c7..824e2a336b6d9fad2e7a72c445ec2edf80be8138 100644
--- a/gcc/testsuite/gcc.dg/vect/pr60276.c
+++ b/gcc/testsuite/gcc.dg/vect/pr60276.c
@@ -44,6 +44,7 @@ int main(void)
   foo (out + 2, lp + 1, 48);
   foo_novec (out2 + 2, lp + 1, 48);
 
+#pragma GCC novector
   for (s = 0; s < 49; s++)
     if (out[s] != out2[s])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c b/gcc/testsuite/gcc.dg/vect/pr61194.c
index 8421367577278cdf5762327d83cdc4a0e65c9411..8cd38b3d5da616d65ba131d048280b1d5644339d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61194.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61194.c
@@ -32,6 +32,7 @@ int main()
 
   barX();
 
+#pragma GCC novector
   for (i = 0; i < 1024; ++i)
     if (z[i] != ((x[i]>0 && w[i]<0) ? 0. : 1.))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr61680.c b/gcc/testsuite/gcc.dg/vect/pr61680.c
index e25bf78090ce49d68cb3694233253b403709331a..bb24014bdf045f22a0c9c5234481f07153c25d41 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61680.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61680.c
@@ -8,6 +8,7 @@ bar (double p[][4])
 {
   int i;
   double d = 172.0;
+#pragma GCC novector
   for (i = 0; i < 4096; i++)
     {
       if (p[i][0] != 6.0 || p[i][1] != 6.0 || p[i][2] != 10.0)
diff --git a/gcc/testsuite/gcc.dg/vect/pr62021.c b/gcc/testsuite/gcc.dg/vect/pr62021.c
index 40c64429d6382821af4a31b3569c696ea0e5fa2a..460fadb3f6cd73c7cac2bbba65cc09d4211396e8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr62021.c
+++ b/gcc/testsuite/gcc.dg/vect/pr62021.c
@@ -24,6 +24,7 @@ main ()
   #pragma omp simd
   for (i = 0; i < 1024; i++)
     b[i] = foo (b[i], i);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (b[i] != &a[1023])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr63341-2.c b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
index 2004a79b80ef4081136ade20df9b6acd5b6428c1..aa338263a7584b06f10e4cb4a6baf19dea20f40a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr63341-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
@@ -16,6 +16,7 @@ foo ()
   int i;
   for (i = 0; i < 32; i++)
     d[i] = t.s[i].s + 4;
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     if (d[i] != t.s[i].s + 4)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64252.c b/gcc/testsuite/gcc.dg/vect/pr64252.c
index b82ad017c16fda6e031b503a9b11fe39a3691a6c..89070c27ff0f9763bd8eaff4a81b5b0197ae12dc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64252.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64252.c
@@ -57,6 +57,7 @@ int main()
   int i;
   check_vect ();
   bar(2, q);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64404.c b/gcc/testsuite/gcc.dg/vect/pr64404.c
index 26fceb6cd8936f7300fb0067c0f18c3d35ac4595..6fecf9ecae18e49808a58fe17a6b912786bdbad3 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64404.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64404.c
@@ -42,6 +42,7 @@ main (void)
 
   Compute ();
 
+#pragma GCC novector
   for (d = 0; d < 1024; d++)
     {
       if (Y[d].l != X[d].l + X[d].h
diff --git a/gcc/testsuite/gcc.dg/vect/pr64421.c b/gcc/testsuite/gcc.dg/vect/pr64421.c
index 3b5ab2d980c207c1d5e7fff73cd403ac38790080..47afd22d93e5ed8fbfff034cd2a03d8d70f7e422 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64421.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64421.c
@@ -27,6 +27,7 @@ main ()
     a[i] = foo (a[i], i);
   if (a[0] != 1 || a[1] != 3)
     abort ();
+#pragma GCC novector
   for (i = 2; i < 1024; i++)
     if (a[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64493.c b/gcc/testsuite/gcc.dg/vect/pr64493.c
index 6fb13eb6d96fe67471fdfafd2eed2a897ae8b670..d3faf84bcc16d31fc11dd2d0cd7242972fdbafdc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64493.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64493.c
@@ -9,6 +9,7 @@ main ()
 
   for (; a; a--)
     for (d = 1; d <= 0; d++)
+#pragma GCC novector
       for (; d;)
 	if (h)
 	  {
diff --git a/gcc/testsuite/gcc.dg/vect/pr64495.c b/gcc/testsuite/gcc.dg/vect/pr64495.c
index 5cbaeff8389dafd3444f90240a910e7d5e4f2431..c48f9389aa325a8b8ceb5697684f563b8c13a72d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64495.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64495.c
@@ -11,6 +11,7 @@ main ()
 
   for (; a;)
     for (; g; g++)
+#pragma GCC novector
       for (; f; f++)
 	if (j)
 	  {
diff --git a/gcc/testsuite/gcc.dg/vect/pr66251.c b/gcc/testsuite/gcc.dg/vect/pr66251.c
index 26afbc96a5d57a49fbbac95753f4df006cb36018..355590e69a98687084fee2c5486d14c2a20f3fcb 100644
--- a/gcc/testsuite/gcc.dg/vect/pr66251.c
+++ b/gcc/testsuite/gcc.dg/vect/pr66251.c
@@ -51,6 +51,7 @@ int main ()
 
       test1(da, ia, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != ia[i*stride])
@@ -66,6 +67,7 @@ int main ()
 
       test2(ia, da, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != ia[i*stride])
diff --git a/gcc/testsuite/gcc.dg/vect/pr66253.c b/gcc/testsuite/gcc.dg/vect/pr66253.c
index bdf3ff9ca51f7f656fad687fd8c77c6ee053794f..6b99b4f3b872cbeab14e035f2e2d40aab6e438e4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr66253.c
+++ b/gcc/testsuite/gcc.dg/vect/pr66253.c
@@ -39,6 +39,7 @@ int main ()
 
       test1(da, ia, ca, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != 0.5 * ia[i*stride] * ca[i*stride])
diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-1.c b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
index 4f7d0bfca38693877ff080842d6ef7abf3d3e17b..cc6e6cd9a2be0e921382bda3c653f6a6b730b905 100644
--- a/gcc/testsuite/gcc.dg/vect/pr68502-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
@@ -41,6 +41,7 @@ int main ()
   for (i = 0; i < numf1s; i++)
     f1_layer[i].I = (double *)-1;
   reset_nodes ();
+#pragma GCC novector
   for (i = 0; i < numf1s; i++)
     if (f1_layer[i].I != (double *)-1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-2.c b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
index a3eddafc7ca76cbe4c21f6ed873249cb2c94b7a6..11f87125b75df9db29669aa55cdc3c202b0fedda 100644
--- a/gcc/testsuite/gcc.dg/vect/pr68502-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
@@ -41,6 +41,7 @@ int main ()
   for (i = 0; i < numf1s; i++)
     f1_layer[i].I = -1;
   reset_nodes ();
+#pragma GCC novector
   for (i = 0; i < numf1s; i++)
     if (f1_layer[i].I != -1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr69820.c b/gcc/testsuite/gcc.dg/vect/pr69820.c
index be24e4fa9a1343e4308bfd967f1ccfdd3549db5c..72d10b65c16b54764aac0cf271138ffa187f4052 100644
--- a/gcc/testsuite/gcc.dg/vect/pr69820.c
+++ b/gcc/testsuite/gcc.dg/vect/pr69820.c
@@ -28,6 +28,7 @@ main ()
       c[i] = 38364;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 100; ++i)
     if (b[i] != 0xed446af8U)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr70021.c b/gcc/testsuite/gcc.dg/vect/pr70021.c
index 988fc53216d12908bbbc564c9efc4d63a5c057d7..d4d5db12bc0e646413ba393b57edc60ba1189059 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70021.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70021.c
@@ -32,6 +32,7 @@ main ()
       e[i] = 14234165565810642243ULL;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     if (e[i] != ((i & 3) ? 14234165565810642243ULL : 1ULL))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-1.c b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
index 9d601dc9d4a92922e4114b8b4d1b7ef2f49c0c44..2687758b022b01af3eb7b444fee25be8bc1f8b3c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70354-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
@@ -41,6 +41,7 @@ main ()
       h[i] = 8193845517487445944ULL;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (d[i] != 8193845517487445944ULL || e[i] != 1
 	|| g[i] != 4402992416302558097ULL)
diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-2.c b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
index 160e1e083e03e0652d06bf29df060192cbe75fd5..cb4cdaae30ba5760fc32e255b651072ca397a499 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70354-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
@@ -29,6 +29,7 @@ main ()
       b[i] = 0x1200000000ULL + (i % 54);
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != (0x1234ULL << (i % 54)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr71259.c b/gcc/testsuite/gcc.dg/vect/pr71259.c
index 587a8e3c8f378f3c57f8a9a2e9fa5aee3a968860..6cb22f622ee2ce2d6de51c440472e36fe7294362 100644
--- a/gcc/testsuite/gcc.dg/vect/pr71259.c
+++ b/gcc/testsuite/gcc.dg/vect/pr71259.c
@@ -20,6 +20,7 @@ main ()
   asm volatile ("" : : : "memory");
   for (i = 0; i < 44; i++) 
     for (j = 0; j < 17; j++)
+#pragma GCC novector
       for (k = 0; k < 2; k++)
 	if (c[i][j][k] != -5105075050047261684)
 	  __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr78005.c b/gcc/testsuite/gcc.dg/vect/pr78005.c
index 7cefe73fe1b3d0050befeb5e25aec169867fd96a..6da7acf50c2a1237b817abf8e6b9191b3c3e1378 100644
--- a/gcc/testsuite/gcc.dg/vect/pr78005.c
+++ b/gcc/testsuite/gcc.dg/vect/pr78005.c
@@ -22,6 +22,7 @@ foo (int n, int d)
 
 #define check_u(x)		\
   foo (x, 2);			\
+  _Pragma("GCC novector")	\
   for (i = 0; i < N; i++)	\
     {				\
       if (u[i] != res##x[i])	\
diff --git a/gcc/testsuite/gcc.dg/vect/pr78558.c b/gcc/testsuite/gcc.dg/vect/pr78558.c
index 2606d4ec10d3fa18a4c0e4b8e9dd02131cb57ba7..2c28426eb85fc6663625c542e84860fa7bcfd3c2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr78558.c
+++ b/gcc/testsuite/gcc.dg/vect/pr78558.c
@@ -37,6 +37,7 @@ main ()
   asm volatile ("" : : "g" (s), "g" (d) : "memory");
   foo ();
   asm volatile ("" : : "g" (s), "g" (d) : "memory");
+#pragma GCC novector
   for (i = 0; i < 50; ++i)
     if (d[i].q != i || d[i].r != 50 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-2.c b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
index 83557daa6963632ccf2cf0a641a4106b4dc833f5..3ffff0be3be96df4c3e6a3d5caa68b7d4b6bad9a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80815-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo (a, b);
 
+#pragma GCC novector
   for (i = 973; i < 1020; i++)
     if (arr[i] != res[i - 973])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-3.c b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
index 50392ab1a417de2af81af6473bf0a85bd9eb7279..5e2be5262ebb639d4bd771e326f9a07ed2ee0680 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80815-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (a, b, 50);
 
+#pragma GCC novector
   for (i = 975; i < 1025; i++)
     if (arr[i] != res[i - 975])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80928.c b/gcc/testsuite/gcc.dg/vect/pr80928.c
index e6c1f1ab5a7f4ca7eac98cf91fccffbff2dcfc7a..34566c4535247d2fa39c5d856d1e0c32687e9a2a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80928.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80928.c
@@ -25,6 +25,7 @@ int main ()
   foo ();
 
   /* check results */
+#pragma GCC novector
   for (int i = 0; i < 1020; ++i)
     if (a[i] != ((i + 4) / 5) * 5)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81410.c b/gcc/testsuite/gcc.dg/vect/pr81410.c
index 9c91c08d33c729d8ff26cae72f4651081850b550..6b7586992fe46918aab537a06f166ce2e25f90d8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81410.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81410.c
@@ -26,6 +26,7 @@ int main()
       __asm__ volatile ("" : : : "memory");
     }
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 8; ++i)
     if (y[2*i] != 3*i || y[2*i+1] != 3*i + 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81633.c b/gcc/testsuite/gcc.dg/vect/pr81633.c
index 9689ab3959cd9df8234b89ec307b7cd5d6f9d795..2ad144a60444eb82b8e8575efd8fcec94fcd6f01 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81633.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81633.c
@@ -24,6 +24,7 @@ int main(void)
   double A[4][4] = {{0.0}};
   kernel(A);
   for ( int i = 0; i < 4; i++ )
+#pragma GCC novector
     for ( int j = 0; j < 4; j++ )
       if (A[i][j] != expected[i][j])
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-1.c b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
index f6fd43c7c87e0aad951ba092796f0aae39b80d54..b01e1994834934bbd50f3fc1cbcf494ecc62c315 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81740-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
@@ -14,6 +14,7 @@ main ()
     for (c = 0; c <= 6; c++)
       a[c + 1][b + 2] = a[c][b + 1];
   for (i = 0; i < 8; i++)
+#pragma GCC novector
     for (d = 0; d < 10; d++)
       if (a[i][d] != (i == 3 && d == 6) * 4)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-2.c b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
index 1e0d6645a03f77c9c042313fd5377b71ba75c4d6..7b2bfe139f20fb66c90cfd643b65df3edb9b536e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81740-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
@@ -15,6 +15,7 @@ main ()
     for (c = 6; c >= 0; c--)
       a[c + 1][b + 2] = a[c][b + 1];
   for (i = 0; i < 8; i++)
+#pragma GCC novector
     for (d = 0; d < 10; d++)
       if (a[i][d] != (i == 3 && d == 6) * 4)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr85586.c b/gcc/testsuite/gcc.dg/vect/pr85586.c
index 3d075bfcec83bab119f77bad7b642eb3d634fb4c..a4a170a1fcd130d84da3be9f897889ff4cfc717c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr85586.c
+++ b/gcc/testsuite/gcc.dg/vect/pr85586.c
@@ -24,6 +24,7 @@ main (void)
     }
 
   foo (out, in, 1);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (out[i] != in[i])
       __builtin_abort ();
@@ -33,6 +34,7 @@ main (void)
   foo (out + N - 1, in, -1);
   if (out[0] != in[N - 1])
     __builtin_abort ();
+#pragma GCC novector
   for (int i = 1; i <= N; ++i)
     if (out[i] != 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-1.c b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
index 0d0a70dff6f21b2f07fecd937d4fe26c0df61513..ec968dfcd0153cdb001e8e282146dbdb67d23c65 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
@@ -16,6 +16,7 @@ run (int *restrict a, int *restrict b, int count)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-2.c b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
index e9ff9a0be7c08a9755972717a63025f2825e95cf..03c7f88a6a48507bbbfbf2e177425d28605a3aa6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
@@ -22,6 +22,7 @@ RUN_COUNT (4)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-3.c b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
index 23f574ccb53268b59b933ec59a5eadaa890007ff..0475990992e58451de8649b735fa16f0e32ed657 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
@@ -22,6 +22,7 @@ RUN_COUNT (4)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N + 1; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-1.c b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
index 77dbfd47c91be8cce0edde8b09b7b90d40268306..0f78ccc995d5dcd35d5d7ba0f35afdc8bb5a1b2b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88903-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
@@ -19,6 +19,7 @@ main()
   for (int i = 0; i < 1024; ++i)
     x[i] = i;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != i << ((i/2+1) & 31))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-2.c b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
index cd88a99c6045c6a3eb848f053386d22b9cbe46ce..8a1cf9c523632f392d95aa2d6ec8332fa50fec5b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88903-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
@@ -21,6 +21,7 @@ int main()
   for (int i = 0; i < 1024; ++i)
     x[i] = i, y[i] = i % 8;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != i << ((i & ~1) % 8))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr90018.c b/gcc/testsuite/gcc.dg/vect/pr90018.c
index 52640f5aa6f02d6deed3b2790482a2d2d01ddd5b..08ca326f7ebfab1a42813bc121f1e5a46394e983 100644
--- a/gcc/testsuite/gcc.dg/vect/pr90018.c
+++ b/gcc/testsuite/gcc.dg/vect/pr90018.c
@@ -41,6 +41,7 @@ int main(int argc, char **argv)
       a42[i*4+n*4+1] = tem4 + a42[i*4+n*4+1];
       __asm__ volatile ("": : : "memory");
     }
+#pragma GCC novector
   for (int i = 0; i < 4 * n * 3; ++i)
     if (a4[i] != a42[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr92420.c b/gcc/testsuite/gcc.dg/vect/pr92420.c
index e43539fbbd7202b3ae2e9f71bfd82a3fcdf8bde3..e56eb0e12fbec55b16785e244f3a24b889af784d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr92420.c
+++ b/gcc/testsuite/gcc.dg/vect/pr92420.c
@@ -41,6 +41,7 @@ main ()
     }
   foo (a, b + N, d, N);
   bar (a, c, e, N);
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     if (d[i].r != e[i].r || d[i].i != e[i].i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr94994.c b/gcc/testsuite/gcc.dg/vect/pr94994.c
index e98aeb090d8cbcfc9628052b553b7a7d226069d1..2f598eacd541eafaef02f9aee34fc769dac2a4c6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr94994.c
+++ b/gcc/testsuite/gcc.dg/vect/pr94994.c
@@ -41,6 +41,7 @@ main (void)
       for (unsigned int j = 0; j < INPUT_SIZE + MAX_STEP; ++j)
 	x[j] = j + 10;
       copy (x + i, x, INPUT_SIZE);
+#pragma GCC novector
       for (int j = 0; j < INPUT_SIZE + i; ++j)
 	{
 	  int expected;
diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-1.c b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
index 55d1364f056febd86c49272ede488bd37867dbe8..2de222d2ae6491054b6c7a6cf5891580abf5c6f7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr96783-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
@@ -31,6 +31,7 @@ int main ()
     a[i] = i;
   foo (a + 3 * 5, 6-1, 5);
   const long b[3 * 8] = { 0, 1, 2, 21, 22, 23, 18, 19, 20, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 };
+#pragma GCC novector
   for (int i = 0; i < 3 * 8; ++i)
     if (a[i] != b[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-2.c b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
index 33c37109e3a8de646edd8339b0c98300bed25b51..bcdcfac072cf564d965edd4be7fbd9b23302e759 100644
--- a/gcc/testsuite/gcc.dg/vect/pr96783-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
@@ -20,6 +20,7 @@ int main()
   for (int i = 0; i < 1024; ++i)
     b[i] = i;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 256; ++i)
     if (a[3*i] != 1023 - 3*i - 2
 	|| a[3*i+1] != 1023 - 3*i - 1
diff --git a/gcc/testsuite/gcc.dg/vect/pr97081-2.c b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
index 98ad3c3fe17e4556985cb6a0392de72a19911a97..436e897cd2e6a8bb41228cec14480bac88e98952 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97081-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
@@ -24,6 +24,7 @@ main ()
       c[i] = i;
     }
   foo (3);
+#pragma GCC novector
   for (int i = 0; i < 1024; i++)
     if (s[i] != (unsigned short) ((i << 3) | (i >> (__SIZEOF_SHORT__ * __CHAR_BIT__ - 3)))
         || c[i] != (unsigned char) ((((unsigned char) i) << 3) | (((unsigned char) i) >> (__CHAR_BIT__ - 3))))
diff --git a/gcc/testsuite/gcc.dg/vect/pr97558-2.c b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
index 8f0808686fbad0b5b5ec11471fd38f53ebd81bde..5dff065f2e220b1ff31027c271c07c9670b98f9c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97558-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
@@ -41,6 +41,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/pr97678.c b/gcc/testsuite/gcc.dg/vect/pr97678.c
index 7fb6c93515e41257f173f664d9304755a8dc0de2..1fa56326422e832e82bb6f1739f14ea1a1cb4955 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97678.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97678.c
@@ -19,6 +19,7 @@ main ()
       b[i * 2 + 1] = i * 8;
     }
 
+#pragma GCC novector
   for (i = 0; i < 158; ++i)
     if (b[i*2] != (unsigned short)(i*7)
         || b[i*2+1] != (unsigned short)(i*8))
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
index 4373dce917f9d7916e128a639e81179fe1250ada..1154b40d4855b5a42187134e9d5f08a98a160744 100644
--- a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
@@ -22,6 +22,7 @@ int main (void)
   int i;
   check_vect ();
   foo ();
+#pragma GCC novector
   for (i = 0; i < 100; i++)
     if (f[i]!=1) 
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
index e3466d0da1de6207b8583f42aad412b2c2000dcc..dbf65605e91c4219b6f5c6de220384ed09e999a7 100644
--- a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
@@ -50,6 +50,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 5)
@@ -63,6 +64,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = NINTS - 1; i < N - 1; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 6)
@@ -81,6 +83,7 @@ int main1 ()
   /* check results:  */
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
 	{
           if (tmp1[2].e.n[1][i][j] != 8)
@@ -100,6 +103,7 @@ int main1 ()
   /* check results:  */
   for (i = 0; i < N - NINTS; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N - NINTS; j++)
 	{
           if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
index 26b71d654252bcd2e4591f11a78a4c0a3dad5d85..82e4f6469fb9484f84c5c832d0461576b63ba8fe 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != 8 
@@ -42,6 +43,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] != 8
@@ -66,6 +68,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*5] != 8
@@ -91,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*9] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-10.c b/gcc/testsuite/gcc.dg/vect/slp-10.c
index da44f26601a9ba8ea52417ec5a160dc4bedfc315..2759b66f7772cb1af508622a3099bdfb524cba56 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-10.c
@@ -46,6 +46,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
@@ -68,6 +69,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
@@ -84,6 +86,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index e6632fa77be8092524a202d6a322354b45e1794d..fcb7cf6c7a2c5d42ec7ce8bc081db7394ba2bd96 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -44,6 +44,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11b.c b/gcc/testsuite/gcc.dg/vect/slp-11b.c
index d0b972f720be1c965207ded917f979957c76ee67..df64c8db350dbb12295c61e84d32d5a5c20a1ebe 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11b.c
@@ -22,6 +22,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index bdcf434ce31ebc1df5f7cfecb5051ebc71af3aed..0f680cd4e60c41624992e4fb68d2c3664ff1722e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  ((float) in[i*2] * 2 + 6)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 08a8f55bab0b3d09e7eae14354c515203146b3d8..f0dda55acaea38e463044c7495af1f57ac121ce0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -47,6 +47,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12b.c b/gcc/testsuite/gcc.dg/vect/slp-12b.c
index 48e78651a6dca24de91a1f36d0cd757e18f7c1b8..e2ea24d6c535c60ba903ce2411290e603414009a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12b.c
@@ -23,6 +23,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12c.c b/gcc/testsuite/gcc.dg/vect/slp-12c.c
index 6650b8bd94ece71dd9ccb9adcc3d17be2f2bc07a..9c48dff3bf486a8cd1843876975dfba40a055a23 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12c.c
@@ -24,6 +24,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index a16656ace00a6a31d0c056056ec2e3e1f050c09f..ca70856c1dd54f106c9f1c3cde6b0ff5f7994e74 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + i
@@ -65,6 +66,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + i
@@ -100,6 +102,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c
index 8769d62cfd4d975a063ad953344855091a1cd129..b7f947e6dbe1fb7d9a8aa8b5f6ac1edfc89d33a2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + i
@@ -59,6 +60,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + i
@@ -94,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-14.c b/gcc/testsuite/gcc.dg/vect/slp-14.c
index 6af70815dd43c13fc9abfcebd70c562268dea86f..ccf23c1e44b78ac62dc78eef0ff6c6bc26e99fc1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-14.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-14.c
@@ -64,6 +64,7 @@ main1 (int n)
 }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-15.c b/gcc/testsuite/gcc.dg/vect/slp-15.c
index dbced88c98d1fc8d289e6ac32a84dc9f4072e49f..13a0f3e3014d84a16a68a807e6a2730cbe8e6840 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-15.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-15.c
@@ -64,6 +64,7 @@ main1 (int n)
 }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-16.c b/gcc/testsuite/gcc.dg/vect/slp-16.c
index a7da9932c54c28669875d46e3e3945962d5e2dee..d053a64276db5c306749969cca7f336ba6a19b0b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-16.c
@@ -38,6 +38,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*2] !=  (in[i*2] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-17.c b/gcc/testsuite/gcc.dg/vect/slp-17.c
index 6fa11e4c53ad73735af9ee74f56ddff0b777b99b..c759a5f0145ac239eb2a12efa89c4865fdbf703e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-17.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-17.c
@@ -27,6 +27,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*2] != in[i*2] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-18.c b/gcc/testsuite/gcc.dg/vect/slp-18.c
index ed426a344985d1e205f7a94f72f86954a77b3d92..f31088cb76b4cdd80460c0d6a24568430e595ea0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-18.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-18.c
@@ -57,6 +57,7 @@ main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19a.c b/gcc/testsuite/gcc.dg/vect/slp-19a.c
index 0f92de92cd396227cc668396cd567ca965e9784b..ca7a0a8e456b1b787ad82e910ea5e3c5e5048c80 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19a.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19b.c b/gcc/testsuite/gcc.dg/vect/slp-19b.c
index 237b36dd227186c8f0cb78b703351fdae6fef27c..4d53ac698dbd164d20271c4fe9ccc2c20f3c4eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19b.c
@@ -29,6 +29,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19c.c b/gcc/testsuite/gcc.dg/vect/slp-19c.c
index 32566cb5e1320de2ce9c83867c05902a24036de4..188ab37a0b61ba33ff4c19115e5c54e0f7bac500 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19c.c
@@ -47,6 +47,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*12] !=  in[i*12]
@@ -79,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*6] !=  in[i*6]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-2.c b/gcc/testsuite/gcc.dg/vect/slp-2.c
index 8d374d724539a47930fc951888471a7b367cd845..d0de3577eb6a1b8219e8a79a1a684f6b1b7baf52 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-2.c
@@ -25,6 +25,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != a8 
@@ -55,6 +56,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*16] != a8
@@ -85,6 +87,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != a8
@@ -110,6 +113,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*11] != a8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-20.c b/gcc/testsuite/gcc.dg/vect/slp-20.c
index dc5eab669ea9eaf7db83606b4c426921a6a5da15..ea19095f9fa06db508cfedda68ca2c65769b35b0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-20.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-20.c
@@ -34,6 +34,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
@@ -77,6 +78,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 4b83adb9807fc29fb9f2d618d15e8eb15290dd67..712a73b69d730fd27cb75d3ebb3624809317f841 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -45,6 +45,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
@@ -101,6 +102,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
@@ -158,6 +160,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
diff --git a/gcc/testsuite/gcc.dg/vect/slp-22.c b/gcc/testsuite/gcc.dg/vect/slp-22.c
index e2a0002ffaf363fc12b76deaaee3067c9a0a186b..2c083dc4ea3b1d7d3c6b56508cc7465b76060aa1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-22.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-22.c
@@ -39,6 +39,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
@@ -92,6 +93,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-23.c b/gcc/testsuite/gcc.dg/vect/slp-23.c
index d7c67fe2c6e9c6ecf94a2ddc8c1d7a4c234933c8..d32ee5ba73becb9e0b53bfc2af27a64571c56899 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-23.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-23.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].c + arr[i].c
@@ -67,6 +68,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].c + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
index abd3a878f1ac36a7c8cde58743496f79b71f4476..5eaea9600acb2b8ffe674730bcf9514b51ae105f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
@@ -42,6 +42,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
     pIn++;
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (ua1[2*i] != ub[2*i]
         || ua1[2*i+1] != ub[2*i+1]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c b/gcc/testsuite/gcc.dg/vect/slp-24.c
index a45ce7de71fa6a8595b611dd47507df4e91e3b36..59178f2c0f28bdbf657ad68658d373e75d076f79 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
@@ -41,6 +41,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
     pIn++;
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (ua1[2*i] != ub[2*i]
         || ua1[2*i+1] != ub[2*i+1]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-25.c b/gcc/testsuite/gcc.dg/vect/slp-25.c
index 1c33927c4342e01f80765d0ea723e01cec5fe2e6..9e3b5bbc9469fd0dc8631332643c1eb496652218 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-25.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-25.c
@@ -24,6 +24,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N/2; i++)
     {
       if (ia[2*i] != 25
@@ -38,6 +39,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= n/2; i++)
     {
       if (sa[2*i] != 25
diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index f8b49ff603c16127694e599137b1f48ea665c4db..d398a5acb0cdb337b442f071c96f3ce62fe84cff 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -24,6 +24,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-28.c b/gcc/testsuite/gcc.dg/vect/slp-28.c
index 0bb5f0eb0e40307558dc3ab826d583ea004891cd..67b7be29b22bb646b4bea2e0448e919319b11c98 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-28.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-28.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in[i] != i+5)
@@ -51,6 +52,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in2[i] != (i % 4) + (i / 4) * 5)
@@ -69,6 +71,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in3[i] != (i % 12) + (i / 12) * 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
index 4cf0e7a0ece17204221c483bcac8fe9bdab3c85c..615a79f4a30f8002a989047c99eea13dd9f9e1a6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
@@ -32,6 +32,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -54,6 +55,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -84,6 +86,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
@@ -120,6 +123,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (out[i*9] !=  in[i*9]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c b/gcc/testsuite/gcc.dg/vect/slp-3.c
index 760b3fa35a2a2018a103b344c329464ca8cb52fe..183c7e65c57ae7dfe3994757385d9968b1de45e5 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -48,6 +49,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -78,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
@@ -114,6 +117,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (out[i*9] !=  in[i*9]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-33.c b/gcc/testsuite/gcc.dg/vect/slp-33.c
index 2404a5f19b407ef47d4ed6e597da9381629530ff..c382093c2329b09d3ef9e78abadd1f7ffe22dfda 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-33.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-33.c
@@ -43,6 +43,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*7] !=  (in[i*7] + 5) * 3 - 2
@@ -64,6 +65,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*3] !=  (in[i*3] + 2) * 3
@@ -81,6 +83,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out2[i*3] !=  (float) (in[i*3] * 2 + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
index 9e9c8207f7bbb0235e5864b529869b6db3768087..0baaff7dc6e6b8eeb958655f964f234512cc4500 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
@@ -36,6 +36,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != in[i*3] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-34.c b/gcc/testsuite/gcc.dg/vect/slp-34.c
index 1fd09069247f546a9614c47fca529da4bc465497..41832d7f5191bfe7f82159cde69c1787cfdc6d8c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-34.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-34.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != in[i*3] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-35.c b/gcc/testsuite/gcc.dg/vect/slp-35.c
index 76dd7456d89859108440eb0be2374215a16cfa57..5e9f6739e1f25d109319da1db349a4063f5aaa1b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-35.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-35.c
@@ -32,6 +32,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].c + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/slp-37.c b/gcc/testsuite/gcc.dg/vect/slp-37.c
index a765cd70a09c2eb69df6d85b2056f0d90fc4120f..caee2bb508f1824fa549568dd09911c8624222f4 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-37.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-37.c
@@ -28,6 +28,7 @@ foo1 (s1 *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
        if (arr[i].a != 6 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
index 98ac3f1f2839c717d66c04ba4e0179d4497be33e..fcda45ff368511b350b25857f21b2eaeb721561a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -59,6 +60,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -92,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4.c b/gcc/testsuite/gcc.dg/vect/slp-4.c
index e4f65bc37f8c5e45c1673d2218bf75a2a98b3daf..29e741df02ba0ef6874cde2a4410b79d1d7608ee 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -53,6 +54,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-41.c b/gcc/testsuite/gcc.dg/vect/slp-41.c
index 2ad9fd2077231a0124c7fe2aaf37570a3a10f849..b96de4fbcb7f9a3c60b884a47bbfc52ebbe1dd44 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-41.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-41.c
@@ -48,6 +48,7 @@ int main()
        __asm__ volatile ("");
     }
   testi (ia, sa, 8, 32);
+#pragma GCC novector
   for (i = 0; i < 128; ++i)
     if (sa[i] != ia[(i / 4) * 8 + i % 4])
       abort ();
@@ -58,6 +59,7 @@ int main()
        __asm__ volatile ("");
     }
   testi2 (ia, sa, 8, 32);
+#pragma GCC novector
   for (i = 0; i < 128; ++i)
     if (ia[i] != sa[(i / 4) * 8 + i % 4])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-43.c b/gcc/testsuite/gcc.dg/vect/slp-43.c
index 3cee613bdbed4b7ca7a796d45776b833cff2d1a2..3d8ffb113276c3b244436b98048fe78112340e0c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-43.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-43.c
@@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
 }
 
 #define TEST(T,N) \
+ _Pragma("GCC novector") \
  do { \
   memset (out, 0, 4096); \
   foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
   if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
     __builtin_abort (); \
+  _Pragma("GCC novector") \
   for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
     if (out[i] != 0) \
       __builtin_abort (); \
diff --git a/gcc/testsuite/gcc.dg/vect/slp-45.c b/gcc/testsuite/gcc.dg/vect/slp-45.c
index fadc4e5924308d46aaac81a0d5b42564285d58ff..f34033004520f106240fd4a7f6a6538cb22622ff 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-45.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-45.c
@@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
 }
 
 #define TEST(T,N) \
+ _Pragma("GCC novector") \
  do { \
   memset (out, 0, 4096); \
   foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
   if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
     __builtin_abort (); \
+  _Pragma("GCC novector") \
   for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
     if (out[i] != 0) \
       __builtin_abort (); \
diff --git a/gcc/testsuite/gcc.dg/vect/slp-46.c b/gcc/testsuite/gcc.dg/vect/slp-46.c
index 18476a43d3f61c07aede8d90ca69817b0e0b5342..2d5534430b39f10c15ab4d0bdab47bf68af86376 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-46.c
@@ -54,6 +54,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[i/2])
       abort ();
@@ -65,6 +66,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[2*(i/2)])
       abort ();
@@ -76,6 +78,7 @@ main ()
     }
 
   baz ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[511 - i/2])
       abort ();
@@ -87,6 +90,7 @@ main ()
     }
 
   boo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[2*(511 - i/2)])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-47.c b/gcc/testsuite/gcc.dg/vect/slp-47.c
index 7b2ddf664dfefa97ac80f9f9eb7993e18980c411..7772bb71c8d013b8699bee644a3bb471ff41678f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-47.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-47.c
@@ -35,6 +35,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i])
       abort ();
@@ -46,6 +47,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i^1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-48.c b/gcc/testsuite/gcc.dg/vect/slp-48.c
index 0b327aede8e6bb53d01315553ed9f2c3c3dc3290..38f533233d657189851a8942e8fa8133a9d2eb91 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-48.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-48.c
@@ -35,6 +35,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i^1])
       abort ();
@@ -46,6 +47,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-49.c b/gcc/testsuite/gcc.dg/vect/slp-49.c
index 4141a09ed97a9ceadf89d394d18c0b0226eb55d7..b2433c920793c34fb316cba925d7659db356af28 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-49.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-49.c
@@ -24,6 +24,7 @@ main()
 
   foo (17);
 
+#pragma GCC novector
   for (int i = 0; i < 512; ++i)
     {
       if (a[2*i] != 5 + i
diff --git a/gcc/testsuite/gcc.dg/vect/slp-5.c b/gcc/testsuite/gcc.dg/vect/slp-5.c
index 989e05ac8be6bdd1fb36c4bdc079866ce101e017..6d51f6a73234ac41eb2cc4d2fcedc8928d9932b2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-5.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -55,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-6.c b/gcc/testsuite/gcc.dg/vect/slp-6.c
index ec85eb77236e4b8bf5e0c6a8d07abf44a28e2a5c..ea9f7889734dca9bfa3b28747c382e94bb2c1c84 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-6.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
@@ -50,6 +51,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 2
@@ -80,6 +82,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out2[i*16] !=  in2[i*16] * 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-7.c b/gcc/testsuite/gcc.dg/vect/slp-7.c
index e836a1ae9b5b60685e8ec2d15ca5005ff35a895e..2845a99dedf5c99032b099a136acd96f37fc5295 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-7.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
@@ -55,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 1
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out2[i*16] !=  in2[i*16] * 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-8.c b/gcc/testsuite/gcc.dg/vect/slp-8.c
index e9ea0ef0d6b32d23977d728c943bac05dc982b2d..8647249f546267185bb5c232f088a4c0984f2039 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-8.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (fa[4*i] != (float) ib[4*i]      
diff --git a/gcc/testsuite/gcc.dg/vect/slp-9.c b/gcc/testsuite/gcc.dg/vect/slp-9.c
index d5212dca3ddcbffabdc9fbed8f2380ffceee626d..4fb6953cced876c2a1e5761b0f94968c5774da9e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-9.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
index 482fc080a0fc132409509b084fcd67ef95f2aa17..450c7141c96b07b9f798c62950d3de30eeab9a28 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
@@ -79,11 +79,13 @@ main ()
       e[i] = 2 * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -115,6 +117,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
index 57cc67ee121108bcc5ccaaee0dca5085264c8818..cb7eb94b3a3ba207d513e3e701cd1c9908000a01 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
@@ -82,11 +82,13 @@ main ()
     }
 
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -118,6 +120,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
index 7350695ece0f53e36de861c4e7724ebf36ff6b76..1dcee46cd9540690521df07c9cacb608e37b62b7 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
@@ -82,11 +82,13 @@ main ()
     }
 
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -118,6 +120,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
index d19ec13a21ac8660cc326dfaa4a36becab219d82..64904b001e6a39623eff9a1ddc530afbc5e64687 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
@@ -72,6 +72,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 ? 10 : 2 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
index f82b8416d8467a8127fbb498040c5559e33d6608..0e1bd3b40994016bb6232bd6a1e129602c03167b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
@@ -75,6 +75,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 ? 5 : i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
index 5ade7d1fbad9eee7861d1b0d12ac98e42d453422..f0a703f0030b4c01d4119c812086de2a8e78ff4f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
@@ -70,6 +70,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 || i <= 5 ? 10 : 2 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
index 1850f063eb4fc74c26a9b1a1016f9d70a0c28441..605f6ab8ba638175d557145c82f2b78c30eb5835 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sout[i*4] != 8 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
index 62580c070c8e19468812a9c81edc1c5847327ebb..06d9029e9202b15dc8de6d054779f9d53fbea60d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i].a !=  (unsigned char) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
index a3d0670cea98379af381fd7282f28e9724096a93..2792b932734a7a8ad4958454de56956081753d7c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a !=  (int) in[i*3] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
index 86a3fafa16f41dc2c63f4704b85268330ad5568d..5c75dc12b695785405b7d56891e7e71ac24e2539 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a !=  (int) in[i*3] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
index d4c929de2ecbc73c75c08ae498b8b400f67bf636..13119822200fef23a96e920bde8ca968f0a09f84 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
@@ -32,6 +32,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sout[i*4] != 8 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
index 28a645c79472578d3775e9e2eb28cb7ee69efad0..c15baa00dd00fb8fa0ae79470d846b31ee4dd578 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
@@ -41,6 +41,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*16] != a8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
index 39bd7c41f8cca2a517486bc9a9898031911115c6..c79906a8d7b30834dfcda5c70d6bf472849a39cb 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
@@ -45,6 +45,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
index faf17d6f0cde5eacb7756996a224e4004b305f7f..b221f705070af661716d1d6fbf70f16ef3652ca9 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (int) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
index fb4f720aa4935da6862951a3c618799bb37f535f..3237773e1b13223164473ad88b3c806c8df243b2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (short) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
index f006d081346aa4f067d1e02018f2c46d4fcf1680..e62d16b6de34ce1919545a5815600263931e11ac 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (unsigned char) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
index 286e2fc42af815dcc724f1a66d8d01a96c915beb..08ab2dc3d10f6ab208841e53609dc7c672a69c5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (int) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
index d88ebe4d778c4487c00ef055059d2b825542679a..0b67ecc8e0730813966cfd6922e8d3f9db740408 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  (int) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
index 872b20cac93c119854b8250eb85dc43767743da4..49261483166cbd6dcf99800a5c7062f7f091c103 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  (unsigned char) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
index ca7803ec1a9a49b4800cf396bcdc05f263f344ee..dbb107f95fec3338b135ff965e8be2b514cc1fe6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
@@ -69,6 +69,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
index 678152ba4168d32f84a1d1b01ba6c43b210ec8b9..2cce30c2444323ba6166ceee6a768fbd9d881a47 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
@@ -35,6 +35,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
     if (b[i*8+0] != i*8+0
 	|| b[i*8+1] != i*8+0
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
index 0318d468ef102cb263d090a33429849221dc3c0d..0d25d9d93bbf14b64fb6f2c116fe70bf17b5f432 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
@@ -26,6 +26,7 @@ int main ()
       __asm__ volatile ("");
     }
   foo (4);
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != (4*(i/2) + (i & 1) ^ 1))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
index 113223ab0f96507b74cfff8fc6b112070cabb5ee..642b1e8b399e7ffc77e54e02067eec053ea54c7e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
@@ -42,6 +42,7 @@ int main()
 
   test (a, b);
 
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != 253)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
index 82776f3f06af8a7b82e0d190a922b213d17aee88..41fd159adce8395dd805f089e94aacfe7eeba09f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
@@ -43,6 +43,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
index 1807275d1bfcc895ed68bd5e536b5837adf336e6..9ea35ba5afca2db0033150e35fca6b961b389c03 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
@@ -56,6 +56,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
index 8457e4f45d62d6d704145b1c4f62af14c1877762..107968f1f7ce65c53bf0280e700f659f625d8c1e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
@@ -103,6 +103,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (output[i] != check_results[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
index b86a3dc8756e0d30551a40ed1febb142813190a4..7128cf471555d5f589b11e1e58a65b0211e7d6fd 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
@@ -96,6 +96,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
index bec1544650ac9e897ab1c06f120fb6416091dec6..5cc6261d69a15d2a3f6b691c13544c27dc8f9941 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
@@ -95,6 +95,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
index 346411fd5042add21fdc6413922506bcb92f4594..df13c37bc75d43173d4e1b9d0daf533ba5829c7f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
@@ -88,6 +88,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index 44df21aae2a2f860d49c36568122733e693d4310..029be5485b62ffef915f3b6b28306501852733d7 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -52,6 +52,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N - (N % 3); i++)
      if (output[i] != check_results[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
index 154c00af598d05bac9ebdad3bfb4eeb28594a1fc..c92fc2f38619a5c086f7029db444a6cb208749f0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
@@ -50,6 +50,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N - (N % 3); i++)
      if (output[i] != check_results[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
index e3bfee33348c5164f657a1494f480db26a7aeffa..72811eb852e5ed51ed5f5d042fac4e9b487911c2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
@@ -40,6 +40,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
index abb10fde45bc807269cd5bb58f463a77f75118d8..f8ec1fa730d21cde5f2bbb0791b04ddf0e0b358c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
@@ -29,6 +29,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
index 0756119afb455a0b834fd835553318eb29887f4d..76507c4f46157a8ded48e7c600ee53424e01382f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
@@ -29,6 +29,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-100.c b/gcc/testsuite/gcc.dg/vect/vect-100.c
index 9a4d4de06718228fcc0bd011d2e23d4c564c29ff..0d8703281f28c995a7c08c4366a4fccf22cf16e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-100.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-100.c
@@ -30,6 +30,7 @@ int main1 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != a[i] || p->b[i] != b[i])
@@ -55,6 +56,7 @@ int main2 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != c[i] || p->b[i] != d[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-103.c b/gcc/testsuite/gcc.dg/vect/vect-103.c
index d03562f7cddd0890e3e159fbdc7c5d629b54d58c..59d8edc38cacda52e53a5d059171b6eefee9f920 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-103.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-103.c
@@ -43,6 +43,7 @@ int main1 (int x, int y) {
   /* check results: */
   if (p->a[0] != a[N - 1])
     abort ();
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (p->a[i] != b[i - 1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-104.c b/gcc/testsuite/gcc.dg/vect/vect-104.c
index a77c98735ebad6876c97ee22467f5287b4575a01..e0e5b5a53bdae1e148c61db716f0290bf3e829f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-104.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-104.c
@@ -43,6 +43,7 @@ int main1 (int x) {
   }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
    {
     for (j = 0; j < N; j++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
index 433565bfd4d3cea87abe23de29edbe8823054515..ec7e676439677ae587a67eae15aab34fd5ac5b03 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
@@ -75,6 +75,7 @@ int main1 (int x) {
   /* check results: */
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (p->a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-105.c b/gcc/testsuite/gcc.dg/vect/vect-105.c
index 17b6e89d8f69053b5825c859f3ab5c68c49b3a5d..f0823fbe397358cb34bf4654fccce21a053ba2a7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-105.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-105.c
@@ -45,6 +45,7 @@ int main1 (int x) {
   /* check results: */
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (p->a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-106.c b/gcc/testsuite/gcc.dg/vect/vect-106.c
index 0171cfcdfa6e60e6cb8158d098d435c0e472abf8..4b3451cc783e9f83f7a6cb8c54cf50f4c43dddc0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-106.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-106.c
@@ -28,6 +28,7 @@ int main1 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (*q != a[i] || *p != b[i])
@@ -50,6 +51,7 @@ int main1 () {
   q = q1;
   p = p1;
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (*q != b[i] || *p != a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-107.c b/gcc/testsuite/gcc.dg/vect/vect-107.c
index aaab9c00345bf7f0b25fbcda25a141988bda9eac..60c83a99a19f4797bc7a5a175f33aecbc598f8e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-107.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-107.c
@@ -24,6 +24,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-108.c b/gcc/testsuite/gcc.dg/vect/vect-108.c
index 4af6326e9c35963ec7109d66dd0d321cf1055597..2cbb6701d5c6df749482d5e4351b9cb4a808b94f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-108.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-108.c
@@ -21,6 +21,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] * ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index fe7ea6c420fb1512286b0b468cbe9ffed5daae71..31b9aa2be690fb4f2d9cf8062acbf1b42971098d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -34,6 +34,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+2] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
@@ -56,6 +57,7 @@ int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-11.c b/gcc/testsuite/gcc.dg/vect/vect-11.c
index 044fc5edc2dddb0bddaca545b4e97de1499be8bd..1171757e323bc9a64c5e6762e98c101120fc1449 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-11.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] * ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-110.c b/gcc/testsuite/gcc.dg/vect/vect-110.c
index 47c6456107ddd4f326e8c9e783b01c59e23087e6..69ee547cfd17965f334d0d1af6bc28f99ae3a671 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-110.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-110.c
@@ -20,6 +20,7 @@ main1 (void)
   }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++){
     if (a[i] != b[i] + c[i])
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-113.c b/gcc/testsuite/gcc.dg/vect/vect-113.c
index a9d45ce9fcc21195030dfcdf773ffc3a41e48a37..8e9cc545ce6b3204b5c9f4a220e12d0068aa4f3e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-113.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-113.c
@@ -17,6 +17,7 @@ main1 (void)
     a[i] = i;
   }
 
+#pragma GCC novector
   for ( i = 0; i < N; i++) 
   {
     if (a[i] != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-114.c b/gcc/testsuite/gcc.dg/vect/vect-114.c
index 557b44110a095ae725b58cf1ca2494a103b96dd7..1617d3009eb3fdf0bb16980feb0f54d2862b8f3c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-114.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-114.c
@@ -19,6 +19,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-115.c b/gcc/testsuite/gcc.dg/vect/vect-115.c
index 0502d15ed3ebd37d8dda044dbe13d68525f3e30a..82b8e2eea1f3374bdbe5460ca58641f217d1ab33 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-115.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-115.c
@@ -41,6 +41,7 @@ int main1 ()
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.strc_t.strc_s.b[i] != a[i])
@@ -54,6 +55,7 @@ int main1 ()
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.ptr_t->strc_s.c[i] != a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-116.c b/gcc/testsuite/gcc.dg/vect/vect-116.c
index d4aa069772ed76f895f99c91609852bdcc43d324..ac603db44ee2601665c1de4bb60aee95f545c8ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-116.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-116.c
@@ -18,6 +18,7 @@ void foo()
   for (i = 0; i < 256; ++i)
     C[i] = A[i] * B[i];
 
+#pragma GCC novector
   for (i = 0; i < 256; ++i)
     if (C[i] != (unsigned char)(i * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-117.c b/gcc/testsuite/gcc.dg/vect/vect-117.c
index 22f8e01187272e2cfe445c66ca590f77923d4e95..f2c1c5857059a9bcaafad4ceadff02e192209840 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-117.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-117.c
@@ -47,6 +47,7 @@ int main (void)
 
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-11a.c b/gcc/testsuite/gcc.dg/vect/vect-11a.c
index 4f1e15e74293187d495c8c11cda333a1af1139a6..9d93a2e8951f61b34079f6d867abfaf0fccbb8fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-11a.c
@@ -21,6 +21,7 @@ void u ()
   
   for (i=0; i<8; i++)
     C[i] = A[i] * B[i];
+#pragma GCC novector
   for (i=0; i<8; i++)
     if (C[i] != Answer[i])
       abort ();
@@ -41,6 +42,7 @@ void s()
   
   for (i=0; i<8; i++)
     F[i] = D[i] * E[i];
+#pragma GCC novector
   for (i=0; i<8; i++)
     if (F[i] != Dnswer[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-12.c b/gcc/testsuite/gcc.dg/vect/vect-12.c
index b095170f008c719326a6cfd5820a7926ae8c722e..096ff10f53c9a4d7e0d3a8bbe4d8ef513a82c46c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-12.c
@@ -24,6 +24,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + ic[i] || sa[i] != sb[i] + sc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-122.c b/gcc/testsuite/gcc.dg/vect/vect-122.c
index 04dae679647ff9831224b6dc200a25b2b1bb28d7..6e7a4c1578f4c4cddf43a81e3e4bc6ab87efa3ca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-122.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-122.c
@@ -50,6 +50,7 @@ main ()
   f2 ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i || b[i] != i || l[i] != i * (i + 7LL) || m[i] != i * 7LL)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-124.c b/gcc/testsuite/gcc.dg/vect/vect-124.c
index c720648aaddbe72d0073fcf7548408ce6bda3cdd..6b6730a22bdb62e0f8770b4a288aa1adeff756c2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-124.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-124.c
@@ -21,6 +21,7 @@ main ()
   
   check_vect ();
   foo (6);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 3 + 6)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-13.c b/gcc/testsuite/gcc.dg/vect/vect-13.c
index 5d902924ec20e2ea0ee29418a1b52d4e2ede728e..f1e99a3ec02487cd331e171c6e42496924e931a2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-13.c
@@ -22,6 +22,7 @@ int main1()
     }
 
   /* Check results  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-14.c b/gcc/testsuite/gcc.dg/vect/vect-14.c
index 1640220a134ed8962e31b9d201c0e4a8630d631f..5898d4cd8924a5a6036f38efa79bc4146a78320d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-14.c
@@ -17,6 +17,7 @@ int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
index 5313eae598b4787e5294eefe87bf59f5a3581657..bc2689fce50cebf55720bfc9f60bd7c0dd9659dc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
@@ -25,6 +25,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-15.c b/gcc/testsuite/gcc.dg/vect/vect-15.c
index 178bc4404c420c3a7d74ca381f3503aaefc195db..4a73d0681f0db2b12e68ce805f987aabf8f1cf6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-15.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-17.c b/gcc/testsuite/gcc.dg/vect/vect-17.c
index 471a82336cf466856186eb9ad3f7a95e4087cedc..797444a4c4a312d41d9b507c5d2d024e5b5b87bb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-17.c
@@ -81,6 +81,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] & ic[i]))
@@ -95,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] & cc[i]))
@@ -109,6 +111,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] & sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-18.c b/gcc/testsuite/gcc.dg/vect/vect-18.c
index 28b2941e581fa6abecbdafaa812cf4ff07ea9e5f..8c0fab43e28da6193f1e948e0c59985b2bff1119 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-18.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] | ic[i]))
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != (cb[i] | cc[i]))
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] | sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-19.c b/gcc/testsuite/gcc.dg/vect/vect-19.c
index 27c6dc835a60c42e8360521d343b13f461a0b009..fe2a88c7fd855a516c34ff3fa3b5da5364fb0a81 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-19.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] ^ ic[i]))
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] ^ cc[i]))
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] ^ sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
index 162cb54b58d17efc205778adc14e846be39afab1..70595db744e349bdc6d786c7e64b762406689c64 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-2.c b/gcc/testsuite/gcc.dg/vect/vect-2.c
index d975668cbd023b0324c7526e162bc1aeb21dfcd7..80415a5b54b75f9e9b03f0123a53fd70ee07e7cd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-2.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-20.c b/gcc/testsuite/gcc.dg/vect/vect-20.c
index 8d759f3c6a66e6a6e318510ba59196ab91b757ac..0491bb2fc73bcef98cb26e82fb74778c8fea2dc0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-20.c
@@ -52,6 +52,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != ~ib[i])
@@ -66,6 +67,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != ~cb[i])
@@ -80,6 +82,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != ~sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-21.c b/gcc/testsuite/gcc.dg/vect/vect-21.c
index ab77df6ef88890907f57a89870e645bb51d51c5a..f98ae8b22ee3e8bbb2c8e4abbc6022c11150fdb1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-21.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != !ib[i])
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != !cb[i])
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != !sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-22.c b/gcc/testsuite/gcc.dg/vect/vect-22.c
index 78dc1ce91def46c31e913806aada5907d02fd4e0..3ab5070d94e85e8d332f55fe8511bbb82df781a6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-22.c
@@ -63,6 +63,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != -ib[i])
@@ -77,6 +78,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != -cb[i])
@@ -91,6 +93,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != -sb[i])
@@ -105,6 +108,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (fa[i] != -fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c b/gcc/testsuite/gcc.dg/vect/vect-23.c
index 69e0848c8eca10661d85a2f0b17b9a3d99319135..1a1c0b415a9247a3ed2555ca094d0a59e698384b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-23.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != ib[i] && ic[i])
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != cb[i] && cc[i])
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != sb[i] && sc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-24.c b/gcc/testsuite/gcc.dg/vect/vect-24.c
index fa4c0620d29cd44b82fc75f0dc3bab8a862058d9..2da477077111e04d86801c85282822319cd8cfb8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-24.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-24.c
@@ -81,6 +81,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] || ic[i]))
@@ -95,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] || cc[i]))
@@ -109,6 +111,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] || sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-25.c b/gcc/testsuite/gcc.dg/vect/vect-25.c
index 904eea8a17b7572ffa335dcf60d27df648f01f18..d665c3e53cde7e5be416a88ace81f68343c1f115 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-25.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-25.c
@@ -19,6 +19,7 @@ int main1 (int n, int *p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != n)
@@ -32,6 +33,7 @@ int main1 (int n, int *p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ib[i] != k)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-26.c b/gcc/testsuite/gcc.dg/vect/vect-26.c
index 8a141f38400308c35a99aa77b0d181a4dce0643c..2ea9aa93dc46dbf11c91d468cdb91a1c0936b323 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-26.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-26.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-27.c b/gcc/testsuite/gcc.dg/vect/vect-27.c
index ac86b21aceb7b238665e86bbbd8a46e2aaa4d162..d459a84cf85d285e56e4abb5b56b2c6157db4b6a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-27.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-27.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i-1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-28.c b/gcc/testsuite/gcc.dg/vect/vect-28.c
index e213df1a46548d7d2962335c5600c252d9d5d5f3..531a7babb214ed2e6694f845c4b1d6f66f1c5d31 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-28.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-28.c
@@ -21,6 +21,7 @@ int main1 (int off)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i+off] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-29.c b/gcc/testsuite/gcc.dg/vect/vect-29.c
index bbd446dfe63f1477f91e7d548513d99be4c11d7d..42fb0467f1e31b0e89ef9323b60e3360c970f222 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-29.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-29.c
@@ -30,6 +30,7 @@ int main1 (int off)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-3.c b/gcc/testsuite/gcc.dg/vect/vect-3.c
index 6fc6557cf9f13e9dcfb9e4198b4846bca44542ba..2c9b5066dd47f8b654e005fb6fac8a5a28f48111 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-3.c
@@ -29,6 +29,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       float fres = b[i] + c[i] + d[i];
diff --git a/gcc/testsuite/gcc.dg/vect/vect-30.c b/gcc/testsuite/gcc.dg/vect/vect-30.c
index 71f7a2d169f44990a59f57dcecd83e0a2824f81d..3585ac8cfefa1bd2c89611857c11de23d846f3f6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-30.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-30.c
@@ -21,6 +21,7 @@ int main1 (int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (a[i] != b[i])
@@ -43,6 +44,7 @@ int main2 (unsigned int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < nn; i++)
     {
       if (c[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
index 5621eb4d4ba17aaa6321807ee2d3610e38f8cceb..24bd0c7737df02a6b5dd5de9e745be070b0d8468 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -44,6 +45,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -57,6 +59,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -70,6 +73,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-31.c b/gcc/testsuite/gcc.dg/vect/vect-31.c
index 3f7d00c1748058ef662710eda30d89f0a0560f2f..8e1274bae53d95cbe0a4e959fe6a6002dede7590 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-31.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -44,6 +45,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -57,6 +59,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -70,6 +73,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
index 3e1403bbe96948188e7544d05f183a271828640f..5a4053ee8212ecb0f3824f2d0b2e6e03cb8e09ed 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-32.c b/gcc/testsuite/gcc.dg/vect/vect-32.c
index 2684cf2e0d390406e4c6c2ac30ac178ecfe70d5c..b04cbeb7c8297d589608b1e7468d536a5f265337 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-32.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
index c1aa399a240e8c7f50ae10610e2c40d41ea8d555..c3bfaaeb055183ee7a059a050d2fc8fe139bbbae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-33.c b/gcc/testsuite/gcc.dg/vect/vect-33.c
index e215052ff777a911358e1291630df9cabd27e343..8ffd888d482bc91e10b225317b399c0926ba437a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-33.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
index 0aa6d507a82f086056113157bc4b7ce0d5a87691..c3d44b4d15fef5b719cf618293bbc2a541582f4a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-34.c b/gcc/testsuite/gcc.dg/vect/vect-34.c
index 9cc590253c78317843930fff480b64aaa68de2e2..e3beba56623e9312c1bcfcc81b96d19adb36d83f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-34.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-34.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
index 28a99c910fd507414a4a732a6bcc93c4ce142ba6..a88d111b21a0ce2670311103678fb91bf1aff80f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.a[i] != i + 1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35.c b/gcc/testsuite/gcc.dg/vect/vect-35.c
index a7ec0f16d4cf0225c2f62c2f0aabf142704b2af8..4267c0bebaef82f5a58601daefd7330fff21c5b1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.a[i] != i + 1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
index d40fcb6d9925de2730acfd37dba2724904159ebb..9aa3bd7c2f40991ef8a3682058d8aea1bab9ba05 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-36.c b/gcc/testsuite/gcc.dg/vect/vect-36.c
index 64bc7fe18095178bc4bc0db5ef93e4c6706fa7d2..59bef84ad2e134c2a47746fb0daf96f0aaa92a34 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-36.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-36.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-38.c b/gcc/testsuite/gcc.dg/vect/vect-38.c
index 01d984c61b8245997b4db358dd579fc2042df9ff..81d9f38515afebd9e7e8c85a08660e4ff09aa571 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-38.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-38.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-4.c b/gcc/testsuite/gcc.dg/vect/vect-4.c
index b0cc45be7de6c24af16f0abedf34bc98370ae3e7..393c88df502ecd9261ac45a8366de969bfee84ae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-4.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[i] * c[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-40.c b/gcc/testsuite/gcc.dg/vect/vect-40.c
index c74703268f913194119e89982092ec4ce7fa0fde..d524b4ebd433b434f55ca1681ef8ade732dfa1bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-40.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-40.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-42.c b/gcc/testsuite/gcc.dg/vect/vect-42.c
index 086cbf20c0a2cf7c38ede4e9db30042ac3237972..c1d16f659f130aeabbce4fcc1c1ab9d2cb46e12d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-42.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-42.c
@@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-44.c b/gcc/testsuite/gcc.dg/vect/vect-44.c
index f7f1fd28665f23560cd7a2f397a0c773290c923f..b6895bd1d8287a246c2581ba24132f344dabb27e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-44.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-44.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-46.c b/gcc/testsuite/gcc.dg/vect/vect-46.c
index 185ac1424f94956fbcd5b26d0f4e6d36fd5f708b..7ca8b56ea9ffc50ae1cc99dc74662aea60d63023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-46.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-46.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-48.c b/gcc/testsuite/gcc.dg/vect/vect-48.c
index b29fe47635a349c0a845c43655c1a44d569d765e..10d8e09cac1daafeb0d5aa6e12eb7f3ecf6d33fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-48.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-48.c
@@ -30,6 +30,7 @@ main1 (float *pb, float *pc)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-5.c b/gcc/testsuite/gcc.dg/vect/vect-5.c
index 17f3b2fac9a72f11b512659046dd8710d2e2f9a2..a999989215aa7693a1520c261d690c66f6f9ba13 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-5.c
@@ -25,6 +25,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -38,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != d[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-50.c b/gcc/testsuite/gcc.dg/vect/vect-50.c
index f43676896af4b9de482521b4aa915a47596ff4a9..76304cd10ce00881de8a2a6dc37fddf100e534c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-50.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-50.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-52.c b/gcc/testsuite/gcc.dg/vect/vect-52.c
index c20a4be2edee6c958ae150b7de81121d01b2ab8a..2ad7149fc612e5df4adc390dffc6a0e72717308f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-52.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-52.c
@@ -30,6 +30,7 @@ main1 (int n, float *pb, float *pc)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-54.c b/gcc/testsuite/gcc.dg/vect/vect-54.c
index 2b236e48e196106b7892d3f28b4bd901a700ff9c..7ae59c3e4d391200bcb46a1b3229c30ed26b6083 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-54.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-54.c
@@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-56.c b/gcc/testsuite/gcc.dg/vect/vect-56.c
index c914126ece5f5929d316c5c107e7633efa4da55c..a8703d1e00969afdbb58782068e51e571b612b1d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-56.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-56.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
@@ -50,6 +51,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-58.c b/gcc/testsuite/gcc.dg/vect/vect-58.c
index da4f9740e3358f67e9a05f82c87cf78bf3620e56..43a596f6e9522531c2c4d2138f80eae73da43038 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-58.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-58.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
index c5de86b167a07ddf9043ae1ba77466ffd16765e6..a38373888907a7ed8f5ac610e030cd919315727d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
@@ -39,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != results1[i] || e[i] != results2[i])
@@ -52,6 +53,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-6.c b/gcc/testsuite/gcc.dg/vect/vect-6.c
index c3e6336bb43c6ab30eb2c55049e0f1a9bd5788b6..eb006ad0735c70bd6a416d7575501a49febafd91 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-6.c
@@ -24,6 +24,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != results1[i] || e[i] != results2[i])
@@ -37,6 +38,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-60.c b/gcc/testsuite/gcc.dg/vect/vect-60.c
index 121c503c63afaf7cc5faa96bb537f4a184c82b00..2de6f0031aa6faf854a61bf60acf4e5a05a7d3d0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-60.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-60.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
@@ -50,6 +51,7 @@ main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-62.c b/gcc/testsuite/gcc.dg/vect/vect-62.c
index abd3d700668b019a075c52edfaff16061200305b..ea6ae91f56b9aea165a51c5fe6489729d5ba4e62 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-62.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-62.c
@@ -25,6 +25,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j+8] != ib[i])
@@ -46,6 +47,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][8] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-63.c b/gcc/testsuite/gcc.dg/vect/vect-63.c
index 8d002a5e3c349bd4cbf9e37e8194e9a7450d0bde..20600728145325962598d6fbc17640296c5ca199 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-63.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-63.c
@@ -25,6 +25,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i + j][1][j] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-64.c b/gcc/testsuite/gcc.dg/vect/vect-64.c
index 240b68f6d0d2d4bbef72b60aac2b26ba366514df..96773f6cab610ee565f33038515345ea799ba2c9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-64.c
@@ -45,6 +45,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[i])
@@ -55,6 +56,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[i][1][1][j] != ib[i])
@@ -65,6 +67,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (id[i][1][j+1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-65.c b/gcc/testsuite/gcc.dg/vect/vect-65.c
index 9ac8ea4f013a5bea6dbfe8673056d35fc1c3fabb..af714d03ebb7f30ab56a93799c4c0d521b9cea93 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-65.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-65.c
@@ -42,6 +42,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[2][i][j])
@@ -62,6 +63,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[j] != ib[2][i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-66.c b/gcc/testsuite/gcc.dg/vect/vect-66.c
index ccb66bc80017d3aa64698cba43f932a296a82e7d..cf16dd15ac2d1664d2edf9a676955c4479715fd2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-66.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-66.c
@@ -23,6 +23,7 @@ void main1 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[2][6][j] != 5)
@@ -47,6 +48,7 @@ void main2 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 2; j < N+2; j++)
         {
            if (ia[3][6][j] != 5)
@@ -73,6 +75,7 @@ void main3 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[2][1][6][j+1] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-67.c b/gcc/testsuite/gcc.dg/vect/vect-67.c
index 12183a233c273d8ae3932fa312e1734b48f8c7b0..f3322a32c1e34949a107772dc6a3f4a7064e7ce5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-67.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-67.c
@@ -31,6 +31,7 @@ int main1 (int a, int b)
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j + NINTS] != (a == b))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-68.c b/gcc/testsuite/gcc.dg/vect/vect-68.c
index 3012d88494d0494ec137ca89fef4e98e13ae108e..8cc2d84140967d2c54d3db2b408edf92c53340d6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-68.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -43,6 +44,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -56,6 +58,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -69,6 +72,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-7.c b/gcc/testsuite/gcc.dg/vect/vect-7.c
index c4556e321c6b0d6bf1a2cd36136d71a43718af32..fb2737e92f5dc037c3253803134687081064ae0e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-7.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sb[i] != 5)
@@ -32,6 +33,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 105)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-70.c b/gcc/testsuite/gcc.dg/vect/vect-70.c
index 793dbfb748160ba709dd835dc253cb436f7aada1..cd432a6545a97d83ebac2323fe2b1a960df09c6e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-70.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-70.c
@@ -52,6 +52,7 @@ int main1 ()
 
   /* check results:  */
   for (i = 0; i < OUTERN; i++)
+#pragma GCC novector
     for (j = NINTS - 1; j < N - NINTS + 1; j++)
     {
       if (tmp1.e[i].n[1][2][j] != 8)
@@ -67,6 +68,7 @@ int main1 ()
   
   /* check results:  */
   for (i = 0; i < OUTERN; i++)
+#pragma GCC novector
     for (j = NINTS - 1; j < N - NINTS + 1; j++)
     {
       if (tmp1.e[j].n[1][2][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c b/gcc/testsuite/gcc.dg/vect/vect-71.c
index 581473fa4a1dcf1a7ee570336693ada765d429f3..46226c5f056bdceb902e73326a00959544892600 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-71.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 2; i < N+1; i++)
     {
       if (ia[ib[i]] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-72.c b/gcc/testsuite/gcc.dg/vect/vect-72.c
index 9e8e91b7ae6a0bc61410ffcd3f0e5fdf4c3488f1..2ab51fdf307c0872248f2bb107c77d19e53894f4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-72.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-72.c
@@ -33,6 +33,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i-1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
index 1c9d1fdaf9a2bb4eee4e9e766e531b72a3ecef2c..d81498ac0ce5926fb384c00aa5f66cc2a976cfdb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
@@ -28,6 +28,7 @@ int main1 ()
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-73.c b/gcc/testsuite/gcc.dg/vect/vect-73.c
index fdb49b86362774b0fdf3e10e918b7d73f3383dd7..48e1e64558e53fe109b96bd56eb8af92268cd7ec 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-73.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-73.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results: */  
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
index ba1ae63bd57cd3347820d888045005a7d4d83f1a..27d708745d31bdb09f4f0d01d551088e02ba24b9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
@@ -36,6 +36,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-74.c b/gcc/testsuite/gcc.dg/vect/vect-74.c
index a44f643ee96729fc0952a64e32a52275321557eb..c23c38a85063024b46c95c2e1c5158c81b6dcd65 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-74.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-74.c
@@ -24,6 +24,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
index a3fb5053037fcca89d7518c47eb2debfc136ba7f..10a3850d0da6d55a124fd6a7f4a2b7fd0efb3fae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
@@ -32,6 +32,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-75.c b/gcc/testsuite/gcc.dg/vect/vect-75.c
index 88da97f0bb7cecee4ee93a9d3fa7f55f0ae9641c..ecf5174921cc779f92e12fc64c3014d1a4997783 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-75.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-75.c
@@ -32,6 +32,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
index 5825cfc446468b16eff60fa2115bb1de4872654f..4f317f273c8737ab07e51699ed19e66d9eb8a51b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
@@ -32,6 +32,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -45,6 +46,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -58,6 +60,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-76.c b/gcc/testsuite/gcc.dg/vect/vect-76.c
index 3f4feeff8ac7882627c88490298c2f39b5172b7e..23210d4b775bfd4d436b2cdf2af2825cbf1924f0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-76.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-76.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -39,6 +40,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -52,6 +54,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
index fb3e49927826f77149d4813185a6a2cac00232d4..5fb833441d46ce2b6b0df2def5b3093290a2f7a4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
@@ -32,6 +32,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-global.c b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
index 1580d6e075b018696c56de4d680a0999a837bbca..b9622420c64b732047712ff343a3c0027e7bcf3a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77-global.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
@@ -28,6 +28,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77.c b/gcc/testsuite/gcc.dg/vect/vect-77.c
index d402e147043c0245f6523f6713dafc83e5357121..033d4ba79869c54f12fc3eea24a11ada871373ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77.c
@@ -25,6 +25,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
index 57e8da0a9090cae7d501ecb83220afff0bf553b2..f7563c4608546696e5c1174402b42bfc2fd3fa83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
@@ -33,6 +33,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-global.c b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
index ea039b389b22fe16af9353bd5efa59a375a6a71c..11b7e0e9b63cd95bfff9f64f0cfca8b5e4137fe2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78-global.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
@@ -29,6 +29,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78.c b/gcc/testsuite/gcc.dg/vect/vect-78.c
index faa7f2f4f768b0d7a191b8b67f5000f53c485142..b2bf78108dc9b2f8d43235b64a307addeb71e82a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78.c
@@ -25,6 +25,7 @@ int main1 (int *ib)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-8.c b/gcc/testsuite/gcc.dg/vect/vect-8.c
index 44c5f53ebaf260c2087b298abf0428c8d21e8cfa..85bc347ff2f2803d8b830bc1a231e8dadfa525be 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-8.c
@@ -19,6 +19,7 @@ int main1 (int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
index 0baf4d2859b679f7b20d6b5fc939b71ec2533fb4..a43ec9ca9a635d055a6ef70dcdd919102ae3690d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
@@ -35,6 +35,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-80.c b/gcc/testsuite/gcc.dg/vect/vect-80.c
index 45aac84a578fa55624f1f305e9316bbc98e877bb..44299d3c7fed9ac9c213699f6982ba3858bbe0bb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-80.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-80.c
@@ -24,6 +24,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-82.c b/gcc/testsuite/gcc.dg/vect/vect-82.c
index fcafb36c06388302775a68f0f056b925725e8aa8..2c1b567d10f2e7e519986c5b1d2e2c6b11353bc2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-82.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-82.c
@@ -17,6 +17,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-82_64.c b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
index 358a85a838f7519a0c1e0b2bae037d6e8aafeea9..d0962e06c62a8888cb5cabb1c1e08438e3a16c8e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-82_64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-83.c b/gcc/testsuite/gcc.dg/vect/vect-83.c
index a300a0a08c462c043b2841961c58b8c8f2849cc5..4fd14cac2abd9581cd47d67e8194795b74c68402 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-83.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-83.c
@@ -17,6 +17,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-83_64.c b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
index a5e897e093d955e0d1aff88021f99caf3a70d928..e3691011c7771328b9f83ea70aec20f373b10da4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-83_64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
index ade04016cc3136470db804ea7a1bac3010d6da91..9d527b06c7476c4de7d1f5a8863088c189ce6142 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
@@ -22,10 +22,12 @@ int main1 (int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (a[j] != i + N - 1)
       abort ();
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (b[j] != j + N)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-85.c b/gcc/testsuite/gcc.dg/vect/vect-85.c
index a73bae1ad41a23ab583d7fd1f5cf8234d516d515..367cea72b142d3346acfb62cb16be58104de4f1c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-85.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-85.c
@@ -22,10 +22,12 @@ int main1 (int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (a[j] != i + N - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (b[j] != j + N)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-86.c b/gcc/testsuite/gcc.dg/vect/vect-86.c
index ff1d41df23f1e1eaab7f066726d5217b48fadb57..fea07f11d74c132fec987db7ac181927abc03564 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-86.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-86.c
@@ -24,11 +24,12 @@ int main1 (int n)
       b[i] = k;
     }
 
-
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (b[i] != i + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-87.c b/gcc/testsuite/gcc.dg/vect/vect-87.c
index 17b1dcdee99c819c8a65eadbf9159d9f78242f62..0eadc85eecdf4f8b5ab8e7a94782157534acf0a6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-87.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-87.c
@@ -23,10 +23,12 @@ int main1 (int n, int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (b[j] != j + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-88.c b/gcc/testsuite/gcc.dg/vect/vect-88.c
index b99cb4d89a4b8e94000dc6334514af042e1d2031..64341e66b1227ada7de8f26da353e6c6c440c9a9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-88.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-88.c
@@ -23,10 +23,12 @@ int main1 (int n, int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (b[j] != j + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
index 59e1aae0017d92c5b98858777e7e55bceb73a90a..64578b353fec58c4af632346a546ab655b615125 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
@@ -28,6 +28,7 @@ int main1 ()
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-89.c b/gcc/testsuite/gcc.dg/vect/vect-89.c
index 356ab96d330046c553364a585e770653609e5cfe..6e7c875c01e2313ba362506542f6018534bfb443 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-89.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-89.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results: */  
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-9.c b/gcc/testsuite/gcc.dg/vect/vect-9.c
index 87600fb5df0d104daf4438e6a7a020e08c277502..dcecef729a60bf22741407e3470e238840ef6def 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-9.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-92.c b/gcc/testsuite/gcc.dg/vect/vect-92.c
index 9ceb0fbadcd61ec9a5c3682cf3582abf464ce106..86864126951ccd8392cc7f7e87642be23084d5ea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-92.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-92.c
@@ -36,6 +36,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 10; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
@@ -56,6 +57,7 @@ main2 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 12; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
@@ -76,6 +78,7 @@ main3 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
index c3e12783b2c47a4e296fd47cc9dc8e73b7ccebb0..b4ccbeedd08fe1285dc362b28cb6d975c6313137 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-93.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
@@ -23,6 +23,7 @@ main1 (float *pa)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N1; i++)
     {
       if (pa[i] != 2.0)
@@ -36,6 +37,7 @@ main1 (float *pa)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N2; i++)
     {
       if (pa[i] != 3.0)
@@ -60,6 +62,7 @@ int main (void)
   for (i = 1; i <= 256; i++) a[i] = b[i-1];
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= 256; i++)
     {
       if (a[i] != i-1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-95.c b/gcc/testsuite/gcc.dg/vect/vect-95.c
index 1e8bc1e7240ded152ea81f60addab9f7179d3bfc..cfca253e810ff1caf2ef2eef0d7bafc39896ea3e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-95.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-95.c
@@ -11,6 +11,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-96.c b/gcc/testsuite/gcc.dg/vect/vect-96.c
index c0d6c37b21db23b175de895a582f48b302255e9f..e36196b50d7527f88a88b4f12bebbe780fe23f08 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-96.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-96.c
@@ -28,7 +28,8 @@ int main1 (int off)
   for (i = 0; i < N; i++)
       pp->ia[i] = ib[i];
 
-  /* check results: */  
+  /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (pp->ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
index 977a9d57ed4795718722c83344c2efd761e6783e..e015c1684ad856a4732084fbe49783aaeac31e58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -48,6 +49,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-97.c b/gcc/testsuite/gcc.dg/vect/vect-97.c
index 734ba3b6ca36cf56d810a1ce4329f9cb1862dede..e5af7462ef89e7f47b2ca822f563401b7bd95e2c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-97.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-97.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -43,6 +44,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
index 61b749d4669386a890f5c2f5ba83d6e00d269b4f..2d4435d22e476de5b40c6245f26209bff824139c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
@@ -22,6 +22,7 @@ int main1 (int ia[][N])
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ic[0][i] != DOT16 (ia[i], ib))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-98.c b/gcc/testsuite/gcc.dg/vect/vect-98.c
index 2055cce70b20b96dd69d06775e3d6deb9f27e3b2..72a1f37290358b6a89db6c89aada2c1650d2e7a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-98.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-98.c
@@ -19,7 +19,8 @@ int main1 (int ia[][N])
 	ic[0][i] = DOT4 (ia[i], ib);
     }
 
-  /* check results: */  
+  /* check results: */
+#pragma GCC novector
   for (i = 0; i < M; i++)
     {
        if (ic[0][i] != DOT4 (ia[i], ib))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-99.c b/gcc/testsuite/gcc.dg/vect/vect-99.c
index ae23b3afbd1d42221f6fe876f23ee7b9beaebca3..0ef9051d907209e025a8fee057d04266ee2fcb03 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-99.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-99.c
@@ -21,6 +21,7 @@ int main (void)
 
   foo(100);
 
+#pragma GCC novector
   for (i = 0; i < 100; ++i) {
     if (ca[i] != 2)
       abort();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
index b6cc309dbe87b088c9969e07dea03c7f6b5993dd..8fd3bf407e9db3d188b897112ab1e41b381ae3c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
@@ -45,6 +45,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = -M; j <= M; ++j)				\
     {							\
       TYPE a[N * M], b[N * M];				\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
index 09a4ebfa69e867869adca3bb5daece02fcee93da..5ecdc3250708e99c30e790da84b002b99a8d7e9b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
@@ -51,6 +51,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = -M; j <= M; ++j)				\
     {							\
       TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
index 63a897f4bad4894a6ec4b2ff8749eed3f9e33782..23690c45b65a1b95bf88d50f80d021d5c481d5f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
@@ -52,6 +52,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = 0; j <= M; ++j)				\
     {							\
       TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
index 29bc571642db8858d3e4ca1027131a1a6559c4c1..b36ad116762e2e3c90ccd79fc4f8564cc57fc3f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
@@ -39,6 +39,7 @@ typedef unsigned long long ull;
       for (int i = 0; i < N + M; ++i)				\
 	a[i] = TEST_VALUE (i);					\
       test_##TYPE (a + j, a);					\
+      _Pragma("GCC novector")					\
       for (int i = 0; i < N; i += 2)				\
 	{							\
 	  TYPE base1 = j == 0 ? TEST_VALUE (i) : a[i];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
index ad74496a6913dcf57ee4573ef1589263a32b074c..f7545e79d935f1d05641415246aabc2dbe9b7d27 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
@@ -33,6 +33,7 @@ typedef unsigned long long ull;
     {								\
       TYPE a[N + DIST * 2] = {};				\
       test_##TYPE (a + DIST, a + i);				\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected = 0;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
index 8a9a6fffde1d39f138c5f54221854e73cef89079..d90adc70e28420e5e8fd0e36c15316da12224b38 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
@@ -33,12 +33,14 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       TYPE a[N + DIST * 2];					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a[j] = TEST_VALUE (j);					\
       TYPE res = test_##TYPE (a + DIST, a + i);			\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N; ++j)				\
 	if (a[j + DIST] != (TYPE) j)				\
 	  __builtin_abort ();					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
index b9f5d2bbc9f6437e3e8058264cc0c9aaa522b3e2..3b576a4dc432725c67b4e7f31d2bc5937bc34b7a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
@@ -34,6 +34,7 @@ typedef unsigned long long ull;
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a_##TYPE[j] = TEST_VALUE (j);				\
       test_##TYPE (i + N - 1, DIST + N - 1);			\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
index 7c0ff36a8c43f11197de413cb682bcd0a3afcae8..36771b04ed5cc0d6c14c0fe1a0e9fd49db4265c4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
@@ -34,6 +34,7 @@ typedef unsigned long long ull;
     {								\
       __builtin_memset (a_##TYPE, 0, sizeof (a_##TYPE));	\
       test_##TYPE (DIST, i);					\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected = 0;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
index 8a699ebfda8bfffdafc5e5f09d137bb0c7e78beb..9658f8ce38e8efb8d19806a4078e1dc4fe57d2ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
@@ -34,11 +34,13 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a_##TYPE[j] = TEST_VALUE (j);				\
       TYPE res = test_##TYPE (DIST, i);				\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N; ++j)				\
 	if (a_##TYPE[j + DIST] != (TYPE) j)			\
 	  __builtin_abort ();					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
index 7e5df1389991da8115df2c6784b52ff3e15f8124..3bc78bed676d8267f7512b71849a7d33cb4ab05b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
@@ -29,6 +29,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       for (int j = 0; j < N + DIST * 2; ++j)			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
index a7fc1fcebbb2679fbe6a98c6fa340edcde492ba9..c11c1d13e0ba253b00afb02306aeec786cee1161 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
@@ -37,6 +37,7 @@ typedef unsigned long long ull;
       for (int i = 0; i < N + M; ++i)			\
 	a[i] = TEST_VALUE (i);				\
       test_##TYPE (a + j, a);				\
+      _Pragma("GCC novector")				\
       for (int i = 0; i < N; i += 2)			\
 	if (a[i + j] != (TYPE) (a[i] + 2)		\
 	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-1.c b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
index d56898c4d23406b4c8cc53fa1409974b6ab05485..9630fc0738cdf4aa5db67effdd5eb47de4459f6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-align-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
@@ -28,6 +28,7 @@ main1 (struct foo * __restrict__ p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != x[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-2.c b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
index 39708648703357e9360e0b63ca7070c4c21def03..98759c155d683475545dc20cae23d54c19bd8aed 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-align-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
@@ -26,6 +26,7 @@ void fbar(struct foo *fp)
         f2.y[i][j] = z[i];
 
    for (i=0; i<N; i++)
+#pragma GCC novector
       for (j=0; j<N; j++)
 	if (f2.y[i][j] != z[i])
 	  abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
index 6eb9533a8bb17acf7f9e29bfaa7f7a7aca2dc221..3f3137bd12e1462e44889c7e096096beca4d5b40 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
@@ -18,6 +18,7 @@ __attribute__ ((noinline))
 void icheck_results (int *a, int *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -29,6 +30,7 @@ __attribute__ ((noinline))
 void fcheck_results (float *a, float *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -108,6 +110,7 @@ main1 ()
       ca[i] = cb[i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
@@ -163,6 +166,7 @@ main1 ()
       a[i+3] = b[i-1];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
@@ -180,6 +184,7 @@ main1 ()
       j++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -193,6 +198,7 @@ main1 ()
       a[N-i] = d[N-i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != d[i])
@@ -206,6 +212,7 @@ main1 ()
       a[i] = 5.0;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != 5.0)
@@ -217,6 +224,7 @@ main1 ()
       sa[i] = 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 5)
@@ -228,6 +236,7 @@ main1 ()
       ia[i] = ib[i] + 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-all.c b/gcc/testsuite/gcc.dg/vect/vect-all.c
index cc41e2dd3d313a0557dea16204564a5a0c694950..6fd579fa6ad24623f387d9ebf5c863ca6e91dfe6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-all.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-all.c
@@ -18,6 +18,7 @@ __attribute__ ((noinline))
 void icheck_results (int *a, int *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -29,6 +30,7 @@ __attribute__ ((noinline))
 void fcheck_results (float *a, float *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -91,6 +93,7 @@ main1 ()
       ca[i] = cb[i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
@@ -134,6 +137,7 @@ main1 ()
       a[i+3] = b[i-1];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
@@ -151,6 +155,7 @@ main1 ()
       j++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -164,6 +169,7 @@ main1 ()
       a[N-i] = d[N-i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != d[i])
@@ -177,6 +183,7 @@ main1 ()
       a[i] = 5.0;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != 5.0)
@@ -188,6 +195,7 @@ main1 ()
       sa[i] = 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 5)
@@ -199,6 +207,7 @@ main1 ()
       ia[i] = ib[i] + 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
index a7bc7cc90963c8aa8e14d0960d57dc724486247f..4a752cd7d573cd53ea1a59dba0180d017a7f73a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
@@ -35,6 +35,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
index 85292f1b82416b70698619e284ae76f3a3d9410d..0046f8ceb4e7b2688059073645175b8845246346 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
@@ -43,6 +43,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (((((BASE1 + i * 5) ^ 0x55)
 		   + (BASE2 + i * 4)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
index 48d7ed773000486c42277535cebe34f101e035ef..57b6670cb98cdf92e60dd6c7154b4a8012b05a1e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N / 20, 20);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int d = (BASE1 + BASE2 + i * 5) >> 1;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
index f3e3839a879b6646aba6237e55e2dcd943eac168..319edba1fa3c04b6b74b343cf5397277a36dd6d1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N / 20);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int d = (BASE1 + BASE2 + i * 5) >> 1;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
index 6c43575f448325e84975999c2e8aa91afb525f87..6bdaeff0d5ab4c55bb5cba1df51a85c4525be6fb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
@@ -39,6 +39,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
index 19683d277b1ade1034496136f1d03bb2b446900f..22e6235301417d72e1f85ecbdd96d8e498500991 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -19,6 +19,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].i != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
index 1a101357ccc9e1b8bb157793eb3f709e99330bf6..0c8291c9363d0de4c09f81525015b7b88004bc94 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -23,6 +23,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
index 5dc679627d52e2ad229d0920e5ad8087a71281fe..46fcb02b2f1b6bb2689a6b709901584605cc9a45 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -24,6 +24,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
index fae6ea3557dcaba7b330ebdaa471281d33d2ba15..5a7227a93e4665cd10ee564c8b15165dc6cef303 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
index 99360c2967b076212c67eb4f34b8fd91711d8821..e0b36e411a4a72335d4043f0f360c2e88b667397 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
index c97da5289141d35a9f7ca220ae62aa82338fa7f5..a1be71167025c960fc2304878c1ed15d90484dfb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
@@ -183,6 +183,7 @@ check (int *p, cmp_fn fn)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     {
       int t1 = ((i % 4) > 1) == 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
index d29b352b832a67e89e7cb3856634390244369daa..7d2cb297738378863ddf78b916036b0998d28e6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo16 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
index 88d88b5f034153cb736391e4fc46a9b786ec28c5..1139754bbf1b8f7ef7a5a86f5621c9fe319dec08 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo32 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
index fd15d713c5d63db335e61c892c670b06ee9da25f..38d598eba33019bfb7c50dc2f0d5b7fec3a4736c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo64 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
index 2a87e2feadeba7f1eaef3cce72e27a7d0ffafb5f..b3a02fe9c6d840e79764cb6469a86cfce315a337 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
@@ -43,6 +43,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
index 19b24e1eb87feacc8f7b90fb067124007e22c90f..7bbfdd95b5c46f83f24263e33bf5e3d2ecee0a4d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
@@ -43,6 +43,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
index 49cfdbe1738794c3bf873c330fff4d7f4626e10b..d5e50cc15df66501fe1aa1618f04ff293908469a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
@@ -92,6 +92,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (c[i].f1 != res[i].f1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
index 261d828dbb2855fe680b396d3fcbf094e814b6fd..e438cbb67e196a5b3e5e2e2769efc791b0c2d6b7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
@@ -43,6 +43,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out[j] != check_result[j])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
index b2f97d735ef7d94a80a67265b4535a1e228e20ca..dbbe4877db41c43d5be5e3f35cb275b96322c9bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
@@ -120,41 +120,49 @@ main ()
 	}
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f5 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f6 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f7 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f8 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
index f28af658f331849a0c5103ba96dd2e3b60de428d..38f1f8f50901c3039d0e7cb17d1bd47b18b89c71 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
@@ -79,13 +79,16 @@ baz (unsigned int *a, unsigned int *b,
     }
   if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U)
     __builtin_abort ();
+#pragma GCC novector
   for (i = -64; i < 0; i++)
     if (a[i] != 19 || b[i] != 17)
       __builtin_abort ();
+#pragma GCC novector
   for (; i < N; i++)
     if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16)
 	|| b[i] != (i - 512U < 32U ? i * 2U : i + 1U))
       __builtin_abort ();
+#pragma GCC novector
   for (; i < N + 64; i++)
     if (a[i] != 27 || b[i] != 19)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
index 8a66b4b52ed8a98dd52ef945afb3822de8fe37e9..1521fedd1b5b9d6f3021a1e5653f9ed8df0610b2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
@@ -50,6 +50,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out_a[j] != check_result_a[j]
         || x_out_b[j] != check_result_b[j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
index 2a6577c6db33a49c7fac809f67b7e957c0b707c2..4057d14c702c22ef41f504a8d3714a871866f04f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
@@ -47,6 +47,7 @@ int main (void)
 
   foo (125);
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out_a[j] != 125
         || x_out_b[j] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
index 41e57f9235b90347e7842d88c9710ee682ea4bd4..f10feab71df6daa76966f8d6bc3a4deba8a7b56a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
@@ -46,6 +46,7 @@ int main ()
 
   foo(5);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
index 65fdc4a9ef195f7210b08289242e74cda1db4831..a46479a07eb105f5b2635f3d5848e882efd8aabf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
@@ -47,6 +47,7 @@ int main ()
 
   foo(125);
 
+#pragma GCC novector
   for (k = 0; k < K; k++) 
     if (out[k] != 33)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
index bd2947516584bf0039d91589422acefd0d27cc35..ea11693ff21798e9e792cfc43aca3c59853e84a0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
@@ -53,6 +53,7 @@ main ()
 #undef F
 #define F(var) f##var ();
   TESTS
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     {
       asm volatile ("" : : : "memory");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
index d888442aa456e7520cf57e4a07c0938849758068..88289018b9be7d20edd9c7d898bb51d947ed7806 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
@@ -79,18 +79,22 @@ main ()
       e[i] = 2 * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 0 : 24))
       abort ();
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 51 : 12))
       abort ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? d[i] : e[i]))
       abort ();
@@ -112,6 +116,7 @@ main ()
       b[i] = i / 2;
     }
   f5 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != ((i % 3) == 0 ? a[i] : b[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
index 63eee1b47296d8c422b4ff899e5840ca4d4f59f5..87febca10e7049cb0f4547a13d27f533011d44bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
@@ -145,51 +145,61 @@ main ()
 	}
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f5 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f6 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f7 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f8 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f9 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f10 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
index d52e81e9109cc4d81de84adf370b2322799c8c27..5138712731f245eb1f17ef2e9e02e333c8e214de 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
@@ -23,6 +23,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
index f02b0dc5d3a11e3cfa8a23536f570ecb04a039fd..11a680061c21fb7da69739892b79ff37d1599027 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
@@ -24,6 +24,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 55a174a7ec1fa42c40d4359e882ca475a4feaca3..1af0fe642a0f6a186a225e7619bff130bd09246f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -20,6 +20,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
index d2eadc4e9454eba204b94532ee3b002692969ddb..ec3d9db42021c0f1273bf5fa37bd24fa77c1f183 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
@@ -21,6 +21,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
index cc70b8a54c44fbc1d20aa9c2599b9a37d9fc135b..2aeebd44f835ee99f110629ded9572b338d6fb50 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
@@ -23,6 +23,7 @@
 #define TEST(OP)						\
   {								\
     f_##OP (a, b, 10);						\
+    _Pragma("GCC novector")					\
     for (int i = 0; i < N; ++i)					\
       {								\
 	int bval = (i % 17) * 10;				\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
index 739b98f59aece34b73ed4762c2eeda2512834539..9d20f977884213a6b4580b90e1a187161cf5c945 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
@@ -22,6 +22,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
index e6ad865303c42c9d5958cb6e7eac6a766752902b..faeccca865f63bc55ee1a8b412a5e738115811e9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
@@ -73,6 +73,7 @@ main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
index 95efe7ad62eac1f66b85ffdc359fd60bd7465cfd..f3b7db076e6b223fcf8b341f41be636e10cc952a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
@@ -55,6 +55,7 @@ main (void)
 
   foo (a, b);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != result[2*i] || b[i] != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
index c81f8946922250234bf759e0a0a04ea8c1f73e3c..f02f98faf2fad408f7d7e65a09c678f242aa32eb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -16,6 +16,7 @@ int
 main (void)
 {
   V v = foo ((V) { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff }, 0xffff);
+#pragma GCC novector
   for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
     if (v[i] != 0x00010001)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
index b4eb1a4dacba481e6306b49914d2a29b933de625..80293e50bbc6bbae90cac0fcf436c790b3215c0e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
@@ -44,6 +44,7 @@ int main ()
   fun1 (a, N / 2, N);
   fun2 (b, N / 2, N);
 
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       if (DEBUG)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
index 29a16739aa4b706616367bfd1832f28ebd07993e..bfdc730fe5f7b38117854cffbf2e450dad7c3b5a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -30,6 +30,7 @@ int main ()
   fun1 (a, N / 2, N);
   fun2 (b, N / 2, N);
 
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       if (DEBUG)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
index 6abf76392c8df94765c63c248fbd7045dc24aab1..6456b3aad8666888fe15061b2be98047c28ffed2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
index 4bfd1630c4e9927d89bf23ddc90716e0cc249813..d5613e55eb20731070eabeee8fe49c9e61d8be50 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
index 3bdf9efe9472342359b64d51ef308a4d4f8f9a79..239ddb0b444163803c310e4e9910cfe4e4c44be7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
@@ -48,12 +48,14 @@ int main ()
 
   foo(0, 0);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out_max[k] != check_max[k] || out_min[k] != 0)
       abort ();
 
   foo(100, 45);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out_min[k] != check_min[k] || out_max[k] != 100)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
index e5937705400c7c015513abc513a8629c6d66d140..5344c80741091e4e69b41ce056b9541b75215df2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
index 079704cee81cc17b882b476c42cbeee0280369cf..7465eae1c4762d39c14048077cd4786ffb8e4848 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
index 1d9dcdab5e9c09514a8427cd65c419e74962c9de..a032e33993970e65e9e8a90cca4d23a9ff97f1e8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
@@ -49,6 +49,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
index 85aec1bf609582988f06826afb6b7ce77d6d83de..d1d1faf7c3add6ce2c3378d4d094bf0fc2aba046 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
@@ -38,6 +38,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
index c3145a2ad029f92e96995f59e9be9823e016ec11..1ef7a2d19c8b6ee96280aee0e9d69b441b597a89 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
@@ -52,6 +52,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
index 76b436948c185ca73e21203ef68b0a9d4da03408..603f48167d10fe41143f329cd50ca7f6c8e9a154 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (da[i] != (double) fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
index 8b82c82f1cdd1078898847c31c6c06371f4232f6..9f404f0e36e10ebf61b44e95d6771d26a25faea8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) db[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
index fc5081b7e8e143893009b60147d667855efa12ad..f80da6a7ca7f0de224d88860a48f24b4fd8c2ad8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
index 64fab3876310d60ca016b78938e449201c80997d..dc038857a42813e665591c10eb3ab7f744d691ad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
@@ -19,6 +19,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) db[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
index 6b6b4f726e9476ac6a90984e15fdd0839dff8885..27d206d9fa0601812b09a3ead2ee9730623e97e4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
@@ -22,6 +22,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
index 4cee73fc7752681c2f677d3e6fddf7daf6e183eb..e3bbf5c0bf8db8cb258d8d05591c246d80c5e755 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
@@ -50,6 +50,7 @@ main (void)
   check_vect ();
 
   f (y, x, indices);
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
     if (y[i] != expected[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
index 738bd3f3106948754e38ffa93fec5097560511d3..adfef3bf407fb46ef7a2ad01c495e44456b37b7b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
@@ -54,6 +54,7 @@ main (void)
   check_vect ();
 
   f (y, x, indices);
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
     if (y[i] != expected[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
index 7e323693087598942f57aa8b7cf3686dde4a52c9..04d5fd07723e851442e1dc496fdf004d9196caa2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
@@ -26,6 +26,7 @@ int main ()
   check_vect ();
   foo ();
   /* check results:  */
+#pragma GCC novector
   for (int i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
index 56a8e01993d1a0701998e377fb7fac4fa2119aed..0f752b716ca811de093373cce75d948923386653 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] != MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
index 962be1c0230cca6bef2c097b35833ddd6c270875..8b028d7f75f1de1c8d10376e4f0ce14b60dffc70 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] == MAX ? 0 : MAX);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
index 6406875951bd52c3a5c3691eb2bc062e5525a4a1..10145d049083b541c95b813f2fd12d3d62041f53 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] >= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
index d55440c9fa421719cb03a30baac5d58ca1ac2fb6..4964343c0ac80abf707fe11cacf473232689123e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] > MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
index 5cef85175131bd6b2e08d7801966f5526ededf8e..63f53a4c4eef6e1397d67c7ce5570dfec3160e83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
index 3118e2d5a5536e175838284d367a8f2eedf8eb86..38b014336482dc22ecedaed81b79f8e7d5913d1e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] < MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
index 272fde09429b6a46ee4a081b49736613136cc328..56e0f71bc799d16725e589a53c99abebe5dca40a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] != MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
index c0c7a3cdb2baafa5702a7fcf80b7198175ecc4f2..879d88a5ce9239bf872cc0ee1b4eb921b95235d0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] == MAX ? 0 : MAX); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
index e6446a765c0298857f71b80ffcaefdf77e4f5ce3..bbeccae0f228ad3fc7478c879ae4a741ae6fe7a3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
@@ -27,6 +27,7 @@ int main ()
   check_vect ();
   foo ();
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
index bef3efa5658ae6d91010d286967e319906f9aeb5..f75c0f5a1a6645fdee6a8a04ffc55bd67cb7ac43 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) ib[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
index 666ee34a4a753ff1d0e33012d95a77496f1986fa..32df21fb52a0b9f16aff7340eee21e76e832cceb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (float_arr[i] != (float) int_arr[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
index 78fc3da481c6693611b45d3939fe03d23e84f8f7..db33a84b54d70c9355079adf2ee163c904c68e57 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (da[i] != (double) ib[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
index af8fe89a7b02b555acc64b578a07c735f5ef45eb..6fc23bb4621eea594a0e70347a8007a85fb53db8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) sb[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
index 49c83182026b91c7b52667fec7a44554e3aff638..b570db5dc96db9c6e95b0e4dbebe1dae19c5ba7c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) usb[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
index 90163c440d34bcd70a7024b83f70abb7b83f8077..e6dcf29ebe0d2b2dc6695e754c4a1043f743dd58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
@@ -22,6 +22,7 @@ __attribute__ ((noinline)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
index 195474b56441bee9b20f373a6aa991610a551e10..83bc7805c3de27ef3dd697d593ee86c1662e742c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
@@ -17,6 +17,7 @@ int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (j=0,i=N;  j<N,i>0;  i--,j++) {
       if (ia[j] != i)
         abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
index 73e30ee9bac6857b545242136d9c1408f6bfe60e..d85bb3436b2e0abcc4d0d0a7b480f4f267b4898c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
index f8ca94dd17db81d8be824dfb2f023517f05d7c04..c0738ebc469f1780eb8ce90e89caa222df0e1fba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
@@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
index dfe5bc14458c856122f48bd6bc6a50092d7729e1..2dd8ae30513260c858504f8dc0e8c7b6fd3ea59b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
@@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
index 2015385fbf5fac1349124dd35d57b26c49af6346..c3c4735f03432f9be07ed2fb14c94234ee8f4e52 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
@@ -20,6 +20,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != 1.0 + 2.0*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
index ccd7458a98f1d3833b19c838a27e9f582631e89c..4c9d9f19b45825a210ea3fa26160a306facdfea5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
@@ -28,6 +28,7 @@ __attribute__ ((noinline)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr1[i+1] != X+6*i+2
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
index 24b59fe55c498bf21d107bef72bdc93690229c20..f6d93360d8dda6f9380425b5518ea5904f938322 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
@@ -22,6 +22,7 @@ __attribute__ ((noinline, noclone)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
index 45d82c65e2f85b7b470a22748dacc78a63c3bd3e..26e8c499ce50cc91116c558a2425a47ebe21cdf7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
index dd37d250e91c3839c21fb3c22dc895be367cdcec..b4bb29d88003d2bbc0e90377351cb46d1ff72b55 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
index 63b6b6e893f7a55a56aef89331610fd76d2c1c42..dceae27bbbee36a13af8055785dd4258b03e3dba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
index 1f8fedf2716745d469771cfce2629dd05478bce8..dfe3a27f024031427344f337d490d4c75d8a04be 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
index f628c5d3998930ea3e0cee271c20ff3eb17edf62..e4a6433a89961b008a2b766f6669e16f378ca01e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
@@ -38,6 +38,7 @@ main (void)
   if (ret != MAX + START)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
index 19d8c22859e0804ccab9d25ba69f22e50d635ebb..dae36e9ed67c8f6f5adf735345b817d59a3741f4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
@@ -48,6 +48,7 @@ main (void)
   if (ret != MAX - 1)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
index 8f5ccb27365dea5e8cd8561d3c8a406e47469ebe..1f6b3ea0faf047715484ee64c1a49ef74dc1850e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
@@ -45,6 +45,7 @@ main (void)
   if (ret != (MAX - 1) * 3)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-4.c b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
index 553ffcd49f744cabd6bdd42e6aca8c12d15ceb01..170927802d2d8f1c42890f3c82f9dabd18eb2f38 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
@@ -42,6 +42,7 @@ main (void)
   if (ret != MAX + 4)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-5.c b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
index 7cde1db534bb1201e106ba34c9e8716c1f0445a1..9897552c25ce64130645887439c9d1f0763ed399 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
@@ -39,6 +39,7 @@ main (void)
   if (ret != 99)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
index 965437c8f03eaa707add3577c6c19e9ec4c50302..6270c11e025ed6e181c7a607da7b1b4fbe82b325 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
@@ -51,6 +51,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<4; i++)
     {
       __asm__ volatile ("");
@@ -60,6 +61,7 @@ main (void)
       if (ret != (MAX * 4) - 4 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*4; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
index 0d2f17f9003178d65c3dc5358e13c45f8ac980e3..c9987018e88b04f5f0ff195baaf528ad86722714 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
@@ -45,6 +45,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<2; i++)
     {
       __asm__ volatile ("");
@@ -54,6 +55,7 @@ main (void)
       if (ret != (MAX * 2) - 2 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*2; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
index a3f60f6ce6d24fa35e94d95f2dea4bfd14bfdc74..e37822406751b99b3e5e7b33722dcb1912483345 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
@@ -52,6 +52,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<4; i++)
     {
       __asm__ volatile ("");
@@ -61,6 +62,7 @@ main (void)
       if (ret != (MAX * 4) - 4 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*4; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
index 992cbda2e91628cd145d28c8fdabdb7a4d63ee68..91d4d40a86013dca896913d082773e20113a17e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
@@ -36,6 +36,7 @@ main ()
       asm ("");
     }
   foo (a, b);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != ((i & 1)
 		 ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
index 7d9dc5addf54264bf2fd0c733ccfb83bb1c8f20d..76f72597589c6032d298adbc8e687ea4808e9cd4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
@@ -36,6 +36,7 @@ main ()
       asm ("");
     }
   foo (a, b, c);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != ((i & 1) ? -i : i)
 	|| b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
index 8e46ff6b01fe765f597add737e0b64ec5b505dd1..4df0581efe08333df976dfc9c52eaab310d5a1cc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != HRS(BASE1 * BASE2 + i * i * (CONST1 * CONST2)
 		    + i * (BASE1 * CONST2 + BASE2 * CONST1)))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
index b63e9a8a6d9d0c396c3843069d100fbb9d5fa913..1e90d19a684eb0eebf223f85c4ea2b2fd93aa0c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -27,6 +27,7 @@ main (void)
     }
 
   foo (data);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (data[i] / 123 != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
index a8253837c3863f5bc5bfea1d188a5588aea501c6..f19829b55a96227f0157527b015291da6abd54bf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -26,6 +26,7 @@ main (void)
     }
 
   foo (data);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (data[i] / -19594LL != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
index 378a5fe642ac415cd20f45e88f06e8d7b9040c98..06dbb427ea11e14879d1856c379934ebdbe50e04 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
@@ -39,6 +39,7 @@ __attribute__ ((noinline)) int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i + NSHORTS - 1] != sb[i] || ia[i + NINTS - 1] != ib[i + 1])
@@ -69,6 +70,7 @@ __attribute__ ((noinline)) int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i + NINTS - 1] != sb[i + 1] || ia[i + NINTS - 1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
index 891ba6d8e7169c67e840733402e953eea919274e..c47cf8c11d9ade3c4053f3fcf18bf719fe58c971 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
@@ -48,6 +48,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned short)uY[i])
       abort ();
@@ -55,6 +56,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (short)Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
index c58391f495eb8d19aec9054f4d324a1bdf4461a4..29d178cf88d8df72b546772047b1e99a1a74043b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
@@ -30,6 +30,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
index 4782d3f7d1066e1dcf5c3c1004d055eb56bd3aec..dd5fffaed8e714114dcf964ffc6b5419fba1aa9f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
@@ -31,6 +31,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
index 2b185bb1f86ede937842596cec86f285a7c40d27..5bf796388f9c41083a69f3d6be3f5a334e9410a1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned int)uX[i])
       abort ();
@@ -51,6 +52,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (int)X[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
index ff5f8e989b2ea57fb265e8fca3a39366afb06aaa..6f9b81d1c01ab831a79608074f060b3b231f177d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
index cf45703e01867a7954325f6f8642594e31da9744..a61f1a9a2215e238f6c67e229f642db6ec07a00c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
@@ -26,6 +26,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
index 79ad80e3013e189c0efb9425de2b507cf486f39a..d2eff3a20986593a5185e981ae642fcad9a57a29 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
@@ -30,6 +30,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
index 7f93938349f91c0490dad8ea2de3aec780c30b2b..069ef44154effb38f74792e1a00dc3ee236ee6db 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
@@ -26,6 +26,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
index 1f82121df06181ad27478378a2323dbf478eacbe..04b144c869fc2a8f8be91a8252387e09d7fca2f2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
@@ -39,6 +39,7 @@ int main1 (int n, int * __restrict__ pib,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (ia[i] != pib[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
index b0f74083f2ba992620ebdf3a3874f6c5fa29f84d..18ab9538675b3fd227ae57fafc1bfd1e840b8607 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
@@ -41,6 +41,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
@@ -75,6 +76,7 @@ int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
index ad11d78a548735a67f76b3aa7f98731d88868b56..7c54479db1f684b9661d59816a3cd9b0e5f35619 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + ic[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
index 864b17ac640577753d8164f1ae3ea84181a553c1..73d3b30384ebc4f15b853a140512d004262db3ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
@@ -46,6 +46,7 @@ int main1 (int n,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (ia[i] != pib[i] + pic[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
index 315c9aba731ac28189cd5f463262fc973d52abe2..001671ebdc699ca950f6fd157bd93dea0871c5ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
index 8c5c611947f720c9ef744c33bdd09a78967d4a4c..3e599b3462d13a8afcad22144100f8efa58ac921 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned short)uX[i])
       abort ();
@@ -51,6 +52,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (short)X[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
index 75b210c1a21c56c114f25b354fb368bdbe9462d5..357d006177f60a5376597929846efbfaa787f90b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
@@ -20,6 +20,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (31);
+#pragma GCC novector
   for (i = 0; i < 31; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
index 229ce987db5f5a5b48177d0c9d74e416e417d3f6..dc4c7a64aee4f800997d62550f891b3b35f7b633 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -23,6 +23,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (32);
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
index 16665265c4062c0a3acb31e01a1473dea3125685..268e65458bf839e2403a7ae3e4c679e7df6dcac7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
@@ -22,6 +22,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (33);
+#pragma GCC novector
   for (i = 0; i < 33; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
index fca8ee0963860fa0a938db41c865e8225bf554c3..aa6e403b51ce8e9a29ddd39da5d252c9238ca7eb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
@@ -28,10 +28,12 @@ int main (void)
    
   test1 (x + 16);
   
+#pragma GCC novector
   for (i = 0; i < 128; i++)
    if (x[i + 16] != 1234)
      abort ();
   
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (x[i] != 5678
        || x[i + 144] != 5678)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
index c924b12b02fd438d039d0de6b6639813047839e7..95b16196007488f52b2ec9a2dfb5a4f24ab49bba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
@@ -28,10 +28,12 @@ int main (void)
    
   test1 (x + 16, 1234);
   
+#pragma GCC novector
   for (i = 0; i < 128; i++)
    if (x[i + 16] != 1234)
      abort ();
   
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (x[i] != 5678
        || x[i + 144] != 5678)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
index f52f30aa24e83768f9beb03fb2ac7b17f37e0b77..129dab2ba1cfe8175644e0a2330349974efca679 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
@@ -28,6 +28,7 @@ foo ()
       out[i] = res;
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)  
     if (out[i] != check_res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
index 5aa977df633c9a5d24e248b0c02ec21751f78241..26ad6fa65c6d1489aa1b1ce9ae09ea6f81ad44d2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
@@ -27,6 +27,7 @@ foo ()
       out[i] = res;
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)  
     if (out[i] != check_res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
index f2ab30c63b2e28fbd453af68628d3491d6b4d034..4e3b8343ff7b4b1f43397fe2e71a8de1e89e9a74 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
@@ -27,6 +27,7 @@ main1 ()
     }
 
   /* Check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != DIFF)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
index 02f01cfb5791319d766f61465c2d1b64718674de..32c40fb76e325571347993571547fa12dd6255aa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
@@ -28,6 +28,7 @@ int main (void)
   foo ();
 
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[j][i] != j+i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
index 55023d594dd2e0cb18c3c9dc838ac831ede938da..a0a419c1547fc451b948628dafeb48ef2f836daa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
@@ -28,6 +28,7 @@ int main (void)
   foo ();
 
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[j][i] != j+i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
index 6b9fefedf3a5c9ee43c9201039987468710df62d..5ca835a2dda468bab1cbba969278a74beff0de32 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
index 3a4dafee0720bd1a5e532eb2c0062c5eb78556b6..f9924fcb2b40531e8e7a4536d787b5d1b6e2b4ee 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
index bb4e74b7b333ce036159db4cbf5aaa7107dc35d9..218df61cf4b18709cb891969ae53977081a86f1d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
@@ -27,6 +27,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k+i][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
index 6adde9769215e8c98132ec91ab015e56b710c47a..36c9681201532960b3eecda2b252ebe83036a95a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j+=2) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
index bf6abfef01fa96904adbf350935de3609550f2af..678d7e46a5513e0bdeaf0ec24f2469d58df2cbc5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j+=2) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
index b75281bc3187f84824e1360ba92a18f627686aa5..81a4fc407086372c901b1ff34c75cada3e8efb8a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
@@ -27,6 +27,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < i+1; j++) {
       if (image[k][j][i] != j+i+k)
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
index fdc8a0544dd941f28a97a22e706bd3f5c3c9d2a3..231989917d7c4d5ff02b4f13a36d32c543114c37 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
index 921db48a0f76763bc724d41f90c74472da8e25fb..c51787fe5753f4317b8c1e82c413b009e865ad11 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
index fd841b182e3c81eed43a249fe401c6213814ea36..7ae931e39be5a4e6da45242b415459e073f1384a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
index d26440d1a64e887aa2cd6ccf1330cb34d244ef12..bfadac0c5e70b61b23b15afa9271ac9070c267c1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
index b915c4e370c55293ec00665ddd344b9ddafec3b4..1e2bbf1e7bac29563a530c4bbcd637d8541ddfca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
index 091c6826f66acb07dbc412ae687d72c84800146d..952bba4d911956c49a515276c536a87a68433d40 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
index 9614b777aded3c9d2f5229d27ce8e5cfbce0c7d2..8a803cd330f25324669467a595534100878f3ddc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
index b656064697c93177cb9cd9aae8f9f278b9af40b0..587eabaf004705fb6d89882a43a628921361c30e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
index 443a00d49e19dae2a0dd32d6e9e28d2bf5972201..0c9115f60a681f48125dfb2a6428202cc1ec7557 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
index 10b558fd20905d2c8b9915d44a41e89b406028d9..67be075278847ea09e309c5d2ae2b4cf8c51b736 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
@@ -38,6 +38,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-20; i++)
     {
       s = 0;
@@ -57,6 +58,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 4; i++)
     {
       if (B[i] != E[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
index 201ca8424828d6dabe1c6d90dff8396438a71ff4..13a5496f70c069f790d24d036642e0715a133b3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
@@ -48,6 +48,7 @@ int main ()
   main1();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       s = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
index 6299e9fed4233b3ec2c0b9892afdca42edf0bee0..8114934ed03332aaa682c6d4b5a7f62dfc33a51e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
@@ -62,6 +62,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
index d575229f2fb3bb6ece1fbc013019ebb0fbaa505e..9c4be4b9f658f7abd1e65b7b5a9124a5670f7ab9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
@@ -66,6 +66,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
index 9414e82f3edb1ea00587b916bfaf66847ac07574..4f1ccfccfa229105eb4e8a5c96a5ebfb13384c5d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
@@ -66,6 +66,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
index 0d181dfec24a212d430a1cac493ee914ebe25325..1c68c6738580d8670b7b108c52987d576efee4ac 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
@@ -62,6 +62,7 @@ int main (void)
   foo ();
   fir ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
index 217b32cd2bde12247f94f36787ccdf67bb014ba2..795bff5f3d5f1629b75cdc7fefdc48ff4c05ad8a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
@@ -66,6 +66,7 @@ int main()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 ();  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
index 3ae1020936f960a5e46d6c74bee80d3b52df6db5..ead8d6f8e79187f0054d874b1d6e5fe3c273b5ca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
@@ -67,6 +67,7 @@ int main ()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 (n);  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
index 59e54db075254498b34f673198f8f4f373b728a5..a102ddd7d8d4d9182436646e1ca4d0bd1dd86479 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
@@ -70,6 +70,7 @@ int main()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 ();  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
index ec1e1036f57022716361977fb419b0806e55123d..0e5388b46ce80b610d75e18c725b8f05881c244b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
@@ -28,6 +28,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (int i = 0; i < 20; i++)
     {
       double suma = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
index 53865d4737b1333c3eb49723d35d2f0e385049a3..3dce51426b5b83d85bc93aaaa67bca3e4c29bc44 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
@@ -35,6 +35,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (int i = 0; i < 20; i++)
     {
       double suma = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
index 9a5141ee6ecce133ce85edcf75603e0b3ce41f04..a7ce95bcdcefc1b71d84426290a72e8891d8775b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
index f2d284ca9bee4af23c25726a54866bfaf054c46c..21fbcf4ed70716b47da6cbd268f041965584d08b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
@@ -30,6 +30,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
index 222d854b2d6e3b606e83131862c2d23a56f11829..1e48dab5ccb4b13c82800d890cdd5a5a5d6dd295 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
@@ -43,6 +43,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int res = BASE_B + BASE_C + i * 9;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
index b25db881afbc668bb163915a893bfb8b83243f32..08a65ea551812ba48298884ec32c6c7c5e46bdd2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
@@ -36,6 +36,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
index d31050ee926ac7e12c8bce99bf3edc26a1b11fbe..bd7acbb613f47fd61f85b4af777387ae88d4580a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
index 333d74ac3e7bdf99cc22b8fc7e919e39af7d2ca4..53fcfd0c06c14e5d9ddc06cdb3c36e2add364d3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
@@ -33,6 +33,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
index ecb74d7793eeaa80b0d48479b2be6c68e64c61b0..aa58cd1c95789ad4f17317c5fa501385a185edc9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
@@ -35,6 +35,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
index 11546fe1a502a02c750ea955f483bc3a8b3a0ac7..c93cd4d09af5fc602b5019352073404bb1f5d127 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
index 82aec9f26517b2e00568f3240ff88d954af29bea..4bbb30ac8aca529d062e0daacfe539177ab92224 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (int *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
index 0bcbd4f1b5315ec84e4aa3bd92e058b6ca9ea0ec..ad423f133c0bc25dfad42e30c34eceb5a8b852ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (int *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
index 47f970d231ee61c74c7c4d5b3f9e9bab0673cfe2..81292d42f0d695f98b62607053daf8a5c94d98d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
index 6e13f260f1124ef221bae41b31f8f52ae35162d3..361f77081a6d0a1d30051107f37aa4a4b764af4f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
index 187bdf159feaa770b8497c020bd3bc82becdea15..830f221019871a3df26925026b7b8c506da097db 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, 0x73);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (b[i] != ((i * 2 + 3) ^ 0x73)
 	|| a[i] != ((i * 11) | b[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
index 6f89aacbebf5094c7b1081b12c7fcce1b97d536b..55de14161d85db871ae253b86086a1341eba275c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
index a1e1182c6067db47445ad07b77e5c6e067858488..3d833561972da4a128c1bc01eff277564f084f14 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
index 03a6e6795ec68e5f9a35da93ca7a8d50a3012a21..6b3a2b88abfb6a5cd4587d766b889825c2d53d60 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
@@ -28,6 +28,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
index 0ef377f1f58a6f6466380a59c381333dbc4805df..60c9c2cc1ec272b46b7bb9a5cf856a57591425b0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
@@ -32,6 +32,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
index 269df5387d20c859806da03aed91d77955fa651a..c2ab11a9d325c1e636003e61bdae1bab63e4cf85 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
index 314a6828c53c161d2e63b88bdecf0cee9070a794..1d55e13fb1fbc4273d3a64da20dc1e80fb760296 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
@@ -39,6 +39,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, D);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
index 5baba09a575f0f316aac1a967e145dbbbdade5b4..36bfc68e05357359b8d9bdfe818910a3d0ddcb5a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int res = BASE_B + BASE_C + i * 9;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
index 7980d4dd6438d9a063051c78608f73f1cea1c740..717850a166b2b811797cf9cdd0753afea676bf74 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i+2] + ib[i+6])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
index f6fc134c8705567a628dcd62c053ad6f2ca2904d..5e5a358d34bece8bbe5092bf2d617c0995388634 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i+2] + ib[i+6])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
index 33088fb090271c3b97fae2300e5d7fc86242e246..1b85f14351242304af71564660de7db757294400 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
@@ -18,6 +18,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i] + ib[i+5])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
index 64de22a1db4d7a8b354ad3755685171308a79a00..698ca5bf0672d3bfce0121bd2eae27abb2f75ca2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
@@ -28,6 +28,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
index 086b48d9087c02ccbc0aaf36f575a3174f2916af..777051ee4a16a47f20339f97e13ad396837dea9a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
@@ -29,6 +29,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
index 3389736ead98df2207a89de3ecb34a4a95faa6f5..aeb7da3877d7e0df77d6fee1a379f352ae2a5750 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
@@ -29,6 +29,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
index c0b73cd8f3322ae01b7a1889657bc92d38fa4af6..f4ab59671b7934e3e6f5d893159a3618f4aa3898 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
@@ -31,6 +31,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 2; i < 64; i+=2)
     if (b[i] != a[i] - a[i-2]
 	|| b[i+1] != a[i+1] - a[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
index 7327883cc31ae4a37e5e4597b44b35e6376b4ed2..2fed60df68cdfbdc3ebf420db51d132ed335dc14 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
@@ -32,6 +32,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 2; i < 64; i+=2)
     if (b[i] != a[i] - a[i-2]
 	|| b[i+1] != a[i+1] - a[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
index f678b326f1043d2bce51b1d652de5ee2b55d6d0f..c170f4c345cdee1d5078452f9e301e6ef6dff398 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
@@ -28,6 +28,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c, 63);
+#pragma GCC novector
   for (int i = 1; i < 63; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
index 484efb1e8c826a8dafb43ed18e25794951418a9c..49ecbe216f2740329d5cd2169527a9aeb7ab844c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
@@ -70,6 +70,7 @@ main (void)
       fns[i].div (b, a, N);
       fns[i].mod (c, a, N);
 
+#pragma GCC novector
       for (int j = 0; j < N; j++)
 	if (a[j] != (b[j] * p + c[j]))
           __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
index dfd8ebace5610b22cc0da33647953ae33e084a42..0c4025abceb0e36092f5f7be1f813e4a6ebeda15 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
@@ -88,6 +88,7 @@ main ()
   f4 (4095);
   if (a[0] != (-2048 << 8))
     abort ();
+#pragma GCC novector
   for (i = 1; i < 4096; i++)
     if (a[i] != ((1 + ((i - 2048) % 16)) << 8))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
index 0c3086b1d683441e9b7d0096d4edce37e86d3cc1..d5fc4748758cea2762efc1977126d48df265f1c3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
@@ -21,6 +21,7 @@ int main ()
     A[i] = A[i] >> 3;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
index a1b4b0752291e64d51206fca644e241c8e0063a9..0a9d562feb56ec69e944d0a3581853249d9642ae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
@@ -26,6 +26,7 @@ int main()
 
   array_shift ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (dst[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
index 09f6e5a9584099b34e539b72dbe95e33da83cd20..d53faa52ee88b00d09eeefa504c9938084fa6230 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
@@ -26,6 +26,7 @@ int main()
 
   array_shift ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (dst[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
index 7c3feeeffae363b8ad42989a3569ca394519a414..09722ae090d0edb875cb91f5b20da71074aee7d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
@@ -44,19 +44,23 @@ main ()
 {
   check_vect ();
   foo ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 1)
       abort ();
   x = 1;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 2)
       abort ();
   baz ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 3)
       abort ();
   qux ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 4)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
index e49566a3847a97dee412148bed63a4b69af8dd1b..af0999a726288890a525fe18966331e0cb5c0cad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
@@ -76,6 +76,7 @@ main ()
   if (r * 16384.0f != 0.125f)
     abort ();
   float m = -175.25f;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s *= a[i];
@@ -91,6 +92,7 @@ main ()
   if (bar () != 592.0f)
     abort ();
   s = FLT_MIN_VALUE;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (s < a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
index e7d8aa0eb03879fcf0a77a512afc3281fbeabe76..2620dfebbc0dde80d219660dcead43ae01c7c41f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
@@ -109,6 +109,7 @@ main ()
       || r2 != (unsigned short) r
       || r3 != (unsigned char) r)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -129,6 +130,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -152,6 +154,7 @@ main ()
       || r3 != (unsigned char) r)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -174,6 +177,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
index cdfec81a6e6d761b6959fd434fc3367ad01d7026..45b55384006b1674c36a89f4539d2ffee2e4236e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
@@ -77,6 +77,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
index aee5244d85e18e707163a34cb93a9cd5b1317fc3..3ef4aa9a991c0b6259f3b3057616c1aa298663d9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
@@ -79,6 +79,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -90,6 +91,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -114,6 +117,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
index 9e73792ed7c36030b2f6885e1257a66991cdc4d1..c8a38f85ad4f29c9bbc664a368e23254effdd976 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
@@ -76,6 +76,7 @@ main ()
   if (r * 16384.0f != 0.125f)
     abort ();
   float m = -175.25f;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -89,6 +90,7 @@ main ()
   if (bar () != 592.0f)
     abort ();
   s = FLT_MIN_VALUE;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
index 91e34cd6428c4b841ab55226e49a5fc10444df57..6982a59da78276bad2779827ee0b8c1e1691e2e3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
@@ -109,6 +109,7 @@ main ()
       || r2 != (unsigned short) r
       || r3 != (unsigned char) r)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -129,6 +130,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -152,6 +154,7 @@ main ()
       || r3 != (unsigned char) r)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -174,6 +177,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
index ee4459a9341815c7ac4a5f6be4b9ca7679f13022..1ac13a5c5b4f568afa448af8d294d114533c061b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
@@ -41,12 +41,14 @@ main ()
   check_vect ();
   if (foo (a) != 64)
     abort ();
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i)
       abort ();
     else
       a[i] = -8;
   bar (a);
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i + 1)
       abort ();
@@ -54,6 +56,7 @@ main ()
       a[i] = -8;
   if (baz (a) != 64)
     abort ();
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i + 2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
index 951ba3afd9e332d7cd22addd273adf733e0fb71a..79b3602a6c08969a84856bf98ba59c18b45d5b11 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
@@ -52,12 +52,14 @@ doit (void)
   if (i != 11 || j != 101 || x != 10340 || niters != 550 || err)
     abort ();
   for (i = 1; i <= 10; i++)
+#pragma GCC novector
     for (j = 1; j <= 10 * i; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -101,12 +103,14 @@ doit (void)
   if (i != 10 || j != 90 || x != 9305 || niters != 450 || err)
     abort ();
   for (i = 0; i < 10; i++)
+#pragma GCC novector
     for (j = 0; j < 10 * i; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -156,6 +160,7 @@ doit (void)
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -199,12 +204,14 @@ doit (void)
   if (i != 11 || j != 10 || x != 9225 || niters != 25 || err)
     abort ();
   for (i = 1; i < 10; i += 2)
+#pragma GCC novector
     for (j = 1; j < i + 1; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -244,11 +251,13 @@ doit (void)
       }
   if (i != 16 || j != 4 || x != 5109 || niters != 3 || err)
     abort ();
+#pragma GCC novector
   for (j = -11; j >= -41; j -= 15)
     if (k[0][-j] == 3)
       k[0][-j] = 0;
     else
       abort ();
+#pragma GCC novector
   for (j = -11; j >= -41; j--)
     if (k[0][-j] != 0)
       abort ();
@@ -288,6 +297,7 @@ doit (void)
       }
   if (/*i != 11 || j != 2 || */x != -12295 || niters != 28 || err)
     abort ();
+#pragma GCC novector
   for (j = -34; j <= -7; j++)
     if (k[0][-j] == 3)
       k[0][-j] = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
index cca350f5c21125fa4380611a1ba42be317fd9d85..e454abe88009a7572cfad1397bbd5770c7086a6b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
@@ -25,12 +25,14 @@ main ()
   int i, r;
   check_vect ();
   r = foo (78, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 78; i++)
     if (p[i] != 78 * i)
       abort ();
   if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
     abort ();
   r = foo (87, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 87; i++)
     if (p[i] != 87 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
index 67e25c0e07eeff8e3453a8a3b5e4df54b16f3f30..4d25b43f5dca9df6562a146e12e1c3542d094602 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
@@ -25,12 +25,14 @@ main ()
   int i, r;
   check_vect ();
   r = foo (78, 0, 10000, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 78; i++)
     if (p[i] != 78 * i)
       abort ();
   if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
     abort ();
   r = foo (87, 0, 10000, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 87; i++)
     if (p[i] != 87 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
index 57217c8a6ba4c15095f777cfa64aee9ffbe3e459..9ba7c3ce956a613e175ee6bd1f04b0531e6a79bd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
@@ -27,6 +27,7 @@ main ()
   check_vect ();
   r = foo (78, 0, 10000, p);
   for (j = 0; j < 7; j++)
+#pragma GCC novector
     for (i = 0; i < 10000 / 78; i++)
       if (p[j * (10000 / 78 + 1) + i] != 78 * i)
 	abort ();
@@ -34,6 +35,7 @@ main ()
     abort ();
   r = foo (87, 0, 10000, p);
   for (j = 0; j < 7; j++)
+#pragma GCC novector
     for (i = 0; i < 10000 / 87; i++)
       if (p[j * (10000 / 87 + 1) + i] != 87 * i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
index 5d10ad90501835bf6cac2c2d81ee98bc6ce6db5b..a3c2decee2e36949950ca87a0a9942bc303ee633 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
@@ -77,6 +77,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
index 52eb24f680f1362ee93b7a22de5fd46d37119216..b652759e5ad5ec723a644cf9c6cb31677d120e2d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
@@ -79,6 +79,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -90,6 +91,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -114,6 +117,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
index cd65fc343f1893accb6f25a6222a22f64a8b4b2e..c44bfe511a5743198a647247c691075951f2258d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
@@ -46,10 +46,12 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != (i < 30 ? 5 : i * 4 + 123))
       abort ();
   baz ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != (i < 30 ? 5 : i * 8 + 123))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
index 03acd375e089c3a430adbed8d71197f39d7c512b..ed63ff59cc05e5f0a240376c4ca0985213a7eb48 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
@@ -65,6 +65,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -72,6 +73,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
index 29acde22f1783e8b11376d1ae2e702e09182350c..4974e5cc0ccdc5e01bf7a61a022bae9c2a6a048b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
@@ -44,19 +44,23 @@ main ()
   if (sizeof (int) * __CHAR_BIT__ < 32)
     return 0;
   bar (a + 7);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
   bar (a);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
 #if 0
   baz (a + 7);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
   baz (a);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
index 675ac7026b67edda2e573367643eb68063559bc2..866f1000f34098fb578001395f4a35e29cc8c0af 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
@@ -32,6 +32,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != ((i >> 1) + (-3 * i)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
index ffcbf9380d609d7a3ed7420a38df5c11f632b46a..feab989cfd595f9fdb839aa8bd3e8486751abf2f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
@@ -44,6 +44,7 @@ main ()
   check_vect ();
   baz ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != 5 * (i & 7) * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
index 18d68779cc5dd8faec77a71a8f1cfa9785ff36ed..fef48c5066918a42fa80f1e14f9800e28ddb2c96 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
@@ -37,6 +37,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
index e9af0b83162e5bbd40e6a54df7d656ad956a8fd8..42414671c254ffcd93169849d7a982861aa5ac0b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
@@ -40,6 +40,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
index 46da496524d99ff70e3673682040c0d5067afe03..620cec36e4c023e1f52160327a3d5ba21540ad3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
@@ -35,6 +35,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != i * 4 + 123 || e[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
index 6143a91eaf078d5b73e608bcfa080b70a5896f3d..440091d70e83be80574a6fcf9e034c53aed15786 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
@@ -57,6 +57,7 @@ main ()
   check_vect ();
   baz ();
   bar (0);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 2 * i || b[i] != 6 - 7 * i
 	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
@@ -64,6 +65,7 @@ main ()
     else
       a[i] = c[i];
   bar (17);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 6 - 5 * i + ((i & 31) << 4)
 	|| b[i] != 6 - 7 * i
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
index a0316e9e5813ac4c9076aaf5f762b9cc5dc98b1e..62246e28837272ef1e18860912643422f6dce018 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
@@ -57,6 +57,7 @@ main ()
   check_vect ();
   baz ();
   bar (0);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 2 * i || b[i] != 6 - 7 * i
 	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
@@ -64,6 +65,7 @@ main ()
     else
       a[i] = c[i];
   bar (17);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 6 - 5 * i + ((i & 31) << 4)
 	|| b[i] != 6 - 7 * i
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
index f414285a170c7e3469fdad07256ef09e1b46e17b..11ea2132689137cfb7175b176e39539b9197a330 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
@@ -76,6 +76,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -83,6 +84,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
index a968b9ce91a17c454f66aa76ec8b094e011e1c74..0112e553f8f130b06ee23a8c269a78d7764dcfff 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
@@ -76,6 +76,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -83,6 +84,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
index da47a824cb6046dcd9808bd7bd80161dbc0531b5..1531553651ceb6185ce16ab49f447496ad923408 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
@@ -46,6 +46,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
index d53b7669a6b50d6bc27e646d08af98ca6fd093e3..b8d094723f9035083a244cfcee98d3de46512206 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
@@ -33,6 +33,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
index 37ff3abe97d60d9b968addaee7812cb0b05b6f44..0f1344c42017fc2a5bfda3a9c17d46fbdd523127 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
@@ -44,6 +44,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
index 9237a9074deeb72c4d724771d5397d36593ced7c..b0d36486714159c88419ce9e793c27a398ddcbcb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
@@ -39,6 +39,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
index 62a8a51e2034b1065a4438a712a80e0a7c149985..1c9906fa65237a7b9e0bbd2162e9c56b6e86074f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
@@ -39,6 +39,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i] != arr[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
index f64a1347350a465b9e7a0c123fe2b5bcbc2bf860..dc9ad168c7161c15f6de4a57d53e301e6754e525 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
@@ -33,6 +33,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].a
@@ -49,6 +50,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
index 2add5b489915cffda25f3c59b41bd1c44edf16ce..d35e427996f472ce9fffdf9570fb6685c3115037 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
@@ -62,6 +62,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
index 2b7a1a4bb77f4dce44958c50864a0a6ecac90c53..a9524a9d8e5cb152ec879db68f316d5568161ec1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
@@ -51,6 +51,7 @@ main1 ()
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
index e487de8b4e7d8e092054a73b337a345ba00e4e02..95ff41930d3f1ab95f0a20947e0527f39c78e715 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
@@ -71,6 +71,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
index 0f3347e8bb2200f48927b21938e7ebd348a73ada..b2dd1aee116d212bda7df0b0b1ca5470bd35ab83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
@@ -56,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
index 6d6bfae7bc5ce4cbcaeaadc07856773e6d77bdb4..716cce3eecbec0390f85f393e9cc714bd1a1faae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
@@ -22,6 +22,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i*2] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
index 82727e595c166a52c8a1060339259ec7c39b594f..59008499192388c618f3eb38d91d9dcb5e47e3d9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
@@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
index 0fac615011601d45c64e83be1a6ec1e1af407192..350223fa23ace9253e8e56bbbbd065e575639b19 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
@@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
index 8c560480bc4eac50c381ed51cfbc6ccc696d0424..e988c5c846911a875a188cbb6ec8a4e4b80b787a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
@@ -35,6 +35,7 @@ main1 (s * __restrict__  pIn, s* __restrict__ pOut)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (q->a != p->a + 5
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
index dcae9c26d8621dd393f00d81257262a27913d7a8..37b8eb80ce0ce0dfe1ce5f9e5c13618bffbe41ff 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -24,6 +24,7 @@ main (int argc, char **argv)
     }
   loop ();
   __asm__ volatile ("" : : : "memory");
+#pragma GCC novector
   for (int i = 0; i < N; i++)
     {
       if (out[i] != i*2 + 7)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
index 6be939eea167992aade397ada0ee50d4daa43066..a55cd32e5896be4c1592e4e815baccede0f30e82 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
@@ -38,6 +38,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != a[i] + 3
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
index 9d1ad579e6a607f34ec953395f741f180474a77a..170f23472b967cedec88c1fa82dfb898014a6d09 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
@@ -34,6 +34,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != a[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
index a081d4e396e36a4633eb224d927543c7379d3108..11c2f2c4df60d8238830c188c3400a324444ab4d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
@@ -22,6 +22,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i*2] != b[i] + c[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
index e8303b63bd4812e0643dc96888eeee2ea8ca082a..dfdafe8e8b46ea33e3c9ed759687788784a22607 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
@@ -19,12 +19,14 @@ int main()
   float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
   float dest[64];
   check_vect ();
+#pragma GCC novector
   for (stride = 0; stride < 8; stride++)
     {
       sumit (dest, src, src, stride, 8);
       if (!stride && dest[0] != 16)
 	abort();
       else if (stride)
+#pragma GCC novector
 	for (i = 0; i < 8; i++)
 	  if (2*src[i] != dest[i*stride])
 	    abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
index 7d264f39c60d668927232a75fe3843dbee087aa5..004db4e1f84735d8857c5591453158c96f213246 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
@@ -25,6 +25,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
index 63a4da797cbeb70bde0b1329fe39f510c24a990c..5d94e8f49bc41431df9de2b809c65e48cc269fa0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
@@ -18,6 +18,7 @@ check1 (s *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i].a != C (i)
 	|| res[i].b != A (i)
@@ -30,6 +31,7 @@ check2 (unsigned short *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != (unsigned short) (A (i) + B (i) + C (i)))
       abort ();
@@ -40,6 +42,7 @@ check3 (s *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i].a != i
 	|| res[i].b != i
@@ -52,6 +55,7 @@ check4 (unsigned short *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != (unsigned short) (A (i) + B (i)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
index ee8ea0d666db4b7671cd3f788fc7f6056189f3da..547ad9b9ee3d35802d3f8d7b9c43d578fb14f828 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
@@ -34,6 +34,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
index fe41dbd9cf452b9452084e988d48ede232f548bf..8f58e24c4a8b8be2da0a6c136924a370b9952691 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
@@ -29,6 +29,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
index a88c0f08456cf278c4fa5a5b9b0a06900cb7c9be..edb13d1b26f5963113917e8882f199c7dd4d8de7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
@@ -37,6 +37,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
index cddd0c88f42a99f526362ca117e9386c013c768d..0c2bd9d8cbde5e789474595db519d603b374e74c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
@@ -29,6 +29,7 @@ main1 (unsigned short *arr, ii *iarr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i] != arr[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
index ab841205e4f5b3c0aea29f60045934e84644a6a7..fd7920031dcf6df98114cfde9a56037d655bb74d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
@@ -25,6 +25,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b
@@ -41,6 +42,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
index 0afd50db0b8de7758faf7f2bff14247a27a7ee38..ae2345a9787804af0edc45d93f18e75d159326b0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
@@ -24,6 +24,7 @@ main1 (s *arr)
       ptr++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
index ef532251465e5b1eb16e820fc30844a7995b82a9..c7a1da534baea886fe14add1220c105153d6bb80 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
index 04f18fbb591d9dc50d56b20bce99cb79903e5e27..2a068d821aebee8ab646ff1b4c33209dc5b2fcbf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
@@ -37,6 +37,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
index b5eb87f4b96e1a577930654f4b1709024256e90e..ac7bf000196b3671044de57d88dd3a32080b68a8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
@@ -41,6 +41,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
@@ -64,6 +65,7 @@ main1 (s *arr)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
index 69b970ef33b9dd8834b10baf7085b88a0c441a46..0a6050ae462332b8d74043fce094776892a80386 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
@@ -53,6 +53,7 @@ main1 (s *arr, int n)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     { 
       if (res[i].c != arr[i].b + arr[i].c
@@ -67,6 +68,7 @@ main1 (s *arr, int n)
    }
 
   /* Check also that we don't do more iterations than needed.  */
+#pragma GCC novector
   for (i = n; i < N; i++)
     {
       if (res[i].c == arr[i].b + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
index f1d05a5aaf9f6885b921c5ae3370d9c17795ff82..9ead5a776d0b1a69bec804615ffe7639f61f993f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b + arr[i].c
@@ -62,6 +63,7 @@ main1 (s *arr)
     }
   
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
index b703e636b49f8c7995c4c463b38b585f79acbdf2..176c6a784bc73e0300e3114a74aba05dc8185cac 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
@@ -44,6 +44,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
index 764f10d0adaca01e664bb45dd4da59a0c3f8a2af..cef88f6bf8246a98933ff84103c090664398cedd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
@@ -42,6 +42,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
index 35bab79ce826ac663eabb1a1036ed7afd6d33e8b..c29c3ff6cdc304e5447f0e12aac00cd0fcd7b61e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
@@ -44,6 +44,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
index ea35835465c8ed18be1a0c9c4f226f078a51acaa..2d5c10a878c7145972aeaa678e0e11c1cf1b79dd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
@@ -27,6 +27,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
index df6b999c5a4d88c8b106829f6f9df8edbe00f86f..4848215a7a8f5fea569c0bfaf5909ac68a81bbf2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
@@ -32,6 +32,7 @@ main (void)
   foo (X, Y, Z);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i] != resultY[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
index 36861a059e03b1103adc2dca32409878ca95611e..2a94c73907e813019fcfbc912a1599f7423e2a47 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
@@ -40,6 +40,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i].a != result[i].a)  
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
index bfbb48b21ee632243f2f5ba63d7eeec0f687daef..b0e9d6f90391cfc05911f7cc709df199d7fbbdf1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
@@ -26,6 +26,7 @@ main (void)
   foo (X, &X[2]);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N+2; i++)
     {
       if (X[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
index d775f320e0c1e2c6de2e77a1d8df621971fc3d2d..27d762490908829d54cdbb81247926c2f677fe36 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
@@ -40,6 +40,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i].a != result[i].a)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
index 0d6e64081a17fed8d9b9239f9ba02ffa1b7a758d..f3abc9407f52784e391c495152e617b1f0753e92 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
 				      * (BASE + OFFSET + i * 4)))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
index 4c95dd2017922904122aee2925491e9b9b48fe8e..dfbb2171c004565045d91605354b5d6e7219ab19 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
@@ -17,6 +17,7 @@ foo (int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 2333)
       abort ();
@@ -32,6 +33,7 @@ bar (int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * (short) 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * (short) 2333)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
index 4075f815cea0ffbad1e05e0ac8b9b232bf3efe61..c2ad58f69e7fe5b62a9fbc55dd5dab43ba785104 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
@@ -17,6 +17,7 @@ foo (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 2333)
       abort ();
@@ -32,6 +33,7 @@ bar (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = (unsigned short) 2333 * b[i];
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * (unsigned short) 2333)
       abort ();
@@ -47,6 +49,7 @@ baz (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 233333333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 233333333)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
index c4ac88e186dbc1a8f36f4d7567a9983446557eea..bfdcbaa09fbd42a16197023b09087cee6642105a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
@@ -43,12 +43,14 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF)
       abort ();
 
   bar ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
index ebbf4f5e841b75cb1f5171ddedec85cd327f385e..e46b0cc3135fd982b07e0824955654f0ebc59506 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo (COEF2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
index 91a8a290263c9630610a48bce3829de753a4b320..6b094868064e9b86c40018363564f356220125a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
@@ -33,6 +33,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
index 7e1f7457f1096d4661dcc724a59a0511555ec0e3..444d41169b5c198c6fa146c3bb71336b0f6b0432 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
@@ -33,6 +33,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
index 2e28baae0b804cf76ad74926c35126df98857482..14411ef43eda2ff348de9c9c1540e1359f20f55b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
index d277f0b2b9492db77237a489cc8bea4749d8d719..f40def5dddf58f6a6661d9c286b774f954126840 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
index f50358802587d32c1d6e73c0f6e06bd8ff837fc2..63866390835c55e53b6f90f305a71bbdbff85afa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
@@ -34,6 +34,7 @@ int main (void)
 
   foo (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
index 03d1379410eb927a3ef705afc6523230eb9fb58b..78ad74b5d499c23256e4ca38a82fefde8720e4e9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
@@ -34,6 +34,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
index 5f6c047849b8625f908bc7432b803dff5e671cd3..26d5310807781eb5a7935c51e813bc88892f747c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
@@ -32,6 +32,7 @@ foo (short *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
@@ -60,6 +61,7 @@ foo (short *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
index 46512f2c69ba50521d6c7519a1c3d073e90b7436..7450d2aef75d755db558e471b807bfefb777f472 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
@@ -23,6 +23,7 @@ foo (char *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
index 212b5dbea18a91bd59d2caf9dc4f4cc3fe531762..ae086b88e7e83f2864d6e74fa94301f7f8ab62f6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
@@ -23,6 +23,7 @@ foo (unsigned short *src, unsigned int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
index 844e5df3269d0a774d2ab8a88de11f17271d6f60..a8e536adee0f04611115e97725608d0e82e9893c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
@@ -27,6 +27,7 @@ foo (unsigned char *src, unsigned int *dst1, unsigned int *dst2)
   s = src;
   d1 = dst1;
   d2 = dst2;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
index c4d2de1a64e2ebc151c4ade2327c8fceb7ba04e4..414bd9d3e1279db574d860b7a721e4310d4972da 100644
--- a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
+++ b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sb[i] != 5)
@@ -31,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 105)

-- 

[-- Attachment #2: rb17497.patch --]
[-- Type: text/plain, Size: 369254 bytes --]

diff --git a/gcc/testsuite/g++.dg/vect/pr84556.cc b/gcc/testsuite/g++.dg/vect/pr84556.cc
index e0655536f7a0a1c32a918f4b112604a7e6b5e389..e2c97e917bed3e7c5e709f61384d75588f522308 100644
--- a/gcc/testsuite/g++.dg/vect/pr84556.cc
+++ b/gcc/testsuite/g++.dg/vect/pr84556.cc
@@ -15,6 +15,7 @@ main ()
   };
   x ();
   x ();
+#pragma GCC novector
   for (int i = 0; i < 8; ++i)
     if (y[i] != i + 3)
       __builtin_abort ();
diff --git a/gcc/testsuite/g++.dg/vect/simd-1.cc b/gcc/testsuite/g++.dg/vect/simd-1.cc
index 76ce45d939dca8ddbc4953885ac71cf9f6ad298b..991db1d5dfee2a8d89de4aeae659b797629406c1 100644
--- a/gcc/testsuite/g++.dg/vect/simd-1.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-1.cc
@@ -88,12 +88,14 @@ main ()
   s.foo (x, y);
   if (x != 1024 || s.s != 2051 || s.t != 2054)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 2 * i)
       abort ();
   s.bar (x, y);
   if (x != 2049 || s.s != 4101 || s.t != 4104)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 4 * i)
       abort ();
@@ -102,12 +104,14 @@ main ()
   s.baz (x, y);
   if (x != 1024 || s.s != 2051 || s.t != 2054)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 2 * i)
       abort ();
   s.qux (x, y);
   if (x != 2049 || s.s != 4101 || s.t != 4104)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1025; ++i)
     if (a[i] != 4 * i)
       abort ();
diff --git a/gcc/testsuite/g++.dg/vect/simd-2.cc b/gcc/testsuite/g++.dg/vect/simd-2.cc
index 6f5737b7e40b5c2889f26cb4e4c3445e1c3822dd..0ff57e3178d1d79393120529ceea282498015d09 100644
--- a/gcc/testsuite/g++.dg/vect/simd-2.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-2.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-3.cc b/gcc/testsuite/g++.dg/vect/simd-3.cc
index d9981719f58ced487c4ffbbecb7c8a5564165bc7..47148f050ed056a2b3340f1e60604606f6cc1311 100644
--- a/gcc/testsuite/g++.dg/vect/simd-3.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-3.cc
@@ -75,6 +75,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -86,6 +87,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -99,6 +101,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -110,6 +113,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-4.cc b/gcc/testsuite/g++.dg/vect/simd-4.cc
index 8f3198943a7427ae3d4800bfbc5575c5849627ff..15b1bc1c99d5d42ecca330e063fed19a50fb3276 100644
--- a/gcc/testsuite/g++.dg/vect/simd-4.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-4.cc
@@ -77,6 +77,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-5.cc b/gcc/testsuite/g++.dg/vect/simd-5.cc
index dd817b8888b1b17d822f576d6d6b123f338e984f..31c2ce8e7129983e02237cdd32e41ef0a8f25f90 100644
--- a/gcc/testsuite/g++.dg/vect/simd-5.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-5.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b, r);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += i;
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s.s += 2 * i;
diff --git a/gcc/testsuite/g++.dg/vect/simd-6.cc b/gcc/testsuite/g++.dg/vect/simd-6.cc
index 883b769a9b854bd8c1915648d15ea8996d461f05..7de41a90cae3d80c0ccafad8a9b041bee89764d3 100644
--- a/gcc/testsuite/g++.dg/vect/simd-6.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-6.cc
@@ -118,6 +118,7 @@ main ()
   foo (a, b);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -129,6 +130,7 @@ main ()
   if (bar<int> ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -140,6 +142,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -151,6 +154,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-7.cc b/gcc/testsuite/g++.dg/vect/simd-7.cc
index 1467849e0c6baa791016b039ca21cfa2cc63ce7f..b543efb191cfbf9c561b243996cdd3a4b66b7533 100644
--- a/gcc/testsuite/g++.dg/vect/simd-7.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-7.cc
@@ -79,6 +79,7 @@ main ()
   foo<int *, int &> (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -90,6 +91,7 @@ main ()
   if (bar<int> () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -114,6 +117,7 @@ main ()
   if (qux<int &> () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-8.cc b/gcc/testsuite/g++.dg/vect/simd-8.cc
index 8e297e246bd41a2f63469260f4fdcfcb5a68a62e..4d76a97a97233cecd4d35797a4cc52f70a4c5e3b 100644
--- a/gcc/testsuite/g++.dg/vect/simd-8.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-8.cc
@@ -77,6 +77,7 @@ main ()
   foo (a, b, r);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-9.cc b/gcc/testsuite/g++.dg/vect/simd-9.cc
index 4c5b0508fbd79f0e6aa311072062725536d8e2a3..5d1a174e0fc5425f33769fd017b4fd6a51a2fb14 100644
--- a/gcc/testsuite/g++.dg/vect/simd-9.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-9.cc
@@ -110,6 +110,7 @@ main ()
   foo (a, b, r);
   if (r.s != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -121,6 +122,7 @@ main ()
   if (bar ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -132,6 +134,7 @@ main ()
   if (r.s != 1024 * 1023 / 2)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
@@ -143,6 +146,7 @@ main ()
   if (qux ().s != 1024 * 1023)
     abort ();
   s.s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i].s != s.s)
diff --git a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
index fb00e8816a5fc157b780edd1d7064804a67d6373..2d9bb62555ff6c9473db2d1b754aed0123f2cb62 100644
--- a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
+++ b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
@@ -30,6 +30,7 @@ do_main ()
   #pragma omp simd
   for (i = 0; i < N; i++)
     e[i] = foo (c[i], d[i], f[i]);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != 6 * i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
index f8b84405140e87a2244ae9f5db6136af2fe9cf57..17ce6c392546f7e46a6db9f30f76dcaedb96d08c 100644
--- a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
+++ b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
@@ -90,6 +90,7 @@ main (void)
   for (i = 0; i < 8; i++)
     Loop_err (images + i, s, -1);
 
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (__builtin_memcmp (&expected, images + i, sizeof (expected)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
index 97e516ed68e6166eb5f0631004d89f8eedde1cc4..8039be89febdb150226b513ffe267f6065613ccb 100644
--- a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
+++ b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
@@ -10,6 +10,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
index 793c41f6b724d2b6f5ecca6511ea8504e1731a8c..3dc5e746cd0d5c99dcb0c88a05b94c73b44b0e65 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
@@ -29,6 +29,7 @@ main1 (int dummy)
     }
 
   /* check results: */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index 82fae06e3244a9bbb4a471faecdc5f1174970229..76430e0915e2d6ad342dae602fd22337f4559b63 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -37,6 +37,7 @@ main1 (int dummy)
 
   a = 0;
   /* check results: */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + a
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
index fcf1cd327e0b20582e3512faacfebfe6b7db7278..cb1b38dda14785c6755d311683fbe9703355b39a 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
@@ -28,6 +28,7 @@ main1 (int dummy)
     }
 
   /* check results:  */ 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
index ca049c81ba05482813dbab50ab3f4c6df94570e4..6de8dd8affce8e6f6ad40a36d6a163fc25b3fcf9 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (dst[i] != A * i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
index 7a9cf955e3e540e08b42cd80872bb99b53cabcb2..d44d585ff25aed7394945cff64f20923b5600061 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * i + i + 8)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
index df529673f6c817620a8423ab14724fe4e72bca49..fde062e86c7a01ca29d6e7eb8367414bd734500b 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * src[i] + src[i+8])
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
index bc27f2fca04de8f837ce51090657c8f2cc250c24..3647dd97c69df8a36fc66ca8e9988e215dad71eb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (A);
 
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     {
       if (dst[i] != A * i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
index 8749a1f22a6cc1e62a15bd988c50f6f63f26a0a2..c92b687aa44705118f21421a817ac3067e2023c6 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
@@ -56,6 +56,7 @@ int main (void)
 
   foo (A);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (dst[i] != A * i
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
index b531350ff3073b7f54b9c03609d6c8279e0374db..9272f02b2aa14f52b04e3d6bb08f15be17ce6a2f 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
@@ -45,6 +45,7 @@ int main (void)
 
   foo (dst, src, N, 8);
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (dst[i] != A * src[i] + B * src[i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
index 1dfa301184aad4c8edf00af80fb861562c941049..69fd0968491544f98d1406ff8a166b723714dd23 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
@@ -36,6 +36,7 @@ main ()
   foo (a, b);
 
   for (int i = 0; i < 4; ++i)
+#pragma GCC novector
     for (int j = 0; j < ARR_SIZE; ++j)
       if (a[i][j] != (i + 1) * ARR_SIZE - j + 20 * i)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
index ccb4ef659e47e3524d0dd602fa9d1291847dee3c..c8024429e9c44d924f5bb2af2fcc6b5eaa1b7db7 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
@@ -35,6 +35,7 @@ int main ()
 
   foo (a, 4);
 
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (a[i] != i%4 + 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
index 5a9fe423691e549ea877c42e46e9ba70d6ab5b00..b556a1d627865f5425e644df11f98661e6a85c29 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
@@ -45,6 +45,7 @@ DEF_LOOP (unsigned)
 	asm volatile ("" ::: "memory");			\
       }							\
     f_##SIGNEDNESS (a, b, c);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
 	__builtin_abort ();				\
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
index 15a94e680be4568232e31956732d7416549a18ff..d1aa161c3adcfad1d916de486a04c075f0aaf958 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
@@ -44,6 +44,7 @@ DEF_LOOP (unsigned)
 	asm volatile ("" ::: "memory");			\
       }							\
     f_##SIGNEDNESS (a, b, C);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       if (a[i] != (BASE_B + C + i * 15) >> 1)		\
 	__builtin_abort ();				\
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
index 47b1a43665130e11f902f5aea11b01faf307101b..a3ff0f5b3da2f25ce62a5e9fabe5b38e9b952fa9 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
@@ -37,6 +37,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
index c50560b53696c340b0c071296f002f65bcb91631..05fde3a7feba81caf54acff82870079b87b7cf53 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
@@ -39,6 +39,7 @@ int main ()
 
   foo (a, b, 8);
 
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (a[i] != i%8 + 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
index fc76700ced3d4f439b0f12eaf9dbc2b1fec72c20..c186c7b66c65e5f62edee25a924fdcfb25b252ab 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
@@ -16,6 +16,7 @@ int
 main (void)
 {
   f (a);
+#pragma GCC novector
   for (int i = 0; i < 4; ++i)
     {
       if (a[i] != (i + 1) * (i + 1))
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
index ac89883de22c9f647041fb373618dae5b7c036f3..dda74ebe03c35811ee991a181379e688430d8412 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
@@ -16,6 +16,8 @@ int main()
 	for (int e = 0; e <= 4; e++)
 	  a[e + 1] |= 3;
     }
+
+#pragma GCC novector
   for (int d = 0; d < 6; d++)
     if (a[d] != res[d])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index ee12136491071c6bfd7678c164df7a1c0a71818f..77d3ae7d424e208409c5baf18c6f39f294f7e351 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -51,6 +51,7 @@ int main()
   rephase ();
   for (i = 0; i < 32; ++i)
     for (j = 0; j < 3; ++j)
+#pragma GCC novector
       for (k = 0; k < 3; ++k)
 	if (lattice->link[i].e[j][k].real != i
 	    || lattice->link[i].e[j][k].imag != i)
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
index 40a02ed1309e2b6b4dc44cf56018a4bb71cc519f..bea3b92ba775a4e8b547d4edccf3ae4a4aa50b40 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
@@ -31,9 +31,11 @@ main (int argc, char **argv)
   __asm__ volatile ("" : : : "memory");
   test (a, b);
   __asm__ volatile ("" : : : "memory");
+#pragma GCC novector
   for (int i = 0; i < 4; i++)
     if (a[i] != i+4)
       abort ();
+#pragma GCC novector
   for (int i = 4; i < 8; i++)
     if (a[i] != 0)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -45,6 +46,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -58,6 +60,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -71,6 +74,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
index b82b8916be125b194a02aa74cef74f821796de7f..f07893458b658fc728703ffc8897a7f7aeafdbb3 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -42,6 +43,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -55,6 +57,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -68,6 +71,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
index c00a5bec6d5f9c325beb7e79a4520b76843f0a43..9e57cae9751d7231a2156acbb4c63c49dc0e8b95 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
@@ -48,6 +48,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
@@ -73,6 +74,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
@@ -92,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
index e27152eb91ef2feb6e547e5a00b0fc8fe40e2cee..4afbeea9927676b7dbdf78480671056e8777b183 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/4; i++)
     {
       if (tmp.b[2*i] != 5
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
index c092848dfc9d093e6fc78ce171bb4c1f59a0cf85..9cfae91534f38248a06fb60ebbe05c84a4baccd2 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
@@ -58,6 +58,7 @@ main (void)
   foo ();
 
   /* Check resiults. */ 
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     {
       if (cf[i].f1 != res1[i])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
index c57f065cccdd6cba4f96efe777318310415863c9..454a714a309163a39128bf20ef7e8426bd26da15 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
index 9bb81e371725ea0714f91eee1f5683c7c014e64c..f69e5c2ee5383abb0a242938426ef09621e54043 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
index d062d659ffb0138859333f3d7e375bd83fc1c99a..cab6842f72d150b83d525abf7a23971817b9082e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
index dc170a0530c564c884bc739e6d82874ccddad12c..05c28fe75e6dc67acba59e73d2b8d3363cd47c9b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
@@ -22,6 +22,7 @@ __attribute__((noipa)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
index ce27e4f082151b630376bd9cfbbabb78e80e4387..648e19f1071f844cc9f968414952897c12897688 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
index dae5a78808f1d6a5754adb8e7ff4b22608ea33b4..badf5dff70225104207b65a6fe4a2a79223ff1ff 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
index 8221f9e49f8875f453dbc12ca0da4a226e7cf62d..d71a202d8d2b6edaee8b71a485fa68ff56e983ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
index 2fc751ce70da35b055c64d9e8bec222a4b4feb8b..f18da3fc1f0c0df27c5bd9dd7995deae19352620 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index 5da4343198c10e3d35c9f446bc96f1b97d123f84..cbbfbb24658f8a11d4695fe5e16de4e4cfbdbc7e 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -28,6 +28,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (pib[i - OFF] != ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
index 1fc14666f286c1f1170d66120d734647db7686cf..2a672122bcc549029c95563745b56d74f41d9a82 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
index 1a1a86878405bd3bf240e1417ad68970a585c562..9c659f83928046df2b40c2dcc20cdc12fad6c4fe 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
@@ -59,6 +59,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -45,6 +46,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -58,6 +60,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -71,6 +74,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
index 5e4affae7db61a0a07568603f1c80aefaf819adb..2f48955caa19f61c12e4c178f60f564c2e277bee 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -42,6 +43,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -55,6 +57,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -68,6 +71,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
index cfea8723ba2731c334c1fffd749dc157d8f68e36..d9f19d90431ab1e458de738411d7d903445cd04d 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
@@ -32,6 +32,7 @@ main1 ()
       d[i] = i * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i + i - a[i]) >= 0.0001f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
index 6d67d12f9961f5cbc53d6f7df5240ac2178a08ac..76bb044914f462cf6d76b559b751f1338a3fc0f8 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
@@ -44,12 +44,14 @@ main1 ()
       b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + i)
       abort ();
     else
       a[i] = 131.25;
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
index 495c0319c9dabd65436b5f6180114dfa8967f071..ad22f6e82b3c3312c9f10522377c4749e87ce3aa 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
@@ -65,24 +65,28 @@ main1 ()
       d[i] = i * i;
     }
   f1 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 3) + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f2 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 1) + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f3 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i - a[i]) >= 0.0001f)
       abort ();
     else
       a[i] = 131.25;
   f4 (10);
+#pragma GCC novector
   for (i = 0; i < 60; i++)
     if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i % 3) + i - a[i]) >= 0.0001f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
index 274ff0f9942c5aff6c6aaca5243ef21bd8708856..d51e17ff656b7cc7ef3d87d207f78aae8eec9373 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
@@ -82,36 +82,42 @@ main1 ()
       b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
     }
   f1 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 3))
       abort ();
     else
       a[i] = 131.25;
   f2 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 1))
       abort ();
     else
       a[i] = 131.25;
   f3 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1)
       abort ();
     else
       a[i] = 131.25;
   f4 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
     else
       a[i] = 131.25;
   f5 (16);
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
     else
       a[i] = 131.25;
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != ((i & 1) ? -4 * i : 4 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
index 893e521ed8b83768699bc9b70f7d33b91dd89c9b..07992cf72dcfa4da5211a7a160fb146cf0b7ba5c 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
@@ -47,6 +47,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (c[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
index 71f2db3e0f281c4cdb1bf89315cc959382459e83..fc710637ac8142778b18810cefadf00dda3f39a6 100644
--- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
+++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
@@ -56,6 +56,7 @@ main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
index 82b37d4ca71344d15e00e0453dae6470c8d5ba9b..aeaf8146b1a817379a09dc3bf09f542524522f99 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
@@ -32,6 +32,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
index cafb96f52849c2a9b51591753898207beac9bdd0..635df4573c7cc0d4005421ce12d87b0c6511a228 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
@@ -31,6 +31,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<200*N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
index b376fb1d13ec533eebdbcb8092f03b4790de379a..494ff0b6f8f14f3d3b6aba1ada60d6442ce10811 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
@@ -31,6 +31,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
index 64c8dfc470ab02f3ea323f13b6477d6370210937..ba766a3f157db3f1a3d174ca6062fe7ddc60812c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
@@ -38,6 +38,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
index 277b73b4934a7bd689f8b2856b7813567dd762bc..d2eee349a42cd1061917c828895e45af5f730eb1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
index 325e201e2dac7aff88f4cb7aff53a7ee25b18631..cf7d605f23ba94b7a0a71526db02b59b517cbacc 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
@@ -42,6 +42,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
index d9cf28b22d9712f4e7f16ed18b89b0875d94daee..cfb837dced894ad8a885dcb392f489be381a3065 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
@@ -41,6 +41,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
index f5aeac981870d0e58679d5574dd12e2c4b40d23a..d650a9d1cdc7af778a2dac8e3e251527b825487d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
@@ -34,6 +34,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
index b5f8c3c88e4d0a6562ba867ae83c1ab120077111..e9ec4ca0da316be7d4d02138b0313a9ab087a601 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
@@ -33,6 +33,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
index 9d642415a133eefb420ced6642ac6d32a0e7e33f..13aac4a939846f05826f2b8628258c0fbd2e413a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
@@ -32,6 +32,7 @@ int main (void)
   foo (3);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
index f00132ede83134eb10639b77f5617487356e2ef1..c7c2fa8a5041fbc67747b4d4b98571f71f9599b6 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
@@ -41,6 +41,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += i;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
index 2dfdd59e14ef9685d22b2b8c34d55052ee747e7e..ba904a6c03e5a94f4a2b225f180bfe6a384f21d1 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
@@ -47,6 +47,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += (b[i] - c[i]);
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
index 49dd5207d862fd1de81b59013a07ea74ee9b5beb..464fcb1fc310a7366ef6a55c5ed491a7410720f8 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
@@ -35,6 +35,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
index 934eadd3e326b950bf33eac03136868089fa1371..5cd4049d08c84ab9f3503a3f1577d170df8ce6c3 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
@@ -36,6 +36,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
index 42e218924fdbeb7c21830537a55364ad5ca822ac..a9ef1c04c70510797006d8782dcc6abf2908e4f4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
@@ -38,6 +38,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
index 75b7f066abae1496caa98080cdf4355ca1383091..72e53c2bfb0338a48def620159e384d423399d0b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
@@ -41,6 +41,7 @@ int main (void)
   res = foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum += i;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
index ec04bc28f6279c0cd6a6c174698aedc4312c7ab5..b41b2c322b91ab0a9a06ab93acd335b53f654a6d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
index ee39891efea2231362dc776efc4193898f06a02c..91e57a5963ac81964fb0c98a28f7586bf98df059 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
@@ -35,6 +35,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
index f8ef02d25190a29315e6909b9d89642f699b6c6a..a6c29956f3b84ee0def117bdc886219bf07ec2d0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
@@ -39,6 +39,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
index 2ef43cd146bdbbe6e7a8b8f0a66a11a1b8b7ec08..f01fcfb5c34906dbb96d050068b528192aa0f79a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
@@ -37,6 +37,7 @@ int main (void)
   foo ();
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
index 7ac4e1ed949caecd6d2aaa7bf6d33d459ff74f8c..cf529efa31d6a10d3aaad69570f3f3ae102d327c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
@@ -39,6 +39,7 @@ int main (void)
     a[i] = foo (b,i);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = b[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
index ad2f472f1ed100912386d51ef999353baf50dd93..9c1e251f6a79fd34a820d64393696722c508e671 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
@@ -38,6 +38,7 @@ int main (void)
     a[i] = foo (b,i);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = b[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
index f56bd2e50af42f20f57791b2e3f0227dac13ee82..543ee98b5a44c91c2c249df0ece304dd3282cc1a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
@@ -63,6 +63,7 @@ int main (void)
   res = foo (2);
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != bar ())
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
index 7c9113bc0f0425139c6723105c78cc8306d82f8c..0ed589b47e6bc722386a9db83e6397377f0e2069 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
@@ -34,6 +34,7 @@ int main (void)
   foo (a);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
index cea495c44808373543242d8998cdbfb9691499ca..62fa559e6ce064065b3191f673962a63e874055f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
@@ -34,6 +34,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
index 9e1f7890dd1ebc14b4a9a88488625347dcabd38a..96ffb4ce7b4a8a06cb6966acc15924512ad00f31 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
@@ -38,6 +38,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
index ee65ceb6f92a185ca476afcc0b82295ab0034ba5..d76752c0dba3bbedb2913f87ed4b95f7d48ed2cf 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
@@ -37,6 +37,7 @@ int main (void)
   foo (N);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index fe9e7e7ab4038acfe02d3e6ea9c4fc37ba207043..00d0eca56eeca6aee6f11567629dc955c0924c74 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,6 +24,7 @@ main1 ()
    }
 
   /* check results:  */
+#pragma GCC novector
    for (j = 0; j < N; j++)
    {
     for (i = 0; i < N; i++)
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index dc5f16fcfd2c269a719f7dcc5d2d0d4f9dbbf556..48b6a9b0681cf1fe410755c3e639b825b27895b0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -24,6 +24,7 @@ main1 ()
    }
 
   /* check results:  */
+#pragma GCC novector
  for (i = 0; i < N; i++)
    {
     for (j = 0; j < N; j++) 
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
index 131d2d9e03f44ed680cb49c71673908511c9236f..57ebd5c92a4297940bbdfc051c8a08d99a3b184e 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
@@ -22,6 +22,7 @@ int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
index d2ae7976781e20c6e4257e0ad4141ceb21ed711b..a1311504d2f8e67c275e8738b3c201187cd02bc0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
@@ -39,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -52,6 +53,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -65,6 +67,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -78,6 +81,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
index 1edad1ca30eeca0a224a61b5035546615a360fef..604d4b1bc6772f7bf9466b204ebf43e639642a02 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
index 7663ca7281aacc0ba3e685887e3c20be97322148..3eada6057dd91995709f313d706b6d94b8fb99eb 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
index 243e01e6dadf48d976fdd72bedd9547746cf73b5..19fbe331b57fde1412bfdaf7024e8c108f913da5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
@@ -54,6 +54,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[i])
@@ -64,6 +65,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[i][1][1][j] != ib[i])
@@ -74,6 +76,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (id[i][1][j+1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
index 581554064b572b7eb26d5f9852d4d13622317c7e..d51ef31aeac0d910a69d0959cc0da46d92bd7af9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
@@ -44,6 +44,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[2][i][j])
@@ -64,6 +65,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[j] != ib[2][i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
index e339590bacb494569558bfe9536c43f0d6339b8e..23cd3d5c11157f6735ed219c16075007f26034e5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
@@ -29,6 +29,7 @@ int main1 ()
     {
+#pragma GCC novector
      for (j = 0; j < N; j++)
        {
            if (ia[2][6][j] != 5)
                 abort();
         }
@@ -45,6 +46,7 @@ int main1 ()
     {
+#pragma GCC novector
      for (j = 2; j < N+2; j++)
        {
            if (ia[3][6][j] != 5)
                 abort();
         }
@@ -62,6 +64,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[2][1][6][j+1] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
index c403a8302d842a8eda96d2ee0fb25a94e8323254..36b79c2907cc1b41664cdca5074d458e36bdee98 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
@@ -35,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -48,6 +49,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -61,6 +63,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -74,6 +77,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index 34317ccb624a4ca75c612c70a5b5105bb85e272b..a0e53d5fef91868dfdbd542dd0a98dff92bd265b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -52,6 +52,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 5)
@@ -65,6 +66,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = NINTS; i < N - 1; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 6)
@@ -81,6 +83,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       for (j = 0; j < N; j++)
@@ -100,6 +103,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
     {
       for (j = 0; j < N - NINTS; j++)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
index 2199d11e2faee58663484a4d4e6ed06be508188b..f79b74d15700ccd86fc268e039efc8d7b8d245c2 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
@@ -31,7 +31,9 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < M; j++) {
       if (a[j][i] != 4)
         abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
index d0e4ec2373b66b76235f53522c50ac1067ece4d2..8358b6e54328336f1bd0f6c618c58e96b19401d5 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
@@ -21,6 +21,7 @@ main1 (void)
     a[i] = (b[i] > 0 ? b[i] : 0);
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
index d718b5923b11aaee4d259c62cab1a82c714cc934..ae5d23fab86a4dd363e3df7310571ac93fc93f81 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
@@ -20,6 +20,7 @@ main1 (void)
     a[i] = (b[i] > 0 ? b[i] : 0);
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
   {
     if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
index 7316985829f589dbbbe782b037096b2c5bd2be3c..4aaff3430a4cb110d586da83e2db410ae88bc977 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] >= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
index e87bcb8b43d3b82d30f8d3c2340b4968c8dd8da4..c644523a0047a6dfaa0ec8f3d74db79f71b82ec7 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
@@ -21,6 +21,7 @@ int main ()
     A[i] = ( A[i] > MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
index dcb09b7e7c7a3c983763fb3e57ea036e26d2d1ba..7f436a69e99bff6cebbc19a35c2dbbe5dce94c5a 100644
--- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
+++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] < MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
index ebde13167c863d91376d7c17d65191c047a7c9e7..d31157713bf3d0f0fadf305053dfae0612712b8d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
@@ -21,6 +21,7 @@ int main ()
   check_vect ();
   main1 (32);
 
+#pragma GCC novector
   for (si = 0; si < 32; ++si)
     if (stack_vars_sorted[si] != si)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
index e965910d66d06434a367f08553fde8a733a53e41..8491d5f0070233af5c0baf64f9123d270fe1d51c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
+++ b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
@@ -22,6 +22,7 @@ main1 (unsigned short *in)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -48,6 +49,7 @@ main2 (unsigned short * __restrict__ in, unsigned short * __restrict__ out)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
index a92ec9c1656275e1b0e31cfe1dcde3be78dfac7e..45cca1d1991c126fdef29bb129c443aae249a295 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
@@ -41,6 +41,7 @@ int main(void)
   with_restrict(a + 1);
   without_restrict(b + 1);
 
+#pragma GCC novector
   for (i = 0; i < 1002; ++i) {
     if (a[i] != b[i])
       abort();
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
index ce934279ddfe073a96ef8cd7e0d383ca979bda7a..73b92177dabf5193d9d158a92e0383d389b67c82 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
@@ -30,6 +30,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != a[i] || p->b[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
index d9e0529e73f0a566220020ad671f432f3e72299f..9a3fdab128a3bf2609018f92a38a7a6de8b7270b 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
@@ -35,6 +35,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != 1) 
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
index 581438823fd2d1fa83ae4cb770995ff30c18abf8..439347c3bb10711911485a9c1f3bc6abf1c7798c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
@@ -34,6 +34,7 @@ int main1 (int x, int y) {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != 1)
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
index 6f4c84b4cd2b928c5df21a44e910620c1937e863..f59eb69d99fbe2794f3f6c6822cc87b209e8295f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
@@ -24,6 +24,7 @@ int main1 (char *y)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -38,6 +39,7 @@ int main1 (char *y)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
index 18d4d1bbe6d0fdd357a95ab997437ab6b9a46ded..6b4542f5948bc32ca736ad92328a0fd37e44334c 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
@@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
@@ -66,6 +67,7 @@ main2 (float *pa, float *pb, float *pc)
     }   
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (b[i] * c[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
index cad507a708f3079f36e2c85c594513514a1e172b..5db05288c81bf5c4c158efbc50f6d4862bf3f335 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
index a364c7b0d6f1f19292b937eedf0854163c1f549a..a33375f94fec55183493f96c84099224b7f4af6f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
@@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
index 69e921b95031b9275e6f4edeb120f247e93646a3..5ebb8fea0b7cb101f73fa2b079f4a37092eb6f2d 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
index b1c1d86587e5bd9b1dcd364ad495ee7a52ccfb2b..b6d251ec48950dacdecc4d141ebceb4cedaa0755 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
@@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
index 83dc628f0b0803eab9489101c6f3c26f87cf429c..6291dd9d53c33160a0aacf05aeb6febb79fdadf0 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
index 9524454d367db2a45ab744d55a9d32a32e773140..d0334e3ba90f511fd6c0bc5faa72d78c07510cd9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
index 6e9ddcfa5ce61f7a53829e81cab277165ecd1d91..37e474f8a06f1f7df7e9a83290e865d1baa12fce 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
@@ -23,6 +23,7 @@ main1 (float *pa, float *pb, float *pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
index da3506a4cecdce11bf929a98c533026d31fc5f96..e808c87158076d3430eac124df9fdd55192821a8 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++)
     {
       if (ia[i] != 0)
@@ -34,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++)
     {
       if (ib[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
index 89958378fca009fba6b59509c2ea7f96fa53805b..25a3409ae5e2ebdb6f7ebabc7974cd49ac7b7d47 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != 0)
@@ -34,6 +35,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ib[i] != res[i])
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
index e5914d970e3596a082e015725ba99369670db4e7..d1d70dda2eb9b3d7b462ebe0c30536a1f2744af4 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
@@ -130,6 +130,7 @@ main1 (void)
 	case 7: f8 (); break;
 	}
 
+#pragma GCC novector
       for (i = 0; i <= N; i++)
 	{
 	  int ea = i + 3;
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
index 8cc69ab22c5ab7cc193eeba1aa50365db640b254..407b683961ff0f5caaa1f168913fb7011b7fd2a3 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
@@ -37,6 +37,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-20; i++)
     {
       if (A[i] != D[i+20])
@@ -50,6 +51,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     {
       if (B[i] != C[i] + 5)
@@ -63,6 +65,7 @@ int main ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 4; i++)
     {
       if (C[i] != E[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr101445.c b/gcc/testsuite/gcc.dg/vect/pr101445.c
index f8a6e9ce6f7fa514cacd8b58d9263636d1d28eff..143156f2464e84e392c04231e4717ef9ec7d8a6e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr101445.c
+++ b/gcc/testsuite/gcc.dg/vect/pr101445.c
@@ -21,6 +21,7 @@ int main()
 {
   check_vect ();
   foo ();
+#pragma GCC novector
   for (int d = 0; d < 25; d++)
     if (a[d] != 0)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr103581.c b/gcc/testsuite/gcc.dg/vect/pr103581.c
index d072748de31d2c6beb5d6dd86bf762ee1f4d0182..92695c83d99bf048b52c8978634027bcfd71c13d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr103581.c
+++ b/gcc/testsuite/gcc.dg/vect/pr103581.c
@@ -39,6 +39,7 @@ main()
   unsigned int *resusiusi = maskgatherusiusi (16, idx4, data4);
   unsigned long long *resudiudi = maskgatherudiudi (16, idx8, data8);
   unsigned int *resusiudi = maskgatherusiudi (16, idx8, data4);
+#pragma GCC novector
   for (int i = 0; i < 16; ++i)
     {
       unsigned int d = idx4[i];
diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
index 4bca5bbba30a9740a54e6205bc0d0c8011070977..2289f5e1a633b56218d089d81528599d4f1f282b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr105219.c
+++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
@@ -22,6 +22,7 @@ int main()
       {
         __builtin_memset (data, 0, sizeof (data));
         foo (&data[start], n);
+#pragma GCC novector
         for (int j = 0; j < n; ++j)
           if (data[start + j] != j)
             __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr108608.c b/gcc/testsuite/gcc.dg/vect/pr108608.c
index e968141ba03639ab86ccf77e5e9ad5dd56a66e0d..fff5c1a89365665edc3478263ee909b2b260e178 100644
--- a/gcc/testsuite/gcc.dg/vect/pr108608.c
+++ b/gcc/testsuite/gcc.dg/vect/pr108608.c
@@ -13,6 +13,7 @@ main (void)
 {
   check_vect ();
   float ptr[256];
+#pragma GCC novector
   for (int j = 0; j < 16; ++j)
     {
       for (int i = 0; i < 256; ++i)
diff --git a/gcc/testsuite/gcc.dg/vect/pr18400.c b/gcc/testsuite/gcc.dg/vect/pr18400.c
index 012086138f7199fdf2b4b40666795f7df03a89d2..dd96d87be99287da19df4634578e2e073ab42455 100644
--- a/gcc/testsuite/gcc.dg/vect/pr18400.c
+++ b/gcc/testsuite/gcc.dg/vect/pr18400.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr18536.c b/gcc/testsuite/gcc.dg/vect/pr18536.c
index 6d02675913b68c811f4e3bc1f71df830d7f4e2aa..33ee3a5ddcfa296672924678b40474bea947b9ea 100644
--- a/gcc/testsuite/gcc.dg/vect/pr18536.c
+++ b/gcc/testsuite/gcc.dg/vect/pr18536.c
@@ -22,6 +22,7 @@ int main (void)
   main1 (0, x);
 
   /* check results:  */
+#pragma GCC novector
   while (++i < 4)
     {
       if (x[i-1] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/pr20122.c b/gcc/testsuite/gcc.dg/vect/pr20122.c
index 4f1b7bd6c1e723405b6625f7c7c890a46d3272bc..3a0387e7728fedc9872cb385dd7817f7f5cf07ac 100644
--- a/gcc/testsuite/gcc.dg/vect/pr20122.c
+++ b/gcc/testsuite/gcc.dg/vect/pr20122.c
@@ -27,6 +27,7 @@ static void VecBug2(short Kernel[8][24])
             Kernshort2[i] = Kernel[k][i];
 
     for (k = 0; k<8; k++)
+#pragma GCC novector
         for (i = 0; i<24; i++)
             if (Kernshort2[i] != Kernel[k][i])
                 abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr25413.c b/gcc/testsuite/gcc.dg/vect/pr25413.c
index e80d6970933e675b6056e5d119c6eb0e817a40f9..266ef3109f20df7615e85079a5d2330f26cf540d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr25413.c
+++ b/gcc/testsuite/gcc.dg/vect/pr25413.c
@@ -26,6 +26,7 @@ int main (void)
   check_vect ();
   
   main1 ();
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (a.d[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr30784.c b/gcc/testsuite/gcc.dg/vect/pr30784.c
index 840dbc5f1f139aafe012904a774c1e5b9739b653..ad1fa05d8edae5e28a3308f39ff304de3b1d60c1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr30784.c
+++ b/gcc/testsuite/gcc.dg/vect/pr30784.c
@@ -21,6 +21,7 @@ int main ()
   check_vect ();
   main1 (32);
 
+#pragma GCC novector
   for (si = 0; si < 32; ++si)
     if (stack_vars_sorted[si] != si)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr37539.c b/gcc/testsuite/gcc.dg/vect/pr37539.c
index dfbfc20c5cbca0cfa7158423ee4a42e5976b56fe..c7934eb384739778a841271841fd8b7777ee19be 100644
--- a/gcc/testsuite/gcc.dg/vect/pr37539.c
+++ b/gcc/testsuite/gcc.dg/vect/pr37539.c
@@ -17,6 +17,7 @@ ayuv2yuyv_ref (int *d, int *src, int n)
   }
 
   /* Check results.  */
+#pragma GCC novector
   for(i=0;i<n/2;i++){
    if (dest[i*4 + 0] != (src[i*2 + 0])>>16
        || dest[i*4 + 1] != (src[i*2 + 1])>>8
diff --git a/gcc/testsuite/gcc.dg/vect/pr40074.c b/gcc/testsuite/gcc.dg/vect/pr40074.c
index 143ee05b1fda4b0f858e31cad2ecd4211530e7b6..b75061a8116c34f609eb9ed59256b6eea87976a4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr40074.c
+++ b/gcc/testsuite/gcc.dg/vect/pr40074.c
@@ -30,6 +30,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-1; i++)
     {
       if (res[i] != arr[i].b + arr[i].d + arr[i+1].b)
diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c b/gcc/testsuite/gcc.dg/vect/pr45752.c
index 4ddac7ad5097c72f08b948f64caa54421d4f55d0..e8b364f29eb0c4b20bb2b2be5d49db3aab5ac39b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45752.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
@@ -146,6 +146,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (output[i] != check_results[i]
         || output2[i] != check_results2[i])
diff --git a/gcc/testsuite/gcc.dg/vect/pr45902.c b/gcc/testsuite/gcc.dg/vect/pr45902.c
index ac8e1ca6d38159d3c26497a414b638f49846381e..74510bf94b82850b6492c6d1ed0abacb73f65a16 100644
--- a/gcc/testsuite/gcc.dg/vect/pr45902.c
+++ b/gcc/testsuite/gcc.dg/vect/pr45902.c
@@ -34,6 +34,7 @@ main ()
 
   main1 ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != a[i] >> 8)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr46009.c b/gcc/testsuite/gcc.dg/vect/pr46009.c
index 9649e2fb4bbfd74e134a9ef3d068d50b9bcb86c0..fe73dbf5db08732cc74115281dcf6a020f893cb6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr46009.c
+++ b/gcc/testsuite/gcc.dg/vect/pr46009.c
@@ -49,6 +49,7 @@ main (void)
       e[i] = -1;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       int g;
@@ -59,6 +60,7 @@ main (void)
       e[i] = -1;
     }
   bar ();
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       int g;
diff --git a/gcc/testsuite/gcc.dg/vect/pr48172.c b/gcc/testsuite/gcc.dg/vect/pr48172.c
index a7fc05cae9119076efad4ca13a0f6fd0aff004b7..850e9b92bc15ac5f51fee8ac7fd2c9122def66b6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr48172.c
+++ b/gcc/testsuite/gcc.dg/vect/pr48172.c
@@ -25,6 +25,7 @@ int main() {
     array[HALF+i] = array[2*i] + array[2*i + 1];
 
   /* see if we have any failures */
+#pragma GCC novector
   for (i = 0; i < HALF - 1; i++)
     if (array[HALF+i] != array[2*i] + array[2*i + 1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr51074.c b/gcc/testsuite/gcc.dg/vect/pr51074.c
index 4144572126e9de36f5b2e85bb56ff9fdff372bce..d6c8cea1f842e08436a3d04af513307d3e980d27 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51074.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51074.c
@@ -15,6 +15,7 @@ main ()
       s[i].a = i;
     }
   asm volatile ("" : : : "memory");
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (s[i].b != 0 || s[i].a != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-3.c b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
index 76c156adf9d0dc083b7eb5fb2e6f056398e2b845..25acceef0e5ca6f8c180a41131cd190b9c84b533 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51581-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
@@ -97,17 +97,20 @@ main ()
     }
   f1 ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 8; i++)
     if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
       abort ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < 8; i+= 2)
     if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
 	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
       abort ();
   f5 ();
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 8; i+= 2)
     if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
 	|| c[i] != d[i] / (i == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-4.c b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
index 632c96e7481339a6dfac92913a519ad5501d34c4..f6234f3e7c09194dba54af08832171798c7d9c09 100644
--- a/gcc/testsuite/gcc.dg/vect/pr51581-4.c
+++ b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
@@ -145,17 +145,20 @@ main ()
     }
   f1 ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
       abort ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < 16; i+= 2)
     if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
 	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
       abort ();
   f5 ();
   f6 ();
+#pragma GCC novector
   for (i = 0; i < 16; i+= 2)
     if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
 	|| c[i] != d[i] / ((i & 7) == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
diff --git a/gcc/testsuite/gcc.dg/vect/pr53185-2.c b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
index 6057c69a24a81be20ecc5582685fb4516f47803d..51614e70d8feac0004644b2e6bb7deb52eeeefea 100644
--- a/gcc/testsuite/gcc.dg/vect/pr53185-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
@@ -20,6 +20,7 @@ int main ()
   for (off = 0; off < 8; ++off)
     {
       fn1 (&a[off], &b[off], 32 - off, 3);
+#pragma GCC novector
       for (i = 0; i < 32 - off; ++i)
 	if (a[off+i] != b[off+i*3])
 	  abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56826.c b/gcc/testsuite/gcc.dg/vect/pr56826.c
index e8223808184e6b7b37a6d458bdb440566314e959..2f2da458b89ac04634cb809873d7a60e55484499 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56826.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56826.c
@@ -35,6 +35,7 @@ int main()
       __asm__ volatile ("");
     }
   bar (&A[0], &B[0], 100);
+#pragma GCC novector
   for (i=0; i<300; i++)
     if (A[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56918.c b/gcc/testsuite/gcc.dg/vect/pr56918.c
index 1c88d324b902e9389afe4c5c729f20b2ad790dbf..4941453bbe9940b4e775239c4c2c9606435ea20a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56918.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56918.c
@@ -22,6 +22,7 @@ main ()
   foo ();
   if (data[0] != 3 || data[7] != 1)
     abort ();
+#pragma GCC novector
   for (i = 1; i < 4; ++i)
     if (data[i] != i || data[i + 3] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56920.c b/gcc/testsuite/gcc.dg/vect/pr56920.c
index 865cfda760d1978eb1f3f063c75e2bac558254bd..ef73471468392b573e999a59e282b4d796556b8d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56920.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56920.c
@@ -12,6 +12,7 @@ main ()
   check_vect ();
   for (i = 0; i < 15; ++i)
     a[i] = (i * 2) % 15;
+#pragma GCC novector
   for (i = 0; i < 15; ++i)
     if (a[i] != (i * 2) % 15)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr56933.c b/gcc/testsuite/gcc.dg/vect/pr56933.c
index 7206682d7935a0436aaf502537bb56642d5e4648..2f2afe6df134163d2e7761be4906d778dbd6b670 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56933.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56933.c
@@ -25,6 +25,7 @@ int main()
   for (i = 0; i < 2*1024; i++)
     d[i] = 1.;
   foo (b, d, f);
+#pragma GCC novector
   for (i = 0; i < 1024; i+= 2)
     {
       if (d[2*i] != 2.)
@@ -32,6 +33,7 @@ int main()
       if (d[2*i+1] != 4.)
 	abort ();
     }
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     {
       if (b[i] != 1.)
diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c b/gcc/testsuite/gcc.dg/vect/pr57705.c
index e17ae09beb68051637c3ece69ac2f29e1433008d..39c32946d74ef01efce6fbc2f23c72dd0b33091d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57705.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
@@ -47,14 +47,17 @@ main ()
   int i;
   check_vect ();
   foo (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 5 + 4 * i)
       abort ();
   bar (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 9 + 4 * i)
       abort ();
   baz (5, 3);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != 5 + 4 * i || b[i] != (unsigned char) i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-2.c b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
index df63a49927d38badb2503787bcd828b796116199..6addd76b422614a2e28272f4d696e3cba4bb0376 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57741-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
@@ -34,6 +34,7 @@ main ()
   int i;
   check_vect ();
   foo (p, q, 1.5f);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-3.c b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
index 2e4954ac7f14b21463b0ef0ca97e05c4eb96e8fd..916fa131513b88321d36cdbe46f101361b4f8244 100644
--- a/gcc/testsuite/gcc.dg/vect/pr57741-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
@@ -33,6 +33,7 @@ main ()
   check_vect ();
   r[0] = 0;
   foo (1.5f);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f || r[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-1.c b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
index 892fce58e36b37e5412cc6c100f82b6077ace77e..e768fb3e1de48cf43b389cf83b4f7f1f030c4f91 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59591-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
@@ -31,6 +31,7 @@ bar (void)
       t[i] = i * 13;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 256; i++)
     if ((i >> 2) & (1 << (i & 3)))
       {
diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-2.c b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
index bd82d765794a32af6509ffd60d1f552ce10570a3..3bdf4252cffe63830b5b47cd17fa29a3c65afc73 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59591-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
@@ -32,6 +32,7 @@ bar (void)
       t[i] = i * 13;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 256; i++)
     if ((i >> 2) & (1 << (i & 3)))
       {
diff --git a/gcc/testsuite/gcc.dg/vect/pr59594.c b/gcc/testsuite/gcc.dg/vect/pr59594.c
index 947fa4c0c301d98cbdfeb5da541482858b69180f..e3ece8abf7131aa4ed0a2d5af79d4bdea90bd8c1 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59594.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59594.c
@@ -22,6 +22,7 @@ main ()
     }
   if (b[0] != 1)
     __builtin_abort ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (b[i + 1] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr59984.c b/gcc/testsuite/gcc.dg/vect/pr59984.c
index d6977f0020878c043376b7e7bfdc6a0e85ac2663..c00c2267158667784fb084b0ade19e2ab763c6a3 100644
--- a/gcc/testsuite/gcc.dg/vect/pr59984.c
+++ b/gcc/testsuite/gcc.dg/vect/pr59984.c
@@ -37,6 +37,7 @@ test (void)
       foo (a[i], &v1, &v2);
       a[i] = v1 * v2;
     }
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * i * i * i - 1)
       __builtin_abort ();
@@ -49,6 +50,7 @@ test (void)
       bar (a[i], &v1, &v2);
       a[i] = v1 * v2;
     }
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * i * i * i - 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr60276.c b/gcc/testsuite/gcc.dg/vect/pr60276.c
index 9fc18ac7428cf71903b6ebb04b90eb21b2e8b3c7..824e2a336b6d9fad2e7a72c445ec2edf80be8138 100644
--- a/gcc/testsuite/gcc.dg/vect/pr60276.c
+++ b/gcc/testsuite/gcc.dg/vect/pr60276.c
@@ -44,6 +44,7 @@ int main(void)
   foo (out + 2, lp + 1, 48);
   foo_novec (out2 + 2, lp + 1, 48);
 
+#pragma GCC novector
   for (s = 0; s < 49; s++)
     if (out[s] != out2[s])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c b/gcc/testsuite/gcc.dg/vect/pr61194.c
index 8421367577278cdf5762327d83cdc4a0e65c9411..8cd38b3d5da616d65ba131d048280b1d5644339d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61194.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61194.c
@@ -32,6 +32,7 @@ int main()
 
   barX();
 
+#pragma GCC novector
   for (i = 0; i < 1024; ++i)
     if (z[i] != ((x[i]>0 && w[i]<0) ? 0. : 1.))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr61680.c b/gcc/testsuite/gcc.dg/vect/pr61680.c
index e25bf78090ce49d68cb3694233253b403709331a..bb24014bdf045f22a0c9c5234481f07153c25d41 100644
--- a/gcc/testsuite/gcc.dg/vect/pr61680.c
+++ b/gcc/testsuite/gcc.dg/vect/pr61680.c
@@ -8,6 +8,7 @@ bar (double p[][4])
 {
   int i;
   double d = 172.0;
+#pragma GCC novector
   for (i = 0; i < 4096; i++)
     {
       if (p[i][0] != 6.0 || p[i][1] != 6.0 || p[i][2] != 10.0)
diff --git a/gcc/testsuite/gcc.dg/vect/pr62021.c b/gcc/testsuite/gcc.dg/vect/pr62021.c
index 40c64429d6382821af4a31b3569c696ea0e5fa2a..460fadb3f6cd73c7cac2bbba65cc09d4211396e8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr62021.c
+++ b/gcc/testsuite/gcc.dg/vect/pr62021.c
@@ -24,6 +24,7 @@ main ()
   #pragma omp simd
   for (i = 0; i < 1024; i++)
     b[i] = foo (b[i], i);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (b[i] != &a[1023])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr63341-2.c b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
index 2004a79b80ef4081136ade20df9b6acd5b6428c1..aa338263a7584b06f10e4cb4a6baf19dea20f40a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr63341-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
@@ -16,6 +16,7 @@ foo ()
   int i;
   for (i = 0; i < 32; i++)
     d[i] = t.s[i].s + 4;
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     if (d[i] != t.s[i].s + 4)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64252.c b/gcc/testsuite/gcc.dg/vect/pr64252.c
index b82ad017c16fda6e031b503a9b11fe39a3691a6c..89070c27ff0f9763bd8eaff4a81b5b0197ae12dc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64252.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64252.c
@@ -57,6 +57,7 @@ int main()
   int i;
   check_vect ();
   bar(2, q);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64404.c b/gcc/testsuite/gcc.dg/vect/pr64404.c
index 26fceb6cd8936f7300fb0067c0f18c3d35ac4595..6fecf9ecae18e49808a58fe17a6b912786bdbad3 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64404.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64404.c
@@ -42,6 +42,7 @@ main (void)
 
   Compute ();
 
+#pragma GCC novector
   for (d = 0; d < 1024; d++)
     {
       if (Y[d].l != X[d].l + X[d].h
diff --git a/gcc/testsuite/gcc.dg/vect/pr64421.c b/gcc/testsuite/gcc.dg/vect/pr64421.c
index 3b5ab2d980c207c1d5e7fff73cd403ac38790080..47afd22d93e5ed8fbfff034cd2a03d8d70f7e422 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64421.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64421.c
@@ -27,6 +27,7 @@ main ()
     a[i] = foo (a[i], i);
   if (a[0] != 1 || a[1] != 3)
     abort ();
+#pragma GCC novector
   for (i = 2; i < 1024; i++)
     if (a[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr64493.c b/gcc/testsuite/gcc.dg/vect/pr64493.c
index 6fb13eb6d96fe67471fdfafd2eed2a897ae8b670..d3faf84bcc16d31fc11dd2d0cd7242972fdbafdc 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64493.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64493.c
@@ -9,6 +9,7 @@ main ()
 
   for (; a; a--)
     for (d = 1; d <= 0; d++)
+#pragma GCC novector
       for (; d;)
 	if (h)
 	  {
diff --git a/gcc/testsuite/gcc.dg/vect/pr64495.c b/gcc/testsuite/gcc.dg/vect/pr64495.c
index 5cbaeff8389dafd3444f90240a910e7d5e4f2431..c48f9389aa325a8b8ceb5697684f563b8c13a72d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr64495.c
+++ b/gcc/testsuite/gcc.dg/vect/pr64495.c
@@ -11,6 +11,7 @@ main ()
 
   for (; a;)
     for (; g; g++)
+#pragma GCC novector
       for (; f; f++)
 	if (j)
 	  {
diff --git a/gcc/testsuite/gcc.dg/vect/pr66251.c b/gcc/testsuite/gcc.dg/vect/pr66251.c
index 26afbc96a5d57a49fbbac95753f4df006cb36018..355590e69a98687084fee2c5486d14c2a20f3fcb 100644
--- a/gcc/testsuite/gcc.dg/vect/pr66251.c
+++ b/gcc/testsuite/gcc.dg/vect/pr66251.c
@@ -51,6 +51,7 @@ int main ()
 
       test1(da, ia, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != ia[i*stride])
@@ -66,6 +67,7 @@ int main ()
 
       test2(ia, da, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != ia[i*stride])
diff --git a/gcc/testsuite/gcc.dg/vect/pr66253.c b/gcc/testsuite/gcc.dg/vect/pr66253.c
index bdf3ff9ca51f7f656fad687fd8c77c6ee053794f..6b99b4f3b872cbeab14e035f2e2d40aab6e438e4 100644
--- a/gcc/testsuite/gcc.dg/vect/pr66253.c
+++ b/gcc/testsuite/gcc.dg/vect/pr66253.c
@@ -39,6 +39,7 @@ int main ()
 
       test1(da, ia, ca, stride, 256/stride);
 
+#pragma GCC novector
       for (i = 0; i < 256/stride; i++)
 	{
 	  if (da[i*stride] != 0.5 * ia[i*stride] * ca[i*stride])
diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-1.c b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
index 4f7d0bfca38693877ff080842d6ef7abf3d3e17b..cc6e6cd9a2be0e921382bda3c653f6a6b730b905 100644
--- a/gcc/testsuite/gcc.dg/vect/pr68502-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
@@ -41,6 +41,7 @@ int main ()
   for (i = 0; i < numf1s; i++)
     f1_layer[i].I = (double *)-1;
   reset_nodes ();
+#pragma GCC novector
   for (i = 0; i < numf1s; i++)
     if (f1_layer[i].I != (double *)-1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-2.c b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
index a3eddafc7ca76cbe4c21f6ed873249cb2c94b7a6..11f87125b75df9db29669aa55cdc3c202b0fedda 100644
--- a/gcc/testsuite/gcc.dg/vect/pr68502-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
@@ -41,6 +41,7 @@ int main ()
   for (i = 0; i < numf1s; i++)
     f1_layer[i].I = -1;
   reset_nodes ();
+#pragma GCC novector
   for (i = 0; i < numf1s; i++)
     if (f1_layer[i].I != -1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr69820.c b/gcc/testsuite/gcc.dg/vect/pr69820.c
index be24e4fa9a1343e4308bfd967f1ccfdd3549db5c..72d10b65c16b54764aac0cf271138ffa187f4052 100644
--- a/gcc/testsuite/gcc.dg/vect/pr69820.c
+++ b/gcc/testsuite/gcc.dg/vect/pr69820.c
@@ -28,6 +28,7 @@ main ()
       c[i] = 38364;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 100; ++i)
     if (b[i] != 0xed446af8U)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr70021.c b/gcc/testsuite/gcc.dg/vect/pr70021.c
index 988fc53216d12908bbbc564c9efc4d63a5c057d7..d4d5db12bc0e646413ba393b57edc60ba1189059 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70021.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70021.c
@@ -32,6 +32,7 @@ main ()
       e[i] = 14234165565810642243ULL;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     if (e[i] != ((i & 3) ? 14234165565810642243ULL : 1ULL))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-1.c b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
index 9d601dc9d4a92922e4114b8b4d1b7ef2f49c0c44..2687758b022b01af3eb7b444fee25be8bc1f8b3c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70354-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
@@ -41,6 +41,7 @@ main ()
       h[i] = 8193845517487445944ULL;
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (d[i] != 8193845517487445944ULL || e[i] != 1
 	|| g[i] != 4402992416302558097ULL)
diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-2.c b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
index 160e1e083e03e0652d06bf29df060192cbe75fd5..cb4cdaae30ba5760fc32e255b651072ca397a499 100644
--- a/gcc/testsuite/gcc.dg/vect/pr70354-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
@@ -29,6 +29,7 @@ main ()
       b[i] = 0x1200000000ULL + (i % 54);
     }
   foo ();
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     if (a[i] != (0x1234ULL << (i % 54)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr71259.c b/gcc/testsuite/gcc.dg/vect/pr71259.c
index 587a8e3c8f378f3c57f8a9a2e9fa5aee3a968860..6cb22f622ee2ce2d6de51c440472e36fe7294362 100644
--- a/gcc/testsuite/gcc.dg/vect/pr71259.c
+++ b/gcc/testsuite/gcc.dg/vect/pr71259.c
@@ -20,6 +20,7 @@ main ()
   asm volatile ("" : : : "memory");
   for (i = 0; i < 44; i++) 
     for (j = 0; j < 17; j++)
+#pragma GCC novector
       for (k = 0; k < 2; k++)
 	if (c[i][j][k] != -5105075050047261684)
 	  __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr78005.c b/gcc/testsuite/gcc.dg/vect/pr78005.c
index 7cefe73fe1b3d0050befeb5e25aec169867fd96a..6da7acf50c2a1237b817abf8e6b9191b3c3e1378 100644
--- a/gcc/testsuite/gcc.dg/vect/pr78005.c
+++ b/gcc/testsuite/gcc.dg/vect/pr78005.c
@@ -22,6 +22,7 @@ foo (int n, int d)
 
 #define check_u(x)		\
   foo (x, 2);			\
+  _Pragma("GCC novector")	\
   for (i = 0; i < N; i++)	\
     {				\
       if (u[i] != res##x[i])	\
diff --git a/gcc/testsuite/gcc.dg/vect/pr78558.c b/gcc/testsuite/gcc.dg/vect/pr78558.c
index 2606d4ec10d3fa18a4c0e4b8e9dd02131cb57ba7..2c28426eb85fc6663625c542e84860fa7bcfd3c2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr78558.c
+++ b/gcc/testsuite/gcc.dg/vect/pr78558.c
@@ -37,6 +37,7 @@ main ()
   asm volatile ("" : : "g" (s), "g" (d) : "memory");
   foo ();
   asm volatile ("" : : "g" (s), "g" (d) : "memory");
+#pragma GCC novector
   for (i = 0; i < 50; ++i)
     if (d[i].q != i || d[i].r != 50 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-2.c b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
index 83557daa6963632ccf2cf0a641a4106b4dc833f5..3ffff0be3be96df4c3e6a3d5caa68b7d4b6bad9a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80815-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo (a, b);
 
+#pragma GCC novector
   for (i = 973; i < 1020; i++)
     if (arr[i] != res[i - 973])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-3.c b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
index 50392ab1a417de2af81af6473bf0a85bd9eb7279..5e2be5262ebb639d4bd771e326f9a07ed2ee0680 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80815-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (a, b, 50);
 
+#pragma GCC novector
   for (i = 975; i < 1025; i++)
     if (arr[i] != res[i - 975])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr80928.c b/gcc/testsuite/gcc.dg/vect/pr80928.c
index e6c1f1ab5a7f4ca7eac98cf91fccffbff2dcfc7a..34566c4535247d2fa39c5d856d1e0c32687e9a2a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80928.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80928.c
@@ -25,6 +25,7 @@ int main ()
   foo ();
 
   /* check results */
+#pragma GCC novector
   for (int i = 0; i < 1020; ++i)
     if (a[i] != ((i + 4) / 5) * 5)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81410.c b/gcc/testsuite/gcc.dg/vect/pr81410.c
index 9c91c08d33c729d8ff26cae72f4651081850b550..6b7586992fe46918aab537a06f166ce2e25f90d8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81410.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81410.c
@@ -26,6 +26,7 @@ int main()
       __asm__ volatile ("" : : : "memory");
     }
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 8; ++i)
     if (y[2*i] != 3*i || y[2*i+1] != 3*i + 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81633.c b/gcc/testsuite/gcc.dg/vect/pr81633.c
index 9689ab3959cd9df8234b89ec307b7cd5d6f9d795..2ad144a60444eb82b8e8575efd8fcec94fcd6f01 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81633.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81633.c
@@ -24,6 +24,7 @@ int main(void)
   double A[4][4] = {{0.0}};
   kernel(A);
   for ( int i = 0; i < 4; i++ )
+#pragma GCC novector
     for ( int j = 0; j < 4; j++ )
       if (A[i][j] != expected[i][j])
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-1.c b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
index f6fd43c7c87e0aad951ba092796f0aae39b80d54..b01e1994834934bbd50f3fc1cbcf494ecc62c315 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81740-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
@@ -14,6 +14,7 @@ main ()
     for (c = 0; c <= 6; c++)
       a[c + 1][b + 2] = a[c][b + 1];
   for (i = 0; i < 8; i++)
+#pragma GCC novector
     for (d = 0; d < 10; d++)
       if (a[i][d] != (i == 3 && d == 6) * 4)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-2.c b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
index 1e0d6645a03f77c9c042313fd5377b71ba75c4d6..7b2bfe139f20fb66c90cfd643b65df3edb9b536e 100644
--- a/gcc/testsuite/gcc.dg/vect/pr81740-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
@@ -15,6 +15,7 @@ main ()
     for (c = 6; c >= 0; c--)
       a[c + 1][b + 2] = a[c][b + 1];
   for (i = 0; i < 8; i++)
+#pragma GCC novector
     for (d = 0; d < 10; d++)
       if (a[i][d] != (i == 3 && d == 6) * 4)
 	__builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr85586.c b/gcc/testsuite/gcc.dg/vect/pr85586.c
index 3d075bfcec83bab119f77bad7b642eb3d634fb4c..a4a170a1fcd130d84da3be9f897889ff4cfc717c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr85586.c
+++ b/gcc/testsuite/gcc.dg/vect/pr85586.c
@@ -24,6 +24,7 @@ main (void)
     }
 
   foo (out, in, 1);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (out[i] != in[i])
       __builtin_abort ();
@@ -33,6 +34,7 @@ main (void)
   foo (out + N - 1, in, -1);
   if (out[0] != in[N - 1])
     __builtin_abort ();
+#pragma GCC novector
   for (int i = 1; i <= N; ++i)
     if (out[i] != 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-1.c b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
index 0d0a70dff6f21b2f07fecd937d4fe26c0df61513..ec968dfcd0153cdb001e8e282146dbdb67d23c65 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
@@ -16,6 +16,7 @@ run (int *restrict a, int *restrict b, int count)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-2.c b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
index e9ff9a0be7c08a9755972717a63025f2825e95cf..03c7f88a6a48507bbbfbf2e177425d28605a3aa6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
@@ -22,6 +22,7 @@ RUN_COUNT (4)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-3.c b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
index 23f574ccb53268b59b933ec59a5eadaa890007ff..0475990992e58451de8649b735fa16f0e32ed657 100644
--- a/gcc/testsuite/gcc.dg/vect/pr87288-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
@@ -22,6 +22,7 @@ RUN_COUNT (4)
 void __attribute__ ((noipa))
 check (int *restrict a, int count)
 {
+#pragma GCC novector
   for (int i = 0; i < count * N + 1; ++i)
     if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-1.c b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
index 77dbfd47c91be8cce0edde8b09b7b90d40268306..0f78ccc995d5dcd35d5d7ba0f35afdc8bb5a1b2b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88903-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
@@ -19,6 +19,7 @@ main()
   for (int i = 0; i < 1024; ++i)
     x[i] = i;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != i << ((i/2+1) & 31))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-2.c b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
index cd88a99c6045c6a3eb848f053386d22b9cbe46ce..8a1cf9c523632f392d95aa2d6ec8332fa50fec5b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr88903-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
@@ -21,6 +21,7 @@ int main()
   for (int i = 0; i < 1024; ++i)
     x[i] = i, y[i] = i % 8;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != i << ((i & ~1) % 8))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr90018.c b/gcc/testsuite/gcc.dg/vect/pr90018.c
index 52640f5aa6f02d6deed3b2790482a2d2d01ddd5b..08ca326f7ebfab1a42813bc121f1e5a46394e983 100644
--- a/gcc/testsuite/gcc.dg/vect/pr90018.c
+++ b/gcc/testsuite/gcc.dg/vect/pr90018.c
@@ -41,6 +41,7 @@ int main(int argc, char **argv)
       a42[i*4+n*4+1] = tem4 + a42[i*4+n*4+1];
       __asm__ volatile ("": : : "memory");
     }
+#pragma GCC novector
   for (int i = 0; i < 4 * n * 3; ++i)
     if (a4[i] != a42[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr92420.c b/gcc/testsuite/gcc.dg/vect/pr92420.c
index e43539fbbd7202b3ae2e9f71bfd82a3fcdf8bde3..e56eb0e12fbec55b16785e244f3a24b889af784d 100644
--- a/gcc/testsuite/gcc.dg/vect/pr92420.c
+++ b/gcc/testsuite/gcc.dg/vect/pr92420.c
@@ -41,6 +41,7 @@ main ()
     }
   foo (a, b + N, d, N);
   bar (a, c, e, N);
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     if (d[i].r != e[i].r || d[i].i != e[i].i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr94994.c b/gcc/testsuite/gcc.dg/vect/pr94994.c
index e98aeb090d8cbcfc9628052b553b7a7d226069d1..2f598eacd541eafaef02f9aee34fc769dac2a4c6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr94994.c
+++ b/gcc/testsuite/gcc.dg/vect/pr94994.c
@@ -41,6 +41,7 @@ main (void)
       for (unsigned int j = 0; j < INPUT_SIZE + MAX_STEP; ++j)
 	x[j] = j + 10;
       copy (x + i, x, INPUT_SIZE);
+#pragma GCC novector
       for (int j = 0; j < INPUT_SIZE + i; ++j)
 	{
 	  int expected;
diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-1.c b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
index 55d1364f056febd86c49272ede488bd37867dbe8..2de222d2ae6491054b6c7a6cf5891580abf5c6f7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr96783-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
@@ -31,6 +31,7 @@ int main ()
     a[i] = i;
   foo (a + 3 * 5, 6-1, 5);
   const long b[3 * 8] = { 0, 1, 2, 21, 22, 23, 18, 19, 20, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 };
+#pragma GCC novector
   for (int i = 0; i < 3 * 8; ++i)
     if (a[i] != b[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-2.c b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
index 33c37109e3a8de646edd8339b0c98300bed25b51..bcdcfac072cf564d965edd4be7fbd9b23302e759 100644
--- a/gcc/testsuite/gcc.dg/vect/pr96783-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
@@ -20,6 +20,7 @@ int main()
   for (int i = 0; i < 1024; ++i)
     b[i] = i;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 256; ++i)
     if (a[3*i] != 1023 - 3*i - 2
 	|| a[3*i+1] != 1023 - 3*i - 1
diff --git a/gcc/testsuite/gcc.dg/vect/pr97081-2.c b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
index 98ad3c3fe17e4556985cb6a0392de72a19911a97..436e897cd2e6a8bb41228cec14480bac88e98952 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97081-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
@@ -24,6 +24,7 @@ main ()
       c[i] = i;
     }
   foo (3);
+#pragma GCC novector
   for (int i = 0; i < 1024; i++)
     if (s[i] != (unsigned short) ((i << 3) | (i >> (__SIZEOF_SHORT__ * __CHAR_BIT__ - 3)))
         || c[i] != (unsigned char) ((((unsigned char) i) << 3) | (((unsigned char) i) >> (__CHAR_BIT__ - 3))))
diff --git a/gcc/testsuite/gcc.dg/vect/pr97558-2.c b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
index 8f0808686fbad0b5b5ec11471fd38f53ebd81bde..5dff065f2e220b1ff31027c271c07c9670b98f9c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97558-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
@@ -41,6 +41,7 @@ int main (void)
   foo (N-1);
 
     /* check results:  */
+#pragma GCC novector
   for (i=0; i<N/2; i++)
     {
       sum = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/pr97678.c b/gcc/testsuite/gcc.dg/vect/pr97678.c
index 7fb6c93515e41257f173f664d9304755a8dc0de2..1fa56326422e832e82bb6f1739f14ea1a1cb4955 100644
--- a/gcc/testsuite/gcc.dg/vect/pr97678.c
+++ b/gcc/testsuite/gcc.dg/vect/pr97678.c
@@ -19,6 +19,7 @@ main ()
       b[i * 2 + 1] = i * 8;
     }
 
+#pragma GCC novector
   for (i = 0; i < 158; ++i)
     if (b[i*2] != (unsigned short)(i*7)
         || b[i*2+1] != (unsigned short)(i*8))
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
index 4373dce917f9d7916e128a639e81179fe1250ada..1154b40d4855b5a42187134e9d5f08a98a160744 100644
--- a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
@@ -22,6 +22,7 @@ int main (void)
   int i;
   check_vect ();
   foo ();
+#pragma GCC novector
   for (i = 0; i < 100; i++)
     if (f[i]!=1) 
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
index e3466d0da1de6207b8583f42aad412b2c2000dcc..dbf65605e91c4219b6f5c6de220384ed09e999a7 100644
--- a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
@@ -50,6 +50,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 5)
@@ -63,6 +64,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = NINTS - 1; i < N - 1; i++)
     {
       if (tmp1[2].a.n[1][2][i] != 6)
@@ -81,6 +83,7 @@ int main1 ()
   /* check results:  */
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
 	{
           if (tmp1[2].e.n[1][i][j] != 8)
@@ -100,6 +103,7 @@ int main1 ()
   /* check results:  */
   for (i = 0; i < N - NINTS; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N - NINTS; j++)
 	{
           if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
index 26b71d654252bcd2e4591f11a78a4c0a3dad5d85..82e4f6469fb9484f84c5c832d0461576b63ba8fe 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != 8 
@@ -42,6 +43,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] != 8
@@ -66,6 +68,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*5] != 8
@@ -91,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*9] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-10.c b/gcc/testsuite/gcc.dg/vect/slp-10.c
index da44f26601a9ba8ea52417ec5a160dc4bedfc315..2759b66f7772cb1af508622a3099bdfb524cba56 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-10.c
@@ -46,6 +46,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
@@ -68,6 +69,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
@@ -84,6 +86,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index e6632fa77be8092524a202d6a322354b45e1794d..fcb7cf6c7a2c5d42ec7ce8bc081db7394ba2bd96 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -44,6 +44,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11b.c b/gcc/testsuite/gcc.dg/vect/slp-11b.c
index d0b972f720be1c965207ded917f979957c76ee67..df64c8db350dbb12295c61e84d32d5a5c20a1ebe 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11b.c
@@ -22,6 +22,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
index bdcf434ce31ebc1df5f7cfecb5051ebc71af3aed..0f680cd4e60c41624992e4fb68d2c3664ff1722e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  ((float) in[i*2] * 2 + 6)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 08a8f55bab0b3d09e7eae14354c515203146b3d8..f0dda55acaea38e463044c7495af1f57ac121ce0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -47,6 +47,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12b.c b/gcc/testsuite/gcc.dg/vect/slp-12b.c
index 48e78651a6dca24de91a1f36d0cd757e18f7c1b8..e2ea24d6c535c60ba903ce2411290e603414009a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12b.c
@@ -23,6 +23,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12c.c b/gcc/testsuite/gcc.dg/vect/slp-12c.c
index 6650b8bd94ece71dd9ccb9adcc3d17be2f2bc07a..9c48dff3bf486a8cd1843876975dfba40a055a23 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12c.c
@@ -24,6 +24,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  (in[i*4] + 2) * 3
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index a16656ace00a6a31d0c056056ec2e3e1f050c09f..ca70856c1dd54f106c9f1c3cde6b0ff5f7994e74 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + i
@@ -65,6 +66,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + i
@@ -100,6 +102,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c
index 8769d62cfd4d975a063ad953344855091a1cd129..b7f947e6dbe1fb7d9a8aa8b5f6ac1edfc89d33a2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + i
@@ -59,6 +60,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + i
@@ -94,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
         if (out2[i*12] != in2[i*12] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-14.c b/gcc/testsuite/gcc.dg/vect/slp-14.c
index 6af70815dd43c13fc9abfcebd70c562268dea86f..ccf23c1e44b78ac62dc78eef0ff6c6bc26e99fc1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-14.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-14.c
@@ -64,6 +64,7 @@ main1 (int n)
 }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-15.c b/gcc/testsuite/gcc.dg/vect/slp-15.c
index dbced88c98d1fc8d289e6ac32a84dc9f4072e49f..13a0f3e3014d84a16a68a807e6a2730cbe8e6840 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-15.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-15.c
@@ -64,6 +64,7 @@ main1 (int n)
 }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-16.c b/gcc/testsuite/gcc.dg/vect/slp-16.c
index a7da9932c54c28669875d46e3e3945962d5e2dee..d053a64276db5c306749969cca7f336ba6a19b0b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-16.c
@@ -38,6 +38,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*2] !=  (in[i*2] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-17.c b/gcc/testsuite/gcc.dg/vect/slp-17.c
index 6fa11e4c53ad73735af9ee74f56ddff0b777b99b..c759a5f0145ac239eb2a12efa89c4865fdbf703e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-17.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-17.c
@@ -27,6 +27,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*2] != in[i*2] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-18.c b/gcc/testsuite/gcc.dg/vect/slp-18.c
index ed426a344985d1e205f7a94f72f86954a77b3d92..f31088cb76b4cdd80460c0d6a24568430e595ea0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-18.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-18.c
@@ -57,6 +57,7 @@ main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19a.c b/gcc/testsuite/gcc.dg/vect/slp-19a.c
index 0f92de92cd396227cc668396cd567ca965e9784b..ca7a0a8e456b1b787ad82e910ea5e3c5e5048c80 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19a.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19b.c b/gcc/testsuite/gcc.dg/vect/slp-19b.c
index 237b36dd227186c8f0cb78b703351fdae6fef27c..4d53ac698dbd164d20271c4fe9ccc2c20f3c4eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19b.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19b.c
@@ -29,6 +29,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-19c.c b/gcc/testsuite/gcc.dg/vect/slp-19c.c
index 32566cb5e1320de2ce9c83867c05902a24036de4..188ab37a0b61ba33ff4c19115e5c54e0f7bac500 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-19c.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-19c.c
@@ -47,6 +47,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*12] !=  in[i*12]
@@ -79,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*6] !=  in[i*6]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-2.c b/gcc/testsuite/gcc.dg/vect/slp-2.c
index 8d374d724539a47930fc951888471a7b367cd845..d0de3577eb6a1b8219e8a79a1a684f6b1b7baf52 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-2.c
@@ -25,6 +25,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != a8 
@@ -55,6 +56,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*16] != a8
@@ -85,6 +87,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != a8
@@ -110,6 +113,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*11] != a8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-20.c b/gcc/testsuite/gcc.dg/vect/slp-20.c
index dc5eab669ea9eaf7db83606b4c426921a6a5da15..ea19095f9fa06db508cfedda68ca2c65769b35b0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-20.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-20.c
@@ -34,6 +34,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
@@ -77,6 +78,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
index 4b83adb9807fc29fb9f2d618d15e8eb15290dd67..712a73b69d730fd27cb75d3ebb3624809317f841 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
@@ -45,6 +45,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
@@ -101,6 +102,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
@@ -158,6 +160,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       a0 = in[i*4];
diff --git a/gcc/testsuite/gcc.dg/vect/slp-22.c b/gcc/testsuite/gcc.dg/vect/slp-22.c
index e2a0002ffaf363fc12b76deaaee3067c9a0a186b..2c083dc4ea3b1d7d3c6b56508cc7465b76060aa1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-22.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-22.c
@@ -39,6 +39,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
@@ -92,6 +93,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] != b0 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-23.c b/gcc/testsuite/gcc.dg/vect/slp-23.c
index d7c67fe2c6e9c6ecf94a2ddc8c1d7a4c234933c8..d32ee5ba73becb9e0b53bfc2af27a64571c56899 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-23.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-23.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].c + arr[i].c
@@ -67,6 +68,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].c + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
index abd3a878f1ac36a7c8cde58743496f79b71f4476..5eaea9600acb2b8ffe674730bcf9514b51ae105f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
@@ -42,6 +42,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
     pIn++;
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (ua1[2*i] != ub[2*i]
         || ua1[2*i+1] != ub[2*i+1]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c b/gcc/testsuite/gcc.dg/vect/slp-24.c
index a45ce7de71fa6a8595b611dd47507df4e91e3b36..59178f2c0f28bdbf657ad68658d373e75d076f79 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
@@ -41,6 +41,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
     pIn++;
   }
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (ua1[2*i] != ub[2*i]
         || ua1[2*i+1] != ub[2*i+1]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-25.c b/gcc/testsuite/gcc.dg/vect/slp-25.c
index 1c33927c4342e01f80765d0ea723e01cec5fe2e6..9e3b5bbc9469fd0dc8631332643c1eb496652218 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-25.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-25.c
@@ -24,6 +24,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N/2; i++)
     {
       if (ia[2*i] != 25
@@ -38,6 +39,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= n/2; i++)
     {
       if (sa[2*i] != 25
diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index f8b49ff603c16127694e599137b1f48ea665c4db..d398a5acb0cdb337b442f071c96f3ce62fe84cff 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -24,6 +24,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*4] !=  in[i*4]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-28.c b/gcc/testsuite/gcc.dg/vect/slp-28.c
index 0bb5f0eb0e40307558dc3ab826d583ea004891cd..67b7be29b22bb646b4bea2e0448e919319b11c98 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-28.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-28.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in[i] != i+5)
@@ -51,6 +52,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in2[i] != (i % 4) + (i / 4) * 5)
@@ -69,6 +71,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (in3[i] != (i % 12) + (i / 12) * 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
index 4cf0e7a0ece17204221c483bcac8fe9bdab3c85c..615a79f4a30f8002a989047c99eea13dd9f9e1a6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
@@ -32,6 +32,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -54,6 +55,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -84,6 +86,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
@@ -120,6 +123,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (out[i*9] !=  in[i*9]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c b/gcc/testsuite/gcc.dg/vect/slp-3.c
index 760b3fa35a2a2018a103b344c329464ca8cb52fe..183c7e65c57ae7dfe3994757385d9968b1de45e5 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -48,6 +49,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -78,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
@@ -114,6 +117,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (out[i*9] !=  in[i*9]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-33.c b/gcc/testsuite/gcc.dg/vect/slp-33.c
index 2404a5f19b407ef47d4ed6e597da9381629530ff..c382093c2329b09d3ef9e78abadd1f7ffe22dfda 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-33.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-33.c
@@ -43,6 +43,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*7] !=  (in[i*7] + 5) * 3 - 2
@@ -64,6 +65,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*3] !=  (in[i*3] + 2) * 3
@@ -81,6 +83,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out2[i*3] !=  (float) (in[i*3] * 2 + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
index 9e9c8207f7bbb0235e5864b529869b6db3768087..0baaff7dc6e6b8eeb958655f964f234512cc4500 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
@@ -36,6 +36,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != in[i*3] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-34.c b/gcc/testsuite/gcc.dg/vect/slp-34.c
index 1fd09069247f546a9614c47fca529da4bc465497..41832d7f5191bfe7f82159cde69c1787cfdc6d8c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-34.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-34.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*3] != in[i*3] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-35.c b/gcc/testsuite/gcc.dg/vect/slp-35.c
index 76dd7456d89859108440eb0be2374215a16cfa57..5e9f6739e1f25d109319da1db349a4063f5aaa1b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-35.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-35.c
@@ -32,6 +32,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].c + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/slp-37.c b/gcc/testsuite/gcc.dg/vect/slp-37.c
index a765cd70a09c2eb69df6d85b2056f0d90fc4120f..caee2bb508f1824fa549568dd09911c8624222f4 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-37.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-37.c
@@ -28,6 +28,7 @@ foo1 (s1 *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
        if (arr[i].a != 6 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
index 98ac3f1f2839c717d66c04ba4e0179d4497be33e..fcda45ff368511b350b25857f21b2eaeb721561a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -59,6 +60,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -92,6 +94,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-4.c b/gcc/testsuite/gcc.dg/vect/slp-4.c
index e4f65bc37f8c5e45c1673d2218bf75a2a98b3daf..29e741df02ba0ef6874cde2a4410b79d1d7608ee 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-4.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -53,6 +54,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-41.c b/gcc/testsuite/gcc.dg/vect/slp-41.c
index 2ad9fd2077231a0124c7fe2aaf37570a3a10f849..b96de4fbcb7f9a3c60b884a47bbfc52ebbe1dd44 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-41.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-41.c
@@ -48,6 +48,7 @@ int main()
        __asm__ volatile ("");
     }
   testi (ia, sa, 8, 32);
+#pragma GCC novector
   for (i = 0; i < 128; ++i)
     if (sa[i] != ia[(i / 4) * 8 + i % 4])
       abort ();
@@ -58,6 +59,7 @@ int main()
        __asm__ volatile ("");
     }
   testi2 (ia, sa, 8, 32);
+#pragma GCC novector
   for (i = 0; i < 128; ++i)
     if (ia[i] != sa[(i / 4) * 8 + i % 4])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-43.c b/gcc/testsuite/gcc.dg/vect/slp-43.c
index 3cee613bdbed4b7ca7a796d45776b833cff2d1a2..3d8ffb113276c3b244436b98048fe78112340e0c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-43.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-43.c
@@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
 }
 
 #define TEST(T,N) \
+ _Pragma("GCC novector") \
  do { \
   memset (out, 0, 4096); \
   foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
   if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
     __builtin_abort (); \
+  _Pragma("GCC novector") \
   for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
     if (out[i] != 0) \
       __builtin_abort (); \
diff --git a/gcc/testsuite/gcc.dg/vect/slp-45.c b/gcc/testsuite/gcc.dg/vect/slp-45.c
index fadc4e5924308d46aaac81a0d5b42564285d58ff..f34033004520f106240fd4a7f6a6538cb22622ff 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-45.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-45.c
@@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
 }
 
 #define TEST(T,N) \
+ _Pragma("GCC novector") \
  do { \
   memset (out, 0, 4096); \
   foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
   if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
     __builtin_abort (); \
+  _Pragma("GCC novector") \
   for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
     if (out[i] != 0) \
       __builtin_abort (); \
diff --git a/gcc/testsuite/gcc.dg/vect/slp-46.c b/gcc/testsuite/gcc.dg/vect/slp-46.c
index 18476a43d3f61c07aede8d90ca69817b0e0b5342..2d5534430b39f10c15ab4d0bdab47bf68af86376 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-46.c
@@ -54,6 +54,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[i/2])
       abort ();
@@ -65,6 +66,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[2*(i/2)])
       abort ();
@@ -76,6 +78,7 @@ main ()
     }
 
   baz ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[511 - i/2])
       abort ();
@@ -87,6 +90,7 @@ main ()
     }
 
   boo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[2*(511 - i/2)])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-47.c b/gcc/testsuite/gcc.dg/vect/slp-47.c
index 7b2ddf664dfefa97ac80f9f9eb7993e18980c411..7772bb71c8d013b8699bee644a3bb471ff41678f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-47.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-47.c
@@ -35,6 +35,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i])
       abort ();
@@ -46,6 +47,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i^1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-48.c b/gcc/testsuite/gcc.dg/vect/slp-48.c
index 0b327aede8e6bb53d01315553ed9f2c3c3dc3290..38f533233d657189851a8942e8fa8133a9d2eb91 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-48.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-48.c
@@ -35,6 +35,7 @@ main ()
     }
 
   foo ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i^1])
       abort ();
@@ -46,6 +47,7 @@ main ()
     }
 
   bar ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     if (x[i] != y[1023 - i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-49.c b/gcc/testsuite/gcc.dg/vect/slp-49.c
index 4141a09ed97a9ceadf89d394d18c0b0226eb55d7..b2433c920793c34fb316cba925d7659db356af28 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-49.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-49.c
@@ -24,6 +24,7 @@ main()
 
   foo (17);
 
+#pragma GCC novector
   for (int i = 0; i < 512; ++i)
     {
       if (a[2*i] != 5 + i
diff --git a/gcc/testsuite/gcc.dg/vect/slp-5.c b/gcc/testsuite/gcc.dg/vect/slp-5.c
index 989e05ac8be6bdd1fb36c4bdc079866ce101e017..6d51f6a73234ac41eb2cc4d2fcedc8928d9932b2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-5.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8]
@@ -55,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4]
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*16] !=  in[i*16]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-6.c b/gcc/testsuite/gcc.dg/vect/slp-6.c
index ec85eb77236e4b8bf5e0c6a8d07abf44a28e2a5c..ea9f7889734dca9bfa3b28747c382e94bb2c1c84 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-6.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
@@ -50,6 +51,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 2
@@ -80,6 +82,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out2[i*16] !=  in2[i*16] * 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-7.c b/gcc/testsuite/gcc.dg/vect/slp-7.c
index e836a1ae9b5b60685e8ec2d15ca5005ff35a895e..2845a99dedf5c99032b099a136acd96f37fc5295 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-7.c
@@ -30,6 +30,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
@@ -55,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*2; i++)
     {
       if (out[i*4] !=  in[i*4] + 1
@@ -86,6 +88,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out2[i*16] !=  in2[i*16] * 2
diff --git a/gcc/testsuite/gcc.dg/vect/slp-8.c b/gcc/testsuite/gcc.dg/vect/slp-8.c
index e9ea0ef0d6b32d23977d728c943bac05dc982b2d..8647249f546267185bb5c232f088a4c0984f2039 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-8.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       if (fa[4*i] != (float) ib[4*i]      
diff --git a/gcc/testsuite/gcc.dg/vect/slp-9.c b/gcc/testsuite/gcc.dg/vect/slp-9.c
index d5212dca3ddcbffabdc9fbed8f2380ffceee626d..4fb6953cced876c2a1e5761b0f94968c5774da9e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-9.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
index 482fc080a0fc132409509b084fcd67ef95f2aa17..450c7141c96b07b9f798c62950d3de30eeab9a28 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
@@ -79,11 +79,13 @@ main ()
       e[i] = 2 * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -115,6 +117,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
index 57cc67ee121108bcc5ccaaee0dca5085264c8818..cb7eb94b3a3ba207d513e3e701cd1c9908000a01 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
@@ -82,11 +82,13 @@ main ()
     }
 
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -118,6 +120,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
index 7350695ece0f53e36de861c4e7724ebf36ff6b76..1dcee46cd9540690521df07c9cacb608e37b62b7 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
@@ -82,11 +82,13 @@ main ()
     }
 
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
 
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       switch (i % 9)
@@ -118,6 +120,7 @@ main ()
   f3 ();
 
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
index d19ec13a21ac8660cc326dfaa4a36becab219d82..64904b001e6a39623eff9a1ddc530afbc5e64687 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
@@ -72,6 +72,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 ? 10 : 2 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
index f82b8416d8467a8127fbb498040c5559e33d6608..0e1bd3b40994016bb6232bd6a1e129602c03167b 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
@@ -75,6 +75,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 ? 5 : i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
index 5ade7d1fbad9eee7861d1b0d12ac98e42d453422..f0a703f0030b4c01d4119c812086de2a8e78ff4f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
@@ -70,6 +70,7 @@ int main ()
     }
 
   bar (a, b, c, d, e, 2);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (e[i] != ((i % 3) == 0 || i <= 5 ? 10 : 2 * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
index 1850f063eb4fc74c26a9b1a1016f9d70a0c28441..605f6ab8ba638175d557145c82f2b78c30eb5835 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sout[i*4] != 8 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
index 62580c070c8e19468812a9c81edc1c5847327ebb..06d9029e9202b15dc8de6d054779f9d53fbea60d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i].a !=  (unsigned char) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
index a3d0670cea98379af381fd7282f28e9724096a93..2792b932734a7a8ad4958454de56956081753d7c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
@@ -34,6 +34,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a !=  (int) in[i*3] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
index 86a3fafa16f41dc2c63f4704b85268330ad5568d..5c75dc12b695785405b7d56891e7e71ac24e2539 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
@@ -28,6 +28,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a !=  (int) in[i*3] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
index d4c929de2ecbc73c75c08ae498b8b400f67bf636..13119822200fef23a96e920bde8ca968f0a09f84 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
@@ -32,6 +32,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sout[i*4] != 8 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
index 28a645c79472578d3775e9e2eb28cb7ee69efad0..c15baa00dd00fb8fa0ae79470d846b31ee4dd578 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
@@ -41,6 +41,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*16] != a8
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
index 39bd7c41f8cca2a517486bc9a9898031911115c6..c79906a8d7b30834dfcda5c70d6bf472849a39cb 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
@@ -45,6 +45,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (out[i*8] !=  in[i*8] + 5
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
index faf17d6f0cde5eacb7756996a224e4004b305f7f..b221f705070af661716d1d6fbf70f16ef3652ca9 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (int) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
index fb4f720aa4935da6862951a3c618799bb37f535f..3237773e1b13223164473ad88b3c806c8df243b2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (short) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
index f006d081346aa4f067d1e02018f2c46d4fcf1680..e62d16b6de34ce1919545a5815600263931e11ac 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (unsigned char) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
index 286e2fc42af815dcc724f1a66d8d01a96c915beb..08ab2dc3d10f6ab208841e53609dc7c672a69c5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
@@ -26,6 +26,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i*8] !=  (int) in[i*8] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
index d88ebe4d778c4487c00ef055059d2b825542679a..0b67ecc8e0730813966cfd6922e8d3f9db740408 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  (int) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
index 872b20cac93c119854b8250eb85dc43767743da4..49261483166cbd6dcf99800a5c7062f7f091c103 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N*4; i++)
     {
       if (out[i*2] !=  (unsigned char) in[i*2] + 1
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
index ca7803ec1a9a49b4800cf396bcdc05f263f344ee..dbb107f95fec3338b135ff965e8be2b514cc1fe6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
@@ -69,6 +69,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
index 678152ba4168d32f84a1d1b01ba6c43b210ec8b9..2cce30c2444323ba6166ceee6a768fbd9d881a47 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
@@ -35,6 +35,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < 32; ++i)
     if (b[i*8+0] != i*8+0
 	|| b[i*8+1] != i*8+0
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
index 0318d468ef102cb263d090a33429849221dc3c0d..0d25d9d93bbf14b64fb6f2c116fe70bf17b5f432 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
@@ -26,6 +26,7 @@ int main ()
       __asm__ volatile ("");
     }
   foo (4);
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != (4*(i/2) + (i & 1) ^ 1))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
index 113223ab0f96507b74cfff8fc6b112070cabb5ee..642b1e8b399e7ffc77e54e02067eec053ea54c7e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
@@ -42,6 +42,7 @@ int main()
 
   test (a, b);
 
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != 253)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
index 82776f3f06af8a7b82e0d190a922b213d17aee88..41fd159adce8395dd805f089e94aacfe7eeba09f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
@@ -43,6 +43,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
index 1807275d1bfcc895ed68bd5e536b5837adf336e6..9ea35ba5afca2db0033150e35fca6b961b389c03 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
@@ -56,6 +56,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (output[i] != check_results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
index 8457e4f45d62d6d704145b1c4f62af14c1877762..107968f1f7ce65c53bf0280e700f659f625d8c1e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
@@ -103,6 +103,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (output[i] != check_results[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
index b86a3dc8756e0d30551a40ed1febb142813190a4..7128cf471555d5f589b11e1e58a65b0211e7d6fd 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
@@ -96,6 +96,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
index bec1544650ac9e897ab1c06f120fb6416091dec6..5cc6261d69a15d2a3f6b691c13544c27dc8f9941 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
@@ -95,6 +95,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
index 346411fd5042add21fdc6413922506bcb92f4594..df13c37bc75d43173d4e1b9d0daf533ba5829c7f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
@@ -88,6 +88,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
      if (output[i] != check_results[i] || output2[i] != check_results2[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index 44df21aae2a2f860d49c36568122733e693d4310..029be5485b62ffef915f3b6b28306501852733d7 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -52,6 +52,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N - (N % 3); i++)
      if (output[i] != check_results[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
index 154c00af598d05bac9ebdad3bfb4eeb28594a1fc..c92fc2f38619a5c086f7029db444a6cb208749f0 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
@@ -50,6 +50,7 @@ int main (int argc, const char* argv[])
 
   foo (input, output);
 
+#pragma GCC novector
   for (i = 0; i < N - (N % 3); i++)
      if (output[i] != check_results[i])
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
index e3bfee33348c5164f657a1494f480db26a7aeffa..72811eb852e5ed51ed5f5d042fac4e9b487911c2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
@@ -40,6 +40,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
index abb10fde45bc807269cd5bb58f463a77f75118d8..f8ec1fa730d21cde5f2bbb0791b04ddf0e0b358c 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
@@ -29,6 +29,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
index 0756119afb455a0b834fd835553318eb29887f4d..76507c4f46157a8ded48e7c600ee53424e01382f 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
@@ -29,6 +29,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-100.c b/gcc/testsuite/gcc.dg/vect/vect-100.c
index 9a4d4de06718228fcc0bd011d2e23d4c564c29ff..0d8703281f28c995a7c08c4366a4fccf22cf16e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-100.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-100.c
@@ -30,6 +30,7 @@ int main1 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != a[i] || p->b[i] != b[i])
@@ -55,6 +56,7 @@ int main2 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (p->a[i] != c[i] || p->b[i] != d[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-103.c b/gcc/testsuite/gcc.dg/vect/vect-103.c
index d03562f7cddd0890e3e159fbdc7c5d629b54d58c..59d8edc38cacda52e53a5d059171b6eefee9f920 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-103.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-103.c
@@ -43,6 +43,7 @@ int main1 (int x, int y) {
   /* check results: */
   if (p->a[0] != a[N - 1])
     abort ();
+#pragma GCC novector
   for (i = 1; i < N; i++)
     if (p->a[i] != b[i - 1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-104.c b/gcc/testsuite/gcc.dg/vect/vect-104.c
index a77c98735ebad6876c97ee22467f5287b4575a01..e0e5b5a53bdae1e148c61db716f0290bf3e829f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-104.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-104.c
@@ -43,6 +43,7 @@ int main1 (int x) {
   }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
    {
     for (j = 0; j < N; j++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
index 433565bfd4d3cea87abe23de29edbe8823054515..ec7e676439677ae587a67eae15aab34fd5ac5b03 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
@@ -75,6 +75,7 @@ int main1 (int x) {
   /* check results: */
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (p->a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-105.c b/gcc/testsuite/gcc.dg/vect/vect-105.c
index 17b6e89d8f69053b5825c859f3ab5c68c49b3a5d..f0823fbe397358cb34bf4654fccce21a053ba2a7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-105.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-105.c
@@ -45,6 +45,7 @@ int main1 (int x) {
   /* check results: */
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (p->a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-106.c b/gcc/testsuite/gcc.dg/vect/vect-106.c
index 0171cfcdfa6e60e6cb8158d098d435c0e472abf8..4b3451cc783e9f83f7a6cb8c54cf50f4c43dddc0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-106.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-106.c
@@ -28,6 +28,7 @@ int main1 () {
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (*q != a[i] || *p != b[i])
@@ -50,6 +51,7 @@ int main1 () {
   q = q1;
   p = p1;
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (*q != b[i] || *p != a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-107.c b/gcc/testsuite/gcc.dg/vect/vect-107.c
index aaab9c00345bf7f0b25fbcda25a141988bda9eac..60c83a99a19f4797bc7a5a175f33aecbc598f8e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-107.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-107.c
@@ -24,6 +24,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-108.c b/gcc/testsuite/gcc.dg/vect/vect-108.c
index 4af6326e9c35963ec7109d66dd0d321cf1055597..2cbb6701d5c6df749482d5e4351b9cb4a808b94f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-108.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-108.c
@@ -21,6 +21,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] * ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index fe7ea6c420fb1512286b0b468cbe9ffed5daae71..31b9aa2be690fb4f2d9cf8062acbf1b42971098d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -34,6 +34,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+2] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
@@ -56,6 +57,7 @@ int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-11.c b/gcc/testsuite/gcc.dg/vect/vect-11.c
index 044fc5edc2dddb0bddaca545b4e97de1499be8bd..1171757e323bc9a64c5e6762e98c101120fc1449 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-11.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] * ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-110.c b/gcc/testsuite/gcc.dg/vect/vect-110.c
index 47c6456107ddd4f326e8c9e783b01c59e23087e6..69ee547cfd17965f334d0d1af6bc28f99ae3a671 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-110.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-110.c
@@ -20,6 +20,7 @@ main1 (void)
   }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N - 1; i++){
     if (a[i] != b[i] + c[i])
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-113.c b/gcc/testsuite/gcc.dg/vect/vect-113.c
index a9d45ce9fcc21195030dfcdf773ffc3a41e48a37..8e9cc545ce6b3204b5c9f4a220e12d0068aa4f3e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-113.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-113.c
@@ -17,6 +17,7 @@ main1 (void)
     a[i] = i;
   }
 
+#pragma GCC novector
   for ( i = 0; i < N; i++) 
   {
     if (a[i] != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-114.c b/gcc/testsuite/gcc.dg/vect/vect-114.c
index 557b44110a095ae725b58cf1ca2494a103b96dd7..1617d3009eb3fdf0bb16980feb0f54d2862b8f3c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-114.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-114.c
@@ -19,6 +19,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-115.c b/gcc/testsuite/gcc.dg/vect/vect-115.c
index 0502d15ed3ebd37d8dda044dbe13d68525f3e30a..82b8e2eea1f3374bdbe5460ca58641f217d1ab33 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-115.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-115.c
@@ -41,6 +41,7 @@ int main1 ()
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.strc_t.strc_s.b[i] != a[i])
@@ -54,6 +55,7 @@ int main1 ()
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.ptr_t->strc_s.c[i] != a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-116.c b/gcc/testsuite/gcc.dg/vect/vect-116.c
index d4aa069772ed76f895f99c91609852bdcc43d324..ac603db44ee2601665c1de4bb60aee95f545c8ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-116.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-116.c
@@ -18,6 +18,7 @@ void foo()
   for (i = 0; i < 256; ++i)
     C[i] = A[i] * B[i];
 
+#pragma GCC novector
   for (i = 0; i < 256; ++i)
     if (C[i] != (unsigned char)(i * i))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-117.c b/gcc/testsuite/gcc.dg/vect/vect-117.c
index 22f8e01187272e2cfe445c66ca590f77923d4e95..f2c1c5857059a9bcaafad4ceadff02e192209840 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-117.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-117.c
@@ -47,6 +47,7 @@ int main (void)
 
   for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++)
      {
        if (a[i][j] != c[i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-11a.c b/gcc/testsuite/gcc.dg/vect/vect-11a.c
index 4f1e15e74293187d495c8c11cda333a1af1139a6..9d93a2e8951f61b34079f6d867abfaf0fccbb8fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-11a.c
@@ -21,6 +21,7 @@ void u ()
   
   for (i=0; i<8; i++)
     C[i] = A[i] * B[i];
+#pragma GCC novector
   for (i=0; i<8; i++)
     if (C[i] != Answer[i])
       abort ();
@@ -41,6 +42,7 @@ void s()
   
   for (i=0; i<8; i++)
     F[i] = D[i] * E[i];
+#pragma GCC novector
   for (i=0; i<8; i++)
     if (F[i] != Dnswer[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-12.c b/gcc/testsuite/gcc.dg/vect/vect-12.c
index b095170f008c719326a6cfd5820a7926ae8c722e..096ff10f53c9a4d7e0d3a8bbe4d8ef513a82c46c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-12.c
@@ -24,6 +24,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + ic[i] || sa[i] != sb[i] + sc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-122.c b/gcc/testsuite/gcc.dg/vect/vect-122.c
index 04dae679647ff9831224b6dc200a25b2b1bb28d7..6e7a4c1578f4c4cddf43a81e3e4bc6ab87efa3ca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-122.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-122.c
@@ -50,6 +50,7 @@ main ()
   f2 ();
   f3 ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i || b[i] != i || l[i] != i * (i + 7LL) || m[i] != i * 7LL)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-124.c b/gcc/testsuite/gcc.dg/vect/vect-124.c
index c720648aaddbe72d0073fcf7548408ce6bda3cdd..6b6730a22bdb62e0f8770b4a288aa1adeff756c2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-124.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-124.c
@@ -21,6 +21,7 @@ main ()
   
   check_vect ();
   foo (6);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 3 + 6)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-13.c b/gcc/testsuite/gcc.dg/vect/vect-13.c
index 5d902924ec20e2ea0ee29418a1b52d4e2ede728e..f1e99a3ec02487cd331e171c6e42496924e931a2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-13.c
@@ -22,6 +22,7 @@ int main1()
     }
 
   /* Check results  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-14.c b/gcc/testsuite/gcc.dg/vect/vect-14.c
index 1640220a134ed8962e31b9d201c0e4a8630d631f..5898d4cd8924a5a6036f38efa79bc4146a78320d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-14.c
@@ -17,6 +17,7 @@ int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
index 5313eae598b4787e5294eefe87bf59f5a3581657..bc2689fce50cebf55720bfc9f60bd7c0dd9659dc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
@@ -25,6 +25,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-15.c b/gcc/testsuite/gcc.dg/vect/vect-15.c
index 178bc4404c420c3a7d74ca381f3503aaefc195db..4a73d0681f0db2b12e68ce805f987aabf8f1cf6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-15.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[N-1-i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-17.c b/gcc/testsuite/gcc.dg/vect/vect-17.c
index 471a82336cf466856186eb9ad3f7a95e4087cedc..797444a4c4a312d41d9b507c5d2d024e5b5b87bb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-17.c
@@ -81,6 +81,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] & ic[i]))
@@ -95,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] & cc[i]))
@@ -109,6 +111,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] & sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-18.c b/gcc/testsuite/gcc.dg/vect/vect-18.c
index 28b2941e581fa6abecbdafaa812cf4ff07ea9e5f..8c0fab43e28da6193f1e948e0c59985b2bff1119 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-18.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] | ic[i]))
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != (cb[i] | cc[i]))
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] | sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-19.c b/gcc/testsuite/gcc.dg/vect/vect-19.c
index 27c6dc835a60c42e8360521d343b13f461a0b009..fe2a88c7fd855a516c34ff3fa3b5da5364fb0a81 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-19.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] ^ ic[i]))
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] ^ cc[i]))
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] ^ sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
index 162cb54b58d17efc205778adc14e846be39afab1..70595db744e349bdc6d786c7e64b762406689c64 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-2.c b/gcc/testsuite/gcc.dg/vect/vect-2.c
index d975668cbd023b0324c7526e162bc1aeb21dfcd7..80415a5b54b75f9e9b03f0123a53fd70ee07e7cd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-2.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-20.c b/gcc/testsuite/gcc.dg/vect/vect-20.c
index 8d759f3c6a66e6a6e318510ba59196ab91b757ac..0491bb2fc73bcef98cb26e82fb74778c8fea2dc0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-20.c
@@ -52,6 +52,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != ~ib[i])
@@ -66,6 +67,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != ~cb[i])
@@ -80,6 +82,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != ~sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-21.c b/gcc/testsuite/gcc.dg/vect/vect-21.c
index ab77df6ef88890907f57a89870e645bb51d51c5a..f98ae8b22ee3e8bbb2c8e4abbc6022c11150fdb1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-21.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != !ib[i])
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != !cb[i])
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != !sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-22.c b/gcc/testsuite/gcc.dg/vect/vect-22.c
index 78dc1ce91def46c31e913806aada5907d02fd4e0..3ab5070d94e85e8d332f55fe8511bbb82df781a6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-22.c
@@ -63,6 +63,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != -ib[i])
@@ -77,6 +78,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != -cb[i])
@@ -91,6 +93,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != -sb[i])
@@ -105,6 +108,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (fa[i] != -fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c b/gcc/testsuite/gcc.dg/vect/vect-23.c
index 69e0848c8eca10661d85a2f0b17b9a3d99319135..1a1c0b415a9247a3ed2555ca094d0a59e698384b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-23.c
@@ -80,6 +80,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != ib[i] && ic[i])
@@ -94,6 +95,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != cb[i] && cc[i])
@@ -108,6 +110,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != sb[i] && sc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-24.c b/gcc/testsuite/gcc.dg/vect/vect-24.c
index fa4c0620d29cd44b82fc75f0dc3bab8a862058d9..2da477077111e04d86801c85282822319cd8cfb8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-24.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-24.c
@@ -81,6 +81,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ia[i] != (ib[i] || ic[i]))
@@ -95,6 +96,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (ca[i] != (cb[i] || cc[i]))
@@ -109,6 +111,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (sa[i] != (sb[i] || sc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-25.c b/gcc/testsuite/gcc.dg/vect/vect-25.c
index 904eea8a17b7572ffa335dcf60d27df648f01f18..d665c3e53cde7e5be416a88ace81f68343c1f115 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-25.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-25.c
@@ -19,6 +19,7 @@ int main1 (int n, int *p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != n)
@@ -32,6 +33,7 @@ int main1 (int n, int *p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ib[i] != k)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-26.c b/gcc/testsuite/gcc.dg/vect/vect-26.c
index 8a141f38400308c35a99aa77b0d181a4dce0643c..2ea9aa93dc46dbf11c91d468cdb91a1c0936b323 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-26.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-26.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-27.c b/gcc/testsuite/gcc.dg/vect/vect-27.c
index ac86b21aceb7b238665e86bbbd8a46e2aaa4d162..d459a84cf85d285e56e4abb5b56b2c6157db4b6a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-27.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-27.c
@@ -29,6 +29,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i-1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-28.c b/gcc/testsuite/gcc.dg/vect/vect-28.c
index e213df1a46548d7d2962335c5600c252d9d5d5f3..531a7babb214ed2e6694f845c4b1d6f66f1c5d31 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-28.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-28.c
@@ -21,6 +21,7 @@ int main1 (int off)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i+off] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-29.c b/gcc/testsuite/gcc.dg/vect/vect-29.c
index bbd446dfe63f1477f91e7d548513d99be4c11d7d..42fb0467f1e31b0e89ef9323b60e3360c970f222 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-29.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-29.c
@@ -30,6 +30,7 @@ int main1 (int off)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-3.c b/gcc/testsuite/gcc.dg/vect/vect-3.c
index 6fc6557cf9f13e9dcfb9e4198b4846bca44542ba..2c9b5066dd47f8b654e005fb6fac8a5a28f48111 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-3.c
@@ -29,6 +29,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       float fres = b[i] + c[i] + d[i];
diff --git a/gcc/testsuite/gcc.dg/vect/vect-30.c b/gcc/testsuite/gcc.dg/vect/vect-30.c
index 71f7a2d169f44990a59f57dcecd83e0a2824f81d..3585ac8cfefa1bd2c89611857c11de23d846f3f6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-30.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-30.c
@@ -21,6 +21,7 @@ int main1 (int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (a[i] != b[i])
@@ -43,6 +44,7 @@ int main2 (unsigned int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < nn; i++)
     {
       if (c[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
index 5621eb4d4ba17aaa6321807ee2d3610e38f8cceb..24bd0c7737df02a6b5dd5de9e745be070b0d8468 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -44,6 +45,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -57,6 +59,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -70,6 +73,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-31.c b/gcc/testsuite/gcc.dg/vect/vect-31.c
index 3f7d00c1748058ef662710eda30d89f0a0560f2f..8e1274bae53d95cbe0a4e959fe6a6002dede7590 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-31.c
@@ -31,6 +31,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.b[i] != 5)
@@ -44,6 +45,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.c[i] != 6)
@@ -57,6 +59,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.d.k[i] != 7)
@@ -70,6 +73,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N/2; i++)
     {
       if (tmp.e.k[i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
index 3e1403bbe96948188e7544d05f183a271828640f..5a4053ee8212ecb0f3824f2d0b2e6e03cb8e09ed 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-32.c b/gcc/testsuite/gcc.dg/vect/vect-32.c
index 2684cf2e0d390406e4c6c2ac30ac178ecfe70d5c..b04cbeb7c8297d589608b1e7468d536a5f265337 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-32.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
index c1aa399a240e8c7f50ae10610e2c40d41ea8d555..c3bfaaeb055183ee7a059a050d2fc8fe139bbbae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
@@ -23,6 +23,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-33.c b/gcc/testsuite/gcc.dg/vect/vect-33.c
index e215052ff777a911358e1291630df9cabd27e343..8ffd888d482bc91e10b225317b399c0926ba437a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-33.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
index 0aa6d507a82f086056113157bc4b7ce0d5a87691..c3d44b4d15fef5b719cf618293bbc2a541582f4a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-34.c b/gcc/testsuite/gcc.dg/vect/vect-34.c
index 9cc590253c78317843930fff480b64aaa68de2e2..e3beba56623e9312c1bcfcc81b96d19adb36d83f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-34.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-34.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
index 28a99c910fd507414a4a732a6bcc93c4ce142ba6..a88d111b21a0ce2670311103678fb91bf1aff80f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.a[i] != i + 1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35.c b/gcc/testsuite/gcc.dg/vect/vect-35.c
index a7ec0f16d4cf0225c2f62c2f0aabf142704b2af8..4267c0bebaef82f5a58601daefd7330fff21c5b1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35.c
@@ -26,6 +26,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.a[i] != i + 1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
index d40fcb6d9925de2730acfd37dba2724904159ebb..9aa3bd7c2f40991ef8a3682058d8aea1bab9ba05 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-36.c b/gcc/testsuite/gcc.dg/vect/vect-36.c
index 64bc7fe18095178bc4bc0db5ef93e4c6706fa7d2..59bef84ad2e134c2a47746fb0daf96f0aaa92a34 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-36.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-36.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.ca[i] != s.cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-38.c b/gcc/testsuite/gcc.dg/vect/vect-38.c
index 01d984c61b8245997b4db358dd579fc2042df9ff..81d9f38515afebd9e7e8c85a08660e4ff09aa571 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-38.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-38.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-4.c b/gcc/testsuite/gcc.dg/vect/vect-4.c
index b0cc45be7de6c24af16f0abedf34bc98370ae3e7..393c88df502ecd9261ac45a8366de969bfee84ae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-4.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != b[i] * c[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-40.c b/gcc/testsuite/gcc.dg/vect/vect-40.c
index c74703268f913194119e89982092ec4ce7fa0fde..d524b4ebd433b434f55ca1681ef8ade732dfa1bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-40.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-40.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-42.c b/gcc/testsuite/gcc.dg/vect/vect-42.c
index 086cbf20c0a2cf7c38ede4e9db30042ac3237972..c1d16f659f130aeabbce4fcc1c1ab9d2cb46e12d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-42.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-42.c
@@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-44.c b/gcc/testsuite/gcc.dg/vect/vect-44.c
index f7f1fd28665f23560cd7a2f397a0c773290c923f..b6895bd1d8287a246c2581ba24132f344dabb27e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-44.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-44.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-46.c b/gcc/testsuite/gcc.dg/vect/vect-46.c
index 185ac1424f94956fbcd5b26d0f4e6d36fd5f708b..7ca8b56ea9ffc50ae1cc99dc74662aea60d63023 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-46.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-46.c
@@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-48.c b/gcc/testsuite/gcc.dg/vect/vect-48.c
index b29fe47635a349c0a845c43655c1a44d569d765e..10d8e09cac1daafeb0d5aa6e12eb7f3ecf6d33fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-48.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-48.c
@@ -30,6 +30,7 @@ main1 (float *pb, float *pc)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-5.c b/gcc/testsuite/gcc.dg/vect/vect-5.c
index 17f3b2fac9a72f11b512659046dd8710d2e2f9a2..a999989215aa7693a1520c261d690c66f6f9ba13 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-5.c
@@ -25,6 +25,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -38,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != d[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-50.c b/gcc/testsuite/gcc.dg/vect/vect-50.c
index f43676896af4b9de482521b4aa915a47596ff4a9..76304cd10ce00881de8a2a6dc37fddf100e534c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-50.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-50.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-52.c b/gcc/testsuite/gcc.dg/vect/vect-52.c
index c20a4be2edee6c958ae150b7de81121d01b2ab8a..2ad7149fc612e5df4adc390dffc6a0e72717308f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-52.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-52.c
@@ -30,6 +30,7 @@ main1 (int n, float *pb, float *pc)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-54.c b/gcc/testsuite/gcc.dg/vect/vect-54.c
index 2b236e48e196106b7892d3f28b4bd901a700ff9c..7ae59c3e4d391200bcb46a1b3229c30ed26b6083 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-54.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-54.c
@@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-56.c b/gcc/testsuite/gcc.dg/vect/vect-56.c
index c914126ece5f5929d316c5c107e7633efa4da55c..a8703d1e00969afdbb58782068e51e571b612b1d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-56.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-56.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
@@ -50,6 +51,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-58.c b/gcc/testsuite/gcc.dg/vect/vect-58.c
index da4f9740e3358f67e9a05f82c87cf78bf3620e56..43a596f6e9522531c2c4d2138f80eae73da43038 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-58.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-58.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
index c5de86b167a07ddf9043ae1ba77466ffd16765e6..a38373888907a7ed8f5ac610e030cd919315727d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
@@ -39,6 +39,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != results1[i] || e[i] != results2[i])
@@ -52,6 +53,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-6.c b/gcc/testsuite/gcc.dg/vect/vect-6.c
index c3e6336bb43c6ab30eb2c55049e0f1a9bd5788b6..eb006ad0735c70bd6a416d7575501a49febafd91 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-6.c
@@ -24,6 +24,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     {
       if (a[i] != results1[i] || e[i] != results2[i])
@@ -37,6 +38,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-60.c b/gcc/testsuite/gcc.dg/vect/vect-60.c
index 121c503c63afaf7cc5faa96bb537f4a184c82b00..2de6f0031aa6faf854a61bf60acf4e5a05a7d3d0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-60.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-60.c
@@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
@@ -50,6 +51,7 @@ main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (pa[i] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-62.c b/gcc/testsuite/gcc.dg/vect/vect-62.c
index abd3d700668b019a075c52edfaff16061200305b..ea6ae91f56b9aea165a51c5fe6489729d5ba4e62 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-62.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-62.c
@@ -25,6 +25,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j+8] != ib[i])
@@ -46,6 +47,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][8] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-63.c b/gcc/testsuite/gcc.dg/vect/vect-63.c
index 8d002a5e3c349bd4cbf9e37e8194e9a7450d0bde..20600728145325962598d6fbc17640296c5ca199 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-63.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-63.c
@@ -25,6 +25,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i + j][1][j] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-64.c b/gcc/testsuite/gcc.dg/vect/vect-64.c
index 240b68f6d0d2d4bbef72b60aac2b26ba366514df..96773f6cab610ee565f33038515345ea799ba2c9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-64.c
@@ -45,6 +45,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[i])
@@ -55,6 +56,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[i][1][1][j] != ib[i])
@@ -65,6 +67,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (id[i][1][j+1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-65.c b/gcc/testsuite/gcc.dg/vect/vect-65.c
index 9ac8ea4f013a5bea6dbfe8673056d35fc1c3fabb..af714d03ebb7f30ab56a93799c4c0d521b9cea93 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-65.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-65.c
@@ -42,6 +42,7 @@ int main1 ()
   /* check results: */  
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j] != ib[2][i][j])
@@ -62,6 +63,7 @@ int main1 ()
   /* check results: */
   for (i = 0; i < M; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[j] != ib[2][i][j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-66.c b/gcc/testsuite/gcc.dg/vect/vect-66.c
index ccb66bc80017d3aa64698cba43f932a296a82e7d..cf16dd15ac2d1664d2edf9a676955c4479715fd2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-66.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-66.c
@@ -23,6 +23,7 @@ void main1 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[2][6][j] != 5)
@@ -47,6 +48,7 @@ void main2 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 2; j < N+2; j++)
         {
            if (ia[3][6][j] != 5)
@@ -73,6 +75,7 @@ void main3 ()
   /* check results: */  
   for (i = 0; i < 16; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ic[2][1][6][j+1] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-67.c b/gcc/testsuite/gcc.dg/vect/vect-67.c
index 12183a233c273d8ae3932fa312e1734b48f8c7b0..f3322a32c1e34949a107772dc6a3f4a7064e7ce5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-67.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-67.c
@@ -31,6 +31,7 @@ int main1 (int a, int b)
   /* check results: */  
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
         {
            if (ia[i][1][j + NINTS] != (a == b))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-68.c b/gcc/testsuite/gcc.dg/vect/vect-68.c
index 3012d88494d0494ec137ca89fef4e98e13ae108e..8cc2d84140967d2c54d3db2b408edf92c53340d6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-68.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (tmp1.a.n[1][2][i] != 5)
@@ -43,6 +44,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i < N-1; i++)
     {
       if (tmp1.a.n[1][2][i] != 6)
@@ -56,6 +58,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (tmp1.e.n[1][2][i] != 7)
@@ -69,6 +72,7 @@ int main1 ()
     }
  
   /* check results:  */
+#pragma GCC novector
   for (i = 3; i <N-3; i++)
     {
       if (tmp1.e.n[1][2][i] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-7.c b/gcc/testsuite/gcc.dg/vect/vect-7.c
index c4556e321c6b0d6bf1a2cd36136d71a43718af32..fb2737e92f5dc037c3253803134687081064ae0e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-7.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sb[i] != 5)
@@ -32,6 +33,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 105)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-70.c b/gcc/testsuite/gcc.dg/vect/vect-70.c
index 793dbfb748160ba709dd835dc253cb436f7aada1..cd432a6545a97d83ebac2323fe2b1a960df09c6e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-70.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-70.c
@@ -52,6 +52,7 @@ int main1 ()
 
   /* check results:  */
   for (i = 0; i < OUTERN; i++)
+#pragma GCC novector
     for (j = NINTS - 1; j < N - NINTS + 1; j++)
     {
       if (tmp1.e[i].n[1][2][j] != 8)
@@ -67,6 +68,7 @@ int main1 ()
   
   /* check results:  */
   for (i = 0; i < OUTERN; i++)
+#pragma GCC novector
     for (j = NINTS - 1; j < N - NINTS + 1; j++)
     {
       if (tmp1.e[j].n[1][2][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c b/gcc/testsuite/gcc.dg/vect/vect-71.c
index 581473fa4a1dcf1a7ee570336693ada765d429f3..46226c5f056bdceb902e73326a00959544892600 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-71.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 2; i < N+1; i++)
     {
       if (ia[ib[i]] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-72.c b/gcc/testsuite/gcc.dg/vect/vect-72.c
index 9e8e91b7ae6a0bc61410ffcd3f0e5fdf4c3488f1..2ab51fdf307c0872248f2bb107c77d19e53894f4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-72.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-72.c
@@ -33,6 +33,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i-1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
index 1c9d1fdaf9a2bb4eee4e9e766e531b72a3ecef2c..d81498ac0ce5926fb384c00aa5f66cc2a976cfdb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
@@ -28,6 +28,7 @@ int main1 ()
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-73.c b/gcc/testsuite/gcc.dg/vect/vect-73.c
index fdb49b86362774b0fdf3e10e918b7d73f3383dd7..48e1e64558e53fe109b96bd56eb8af92268cd7ec 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-73.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-73.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results: */  
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
index ba1ae63bd57cd3347820d888045005a7d4d83f1a..27d708745d31bdb09f4f0d01d551088e02ba24b9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
@@ -36,6 +36,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-74.c b/gcc/testsuite/gcc.dg/vect/vect-74.c
index a44f643ee96729fc0952a64e32a52275321557eb..c23c38a85063024b46c95c2e1c5158c81b6dcd65 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-74.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-74.c
@@ -24,6 +24,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
index a3fb5053037fcca89d7518c47eb2debfc136ba7f..10a3850d0da6d55a124fd6a7f4a2b7fd0efb3fae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
@@ -32,6 +32,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-75.c b/gcc/testsuite/gcc.dg/vect/vect-75.c
index 88da97f0bb7cecee4ee93a9d3fa7f55f0ae9641c..ecf5174921cc779f92e12fc64c3014d1a4997783 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-75.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-75.c
@@ -32,6 +32,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
index 5825cfc446468b16eff60fa2115bb1de4872654f..4f317f273c8737ab07e51699ed19e66d9eb8a51b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
@@ -32,6 +32,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -45,6 +46,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -58,6 +60,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-76.c b/gcc/testsuite/gcc.dg/vect/vect-76.c
index 3f4feeff8ac7882627c88490298c2f39b5172b7e..23210d4b775bfd4d436b2cdf2af2825cbf1924f0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-76.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-76.c
@@ -26,6 +26,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -39,6 +40,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != pib[i - OFF])
@@ -52,6 +54,7 @@ int main1 (int *pib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = OFF; i < N; i++)
     {
      if (ia[i] != ic[i - OFF])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
index fb3e49927826f77149d4813185a6a2cac00232d4..5fb833441d46ce2b6b0df2def5b3093290a2f7a4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
@@ -32,6 +32,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-global.c b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
index 1580d6e075b018696c56de4d680a0999a837bbca..b9622420c64b732047712ff343a3c0027e7bcf3a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77-global.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
@@ -28,6 +28,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-77.c b/gcc/testsuite/gcc.dg/vect/vect-77.c
index d402e147043c0245f6523f6713dafc83e5357121..033d4ba79869c54f12fc3eea24a11ada871373ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-77.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-77.c
@@ -25,6 +25,7 @@ int main1 (int *ib, int off)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
index 57e8da0a9090cae7d501ecb83220afff0bf553b2..f7563c4608546696e5c1174402b42bfc2fd3fa83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
@@ -33,6 +33,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-global.c b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
index ea039b389b22fe16af9353bd5efa59a375a6a71c..11b7e0e9b63cd95bfff9f64f0cfca8b5e4137fe2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78-global.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
@@ -29,6 +29,7 @@ int main1 (int *ib)
 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-78.c b/gcc/testsuite/gcc.dg/vect/vect-78.c
index faa7f2f4f768b0d7a191b8b67f5000f53c485142..b2bf78108dc9b2f8d43235b64a307addeb71e82a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-78.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-78.c
@@ -25,6 +25,7 @@ int main1 (int *ib)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
      if (ia[i] != ib[i+off])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-8.c b/gcc/testsuite/gcc.dg/vect/vect-8.c
index 44c5f53ebaf260c2087b298abf0428c8d21e8cfa..85bc347ff2f2803d8b830bc1a231e8dadfa525be 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-8.c
@@ -19,6 +19,7 @@ int main1 (int n)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (a[i] != b[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
index 0baf4d2859b679f7b20d6b5fc939b71ec2533fb4..a43ec9ca9a635d055a6ef70dcdd919102ae3690d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
@@ -35,6 +35,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-80.c b/gcc/testsuite/gcc.dg/vect/vect-80.c
index 45aac84a578fa55624f1f305e9316bbc98e877bb..44299d3c7fed9ac9c213699f6982ba3858bbe0bb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-80.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-80.c
@@ -24,6 +24,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
       pa[i] = q[i] * pc[i];
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != q[i] * pc[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-82.c b/gcc/testsuite/gcc.dg/vect/vect-82.c
index fcafb36c06388302775a68f0f056b925725e8aa8..2c1b567d10f2e7e519986c5b1d2e2c6b11353bc2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-82.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-82.c
@@ -17,6 +17,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-82_64.c b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
index 358a85a838f7519a0c1e0b2bae037d6e8aafeea9..d0962e06c62a8888cb5cabb1c1e08438e3a16c8e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-82_64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-83.c b/gcc/testsuite/gcc.dg/vect/vect-83.c
index a300a0a08c462c043b2841961c58b8c8f2849cc5..4fd14cac2abd9581cd47d67e8194795b74c68402 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-83.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-83.c
@@ -17,6 +17,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-83_64.c b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
index a5e897e093d955e0d1aff88021f99caf3a70d928..e3691011c7771328b9f83ea70aec20f373b10da4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-83_64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != 2)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
index ade04016cc3136470db804ea7a1bac3010d6da91..9d527b06c7476c4de7d1f5a8863088c189ce6142 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
@@ -22,10 +22,12 @@ int main1 (int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (a[j] != i + N - 1)
       abort ();
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (b[j] != j + N)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-85.c b/gcc/testsuite/gcc.dg/vect/vect-85.c
index a73bae1ad41a23ab583d7fd1f5cf8234d516d515..367cea72b142d3346acfb62cb16be58104de4f1c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-85.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-85.c
@@ -22,10 +22,12 @@ int main1 (int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (a[j] != i + N - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < N; j++)
     if (b[j] != j + N)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-86.c b/gcc/testsuite/gcc.dg/vect/vect-86.c
index ff1d41df23f1e1eaab7f066726d5217b48fadb57..fea07f11d74c132fec987db7ac181927abc03564 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-86.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-86.c
@@ -24,11 +24,12 @@ int main1 (int n)
       b[i] = k;
     }
 
-
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (b[i] != i + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-87.c b/gcc/testsuite/gcc.dg/vect/vect-87.c
index 17b1dcdee99c819c8a65eadbf9159d9f78242f62..0eadc85eecdf4f8b5ab8e7a94782157534acf0a6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-87.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-87.c
@@ -23,10 +23,12 @@ int main1 (int n, int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (b[j] != j + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-88.c b/gcc/testsuite/gcc.dg/vect/vect-88.c
index b99cb4d89a4b8e94000dc6334514af042e1d2031..64341e66b1227ada7de8f26da353e6c6c440c9a9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-88.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-88.c
@@ -23,10 +23,12 @@ int main1 (int n, int *a)
     }
 
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (a[j] != i + n - 1)
       abort();	
 
+#pragma GCC novector
   for (j = 0; j < n; j++)
     if (b[j] != j + n)
       abort();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
index 59e1aae0017d92c5b98858777e7e55bceb73a90a..64578b353fec58c4af632346a546ab655b615125 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
@@ -28,6 +28,7 @@ int main1 ()
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-89.c b/gcc/testsuite/gcc.dg/vect/vect-89.c
index 356ab96d330046c553364a585e770653609e5cfe..6e7c875c01e2313ba362506542f6018534bfb443 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-89.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-89.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results: */  
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-9.c b/gcc/testsuite/gcc.dg/vect/vect-9.c
index 87600fb5df0d104daf4438e6a7a020e08c277502..dcecef729a60bf22741407e3470e238840ef6def 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-9.c
@@ -20,6 +20,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) sb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-92.c b/gcc/testsuite/gcc.dg/vect/vect-92.c
index 9ceb0fbadcd61ec9a5c3682cf3582abf464ce106..86864126951ccd8392cc7f7e87642be23084d5ea 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-92.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-92.c
@@ -36,6 +36,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 10; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
@@ -56,6 +57,7 @@ main2 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 12; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
@@ -76,6 +78,7 @@ main3 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (pa[i+1] != (pb[i+1] * pc[i+1]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
index c3e12783b2c47a4e296fd47cc9dc8e73b7ccebb0..b4ccbeedd08fe1285dc362b28cb6d975c6313137 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-93.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
@@ -23,6 +23,7 @@ main1 (float *pa)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N1; i++)
     {
       if (pa[i] != 2.0)
@@ -36,6 +37,7 @@ main1 (float *pa)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N2; i++)
     {
       if (pa[i] != 3.0)
@@ -60,6 +62,7 @@ int main (void)
   for (i = 1; i <= 256; i++) a[i] = b[i-1];
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= 256; i++)
     {
       if (a[i] != i-1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-95.c b/gcc/testsuite/gcc.dg/vect/vect-95.c
index 1e8bc1e7240ded152ea81f60addab9f7179d3bfc..cfca253e810ff1caf2ef2eef0d7bafc39896ea3e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-95.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-95.c
@@ -11,6 +11,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
   int i;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (pa[i] != (pb[i] * pc[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-96.c b/gcc/testsuite/gcc.dg/vect/vect-96.c
index c0d6c37b21db23b175de895a582f48b302255e9f..e36196b50d7527f88a88b4f12bebbe780fe23f08 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-96.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-96.c
@@ -28,7 +28,8 @@ int main1 (int off)
   for (i = 0; i < N; i++)
       pp->ia[i] = ib[i];
 
-  /* check results: */  
+  /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (pp->ia[i] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
index 977a9d57ed4795718722c83344c2efd761e6783e..e015c1684ad856a4732084fbe49783aaeac31e58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
@@ -32,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -48,6 +49,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-97.c b/gcc/testsuite/gcc.dg/vect/vect-97.c
index 734ba3b6ca36cf56d810a1ce4329f9cb1862dede..e5af7462ef89e7f47b2ca822f563401b7bd95e2c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-97.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-97.c
@@ -27,6 +27,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != cb[i])
@@ -43,6 +44,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (s.p[i] != s.q[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
index 61b749d4669386a890f5c2f5ba83d6e00d269b4f..2d4435d22e476de5b40c6245f26209bff824139c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
@@ -22,6 +22,7 @@ int main1 (int ia[][N])
     }
 
   /* check results: */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
        if (ic[0][i] != DOT16 (ia[i], ib))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-98.c b/gcc/testsuite/gcc.dg/vect/vect-98.c
index 2055cce70b20b96dd69d06775e3d6deb9f27e3b2..72a1f37290358b6a89db6c89aada2c1650d2e7a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-98.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-98.c
@@ -19,7 +19,8 @@ int main1 (int ia[][N])
 	ic[0][i] = DOT4 (ia[i], ib);
     }
 
-  /* check results: */  
+  /* check results: */
+#pragma GCC novector
   for (i = 0; i < M; i++)
     {
        if (ic[0][i] != DOT4 (ia[i], ib))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-99.c b/gcc/testsuite/gcc.dg/vect/vect-99.c
index ae23b3afbd1d42221f6fe876f23ee7b9beaebca3..0ef9051d907209e025a8fee057d04266ee2fcb03 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-99.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-99.c
@@ -21,6 +21,7 @@ int main (void)
 
   foo(100);
 
+#pragma GCC novector
   for (i = 0; i < 100; ++i) {
     if (ca[i] != 2)
       abort();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
index b6cc309dbe87b088c9969e07dea03c7f6b5993dd..8fd3bf407e9db3d188b897112ab1e41b381ae3c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
@@ -45,6 +45,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = -M; j <= M; ++j)				\
     {							\
       TYPE a[N * M], b[N * M];				\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
index 09a4ebfa69e867869adca3bb5daece02fcee93da..5ecdc3250708e99c30e790da84b002b99a8d7e9b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
@@ -51,6 +51,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = -M; j <= M; ++j)				\
     {							\
       TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
index 63a897f4bad4894a6ec4b2ff8749eed3f9e33782..23690c45b65a1b95bf88d50f80d021d5c481d5f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
@@ -52,6 +52,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)					\
+  _Pragma("GCC novector")				\
   for (int j = 0; j <= M; ++j)				\
     {							\
       TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
index 29bc571642db8858d3e4ca1027131a1a6559c4c1..b36ad116762e2e3c90ccd79fc4f8564cc57fc3f1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
@@ -39,6 +39,7 @@ typedef unsigned long long ull;
       for (int i = 0; i < N + M; ++i)				\
 	a[i] = TEST_VALUE (i);					\
       test_##TYPE (a + j, a);					\
+      _Pragma("GCC novector")					\
       for (int i = 0; i < N; i += 2)				\
 	{							\
 	  TYPE base1 = j == 0 ? TEST_VALUE (i) : a[i];		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
index ad74496a6913dcf57ee4573ef1589263a32b074c..f7545e79d935f1d05641415246aabc2dbe9b7d27 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
@@ -33,6 +33,7 @@ typedef unsigned long long ull;
     {								\
       TYPE a[N + DIST * 2] = {};				\
       test_##TYPE (a + DIST, a + i);				\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected = 0;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
index 8a9a6fffde1d39f138c5f54221854e73cef89079..d90adc70e28420e5e8fd0e36c15316da12224b38 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
@@ -33,12 +33,14 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       TYPE a[N + DIST * 2];					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a[j] = TEST_VALUE (j);					\
       TYPE res = test_##TYPE (a + DIST, a + i);			\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N; ++j)				\
 	if (a[j + DIST] != (TYPE) j)				\
 	  __builtin_abort ();					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
index b9f5d2bbc9f6437e3e8058264cc0c9aaa522b3e2..3b576a4dc432725c67b4e7f31d2bc5937bc34b7a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
@@ -34,6 +34,7 @@ typedef unsigned long long ull;
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a_##TYPE[j] = TEST_VALUE (j);				\
       test_##TYPE (i + N - 1, DIST + N - 1);			\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
index 7c0ff36a8c43f11197de413cb682bcd0a3afcae8..36771b04ed5cc0d6c14c0fe1a0e9fd49db4265c4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
@@ -34,6 +34,7 @@ typedef unsigned long long ull;
     {								\
       __builtin_memset (a_##TYPE, 0, sizeof (a_##TYPE));	\
       test_##TYPE (DIST, i);					\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	{							\
 	  TYPE expected = 0;					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
index 8a699ebfda8bfffdafc5e5f09d137bb0c7e78beb..9658f8ce38e8efb8d19806a4078e1dc4fe57d2ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
@@ -34,11 +34,13 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       for (int j = 0; j < N + DIST * 2; ++j)			\
 	a_##TYPE[j] = TEST_VALUE (j);				\
       TYPE res = test_##TYPE (DIST, i);				\
+      _Pragma("GCC novector")					\
       for (int j = 0; j < N; ++j)				\
 	if (a_##TYPE[j + DIST] != (TYPE) j)			\
 	  __builtin_abort ();					\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
index 7e5df1389991da8115df2c6784b52ff3e15f8124..3bc78bed676d8267f7512b71849a7d33cb4ab05b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
@@ -29,6 +29,7 @@ typedef unsigned long long ull;
   }
 
 #define DO_TEST(TYPE)						\
+  _Pragma("GCC novector")					\
   for (int i = 0; i < DIST * 2; ++i)				\
     {								\
       for (int j = 0; j < N + DIST * 2; ++j)			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
index a7fc1fcebbb2679fbe6a98c6fa340edcde492ba9..c11c1d13e0ba253b00afb02306aeec786cee1161 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
@@ -37,6 +37,7 @@ typedef unsigned long long ull;
       for (int i = 0; i < N + M; ++i)			\
 	a[i] = TEST_VALUE (i);				\
       test_##TYPE (a + j, a);				\
+      _Pragma("GCC novector")				\
       for (int i = 0; i < N; i += 2)			\
 	if (a[i + j] != (TYPE) (a[i] + 2)		\
 	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-1.c b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
index d56898c4d23406b4c8cc53fa1409974b6ab05485..9630fc0738cdf4aa5db67effdd5eb47de4459f6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-align-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
@@ -28,6 +28,7 @@ main1 (struct foo * __restrict__ p)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (p->y[i] != x[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-2.c b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
index 39708648703357e9360e0b63ca7070c4c21def03..98759c155d683475545dc20cae23d54c19bd8aed 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-align-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
@@ -26,6 +26,7 @@ void fbar(struct foo *fp)
         f2.y[i][j] = z[i];
 
    for (i=0; i<N; i++)
+#pragma GCC novector
       for (j=0; j<N; j++)
 	if (f2.y[i][j] != z[i])
 	  abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
index 6eb9533a8bb17acf7f9e29bfaa7f7a7aca2dc221..3f3137bd12e1462e44889c7e096096beca4d5b40 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
@@ -18,6 +18,7 @@ __attribute__ ((noinline))
 void icheck_results (int *a, int *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -29,6 +30,7 @@ __attribute__ ((noinline))
 void fcheck_results (float *a, float *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -108,6 +110,7 @@ main1 ()
       ca[i] = cb[i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
@@ -163,6 +166,7 @@ main1 ()
       a[i+3] = b[i-1];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
@@ -180,6 +184,7 @@ main1 ()
       j++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -193,6 +198,7 @@ main1 ()
       a[N-i] = d[N-i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != d[i])
@@ -206,6 +212,7 @@ main1 ()
       a[i] = 5.0;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != 5.0)
@@ -217,6 +224,7 @@ main1 ()
       sa[i] = 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 5)
@@ -228,6 +236,7 @@ main1 ()
       ia[i] = ib[i] + 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-all.c b/gcc/testsuite/gcc.dg/vect/vect-all.c
index cc41e2dd3d313a0557dea16204564a5a0c694950..6fd579fa6ad24623f387d9ebf5c863ca6e91dfe6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-all.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-all.c
@@ -18,6 +18,7 @@ __attribute__ ((noinline))
 void icheck_results (int *a, int *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -29,6 +30,7 @@ __attribute__ ((noinline))
 void fcheck_results (float *a, float *results)
 {
   int i;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != results[i])
@@ -91,6 +93,7 @@ main1 ()
       ca[i] = cb[i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ca[i] != cb[i])
@@ -134,6 +137,7 @@ main1 ()
       a[i+3] = b[i-1];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <=N-4; i++)
     {
       if (a[i+3] != b[i-1])
@@ -151,6 +155,7 @@ main1 ()
       j++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != c[i])
@@ -164,6 +169,7 @@ main1 ()
       a[N-i] = d[N-i];
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i <N; i++)
     {
       if (a[i] != d[i])
@@ -177,6 +183,7 @@ main1 ()
       a[i] = 5.0;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != 5.0)
@@ -188,6 +195,7 @@ main1 ()
       sa[i] = 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 5)
@@ -199,6 +207,7 @@ main1 ()
       ia[i] = ib[i] + 5;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
index a7bc7cc90963c8aa8e14d0960d57dc724486247f..4a752cd7d573cd53ea1a59dba0180d017a7f73a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
@@ -35,6 +35,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
index 85292f1b82416b70698619e284ae76f3a3d9410d..0046f8ceb4e7b2688059073645175b8845246346 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
@@ -43,6 +43,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (((((BASE1 + i * 5) ^ 0x55)
 		   + (BASE2 + i * 4)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
index 48d7ed773000486c42277535cebe34f101e035ef..57b6670cb98cdf92e60dd6c7154b4a8012b05a1e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N / 20, 20);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int d = (BASE1 + BASE2 + i * 5) >> 1;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
index f3e3839a879b6646aba6237e55e2dcd943eac168..319edba1fa3c04b6b74b343cf5397277a36dd6d1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N / 20);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int d = (BASE1 + BASE2 + i * 5) >> 1;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
index 6c43575f448325e84975999c2e8aa91afb525f87..6bdaeff0d5ab4c55bb5cba1df51a85c4525be6fb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
@@ -39,6 +39,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
index 19683d277b1ade1034496136f1d03bb2b446900f..22e6235301417d72e1f85ecbdd96d8e498500991 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
@@ -19,6 +19,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].i != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
index 1a101357ccc9e1b8bb157793eb3f709e99330bf6..0c8291c9363d0de4c09f81525015b7b88004bc94 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
@@ -23,6 +23,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
index 5dc679627d52e2ad229d0920e5ad8087a71281fe..46fcb02b2f1b6bb2689a6b709901584605cc9a45 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
@@ -24,6 +24,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
index fae6ea3557dcaba7b330ebdaa471281d33d2ba15..5a7227a93e4665cd10ee564c8b15165dc6cef303 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
@@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
index 99360c2967b076212c67eb4f34b8fd91711d8821..e0b36e411a4a72335d4043f0f360c2e88b667397 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
@@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
 
 void __attribute__ ((noipa))
 check_f(struct s *ptr) {
+#pragma GCC novector
     for (unsigned i = 0; i < N; ++i)
       if (ptr[i].a != V)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
index c97da5289141d35a9f7ca220ae62aa82338fa7f5..a1be71167025c960fc2304878c1ed15d90484dfb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
@@ -183,6 +183,7 @@ check (int *p, cmp_fn fn)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     {
       int t1 = ((i % 4) > 1) == 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
index d29b352b832a67e89e7cb3856634390244369daa..7d2cb297738378863ddf78b916036b0998d28e6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo16 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
index 88d88b5f034153cb736391e4fc46a9b786ec28c5..1139754bbf1b8f7ef7a5a86f5621c9fe319dec08 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo32 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
index fd15d713c5d63db335e61c892c670b06ee9da25f..38d598eba33019bfb7c50dc2f0d5b7fec3a4736c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
@@ -30,6 +30,7 @@ main (void)
 
   vfoo64 (arr);
 
+#pragma GCC novector
   for (i = 0; i < N; ++i)
     {
       if (arr[i] != expect[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
index 2a87e2feadeba7f1eaef3cce72e27a7d0ffafb5f..b3a02fe9c6d840e79764cb6469a86cfce315a337 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
@@ -43,6 +43,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
index 19b24e1eb87feacc8f7b90fb067124007e22c90f..7bbfdd95b5c46f83f24263e33bf5e3d2ecee0a4d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
@@ -43,6 +43,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
index 49cfdbe1738794c3bf873c330fff4d7f4626e10b..d5e50cc15df66501fe1aa1618f04ff293908469a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
@@ -92,6 +92,7 @@ main (void)
   foo ();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (c[i].f1 != res[i].f1)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
index 261d828dbb2855fe680b396d3fcbf094e814b6fd..e438cbb67e196a5b3e5e2e2769efc791b0c2d6b7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
@@ -43,6 +43,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out[j] != check_result[j])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
index b2f97d735ef7d94a80a67265b4535a1e228e20ca..dbbe4877db41c43d5be5e3f35cb275b96322c9bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
@@ -120,41 +120,49 @@ main ()
 	}
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f5 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f6 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f7 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, 0, sizeof (k));
   f8 (k);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
index f28af658f331849a0c5103ba96dd2e3b60de428d..38f1f8f50901c3039d0e7cb17d1bd47b18b89c71 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
@@ -79,13 +79,16 @@ baz (unsigned int *a, unsigned int *b,
     }
   if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U)
     __builtin_abort ();
+#pragma GCC novector
   for (i = -64; i < 0; i++)
     if (a[i] != 19 || b[i] != 17)
       __builtin_abort ();
+#pragma GCC novector
   for (; i < N; i++)
     if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16)
 	|| b[i] != (i - 512U < 32U ? i * 2U : i + 1U))
       __builtin_abort ();
+#pragma GCC novector
   for (; i < N + 64; i++)
     if (a[i] != 27 || b[i] != 19)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
index 8a66b4b52ed8a98dd52ef945afb3822de8fe37e9..1521fedd1b5b9d6f3021a1e5653f9ed8df0610b2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
@@ -50,6 +50,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out_a[j] != check_result_a[j]
         || x_out_b[j] != check_result_b[j])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
index 2a6577c6db33a49c7fac809f67b7e957c0b707c2..4057d14c702c22ef41f504a8d3714a871866f04f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
@@ -47,6 +47,7 @@ int main (void)
 
   foo (125);
 
+#pragma GCC novector
   for (j = 0; j < M; j++)
     if (x_out_a[j] != 125
         || x_out_b[j] != 5)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
index 41e57f9235b90347e7842d88c9710ee682ea4bd4..f10feab71df6daa76966f8d6bc3a4deba8a7b56a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
@@ -46,6 +46,7 @@ int main ()
 
   foo(5);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
index 65fdc4a9ef195f7210b08289242e74cda1db4831..a46479a07eb105f5b2635f3d5848e882efd8aabf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
@@ -47,6 +47,7 @@ int main ()
 
   foo(125);
 
+#pragma GCC novector
   for (k = 0; k < K; k++) 
     if (out[k] != 33)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
index bd2947516584bf0039d91589422acefd0d27cc35..ea11693ff21798e9e792cfc43aca3c59853e84a0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
@@ -53,6 +53,7 @@ main ()
 #undef F
 #define F(var) f##var ();
   TESTS
+#pragma GCC novector
   for (i = 0; i < 64; i++)
     {
       asm volatile ("" : : : "memory");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
index d888442aa456e7520cf57e4a07c0938849758068..88289018b9be7d20edd9c7d898bb51d947ed7806 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
@@ -79,18 +79,22 @@ main ()
       e[i] = 2 * i;
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 17 : 0))
       abort ();
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 0 : 24))
       abort ();
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 ? 51 : 12))
       abort ();
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (f[i] != ((i % 3) == 0 ? d[i] : e[i]))
       abort ();
@@ -112,6 +116,7 @@ main ()
       b[i] = i / 2;
     }
   f5 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (c[i] != ((i % 3) == 0 ? a[i] : b[i]))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
index 63eee1b47296d8c422b4ff899e5840ca4d4f59f5..87febca10e7049cb0f4547a13d27f533011d44bc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
@@ -145,51 +145,61 @@ main ()
 	}
     }
   f1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f3 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f4 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f5 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f6 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f7 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f8 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (j, -6, sizeof (j));
   f9 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
   __builtin_memset (k, -6, sizeof (k));
   f10 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
index d52e81e9109cc4d81de84adf370b2322799c8c27..5138712731f245eb1f17ef2e9e02e333c8e214de 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
@@ -23,6 +23,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
index f02b0dc5d3a11e3cfa8a23536f570ecb04a039fd..11a680061c21fb7da69739892b79ff37d1599027 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
@@ -24,6 +24,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
index 55a174a7ec1fa42c40d4359e882ca475a4feaca3..1af0fe642a0f6a186a225e7619bff130bd09246f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
@@ -20,6 +20,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
index d2eadc4e9454eba204b94532ee3b002692969ddb..ec3d9db42021c0f1273bf5fa37bd24fa77c1f183 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
@@ -21,6 +21,7 @@
 #define TEST(OP)					\
   {							\
     f_##OP (a, b, 10);					\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	int bval = (i % 17) * 10;			\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
index cc70b8a54c44fbc1d20aa9c2599b9a37d9fc135b..2aeebd44f835ee99f110629ded9572b338d6fb50 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
@@ -23,6 +23,7 @@
 #define TEST(OP)						\
   {								\
     f_##OP (a, b, 10);						\
+    _Pragma("GCC novector")					\
     for (int i = 0; i < N; ++i)					\
       {								\
 	int bval = (i % 17) * 10;				\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
index 739b98f59aece34b73ed4762c2eeda2512834539..9d20f977884213a6b4580b90e1a187161cf5c945 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
@@ -22,6 +22,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
index e6ad865303c42c9d5958cb6e7eac6a766752902b..faeccca865f63bc55ee1a8b412a5e738115811e9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
@@ -73,6 +73,7 @@ main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
index 95efe7ad62eac1f66b85ffdc359fd60bd7465cfd..f3b7db076e6b223fcf8b341f41be636e10cc952a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
@@ -55,6 +55,7 @@ main (void)
 
   foo (a, b);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != result[2*i] || b[i] != result[2*i+1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
index c81f8946922250234bf759e0a0a04ea8c1f73e3c..f02f98faf2fad408f7d7e65a09c678f242aa32eb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -16,6 +16,7 @@ int
 main (void)
 {
   V v = foo ((V) { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff }, 0xffff);
+#pragma GCC novector
   for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
     if (v[i] != 0x00010001)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
index b4eb1a4dacba481e6306b49914d2a29b933de625..80293e50bbc6bbae90cac0fcf436c790b3215c0e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
@@ -44,6 +44,7 @@ int main ()
   fun1 (a, N / 2, N);
   fun2 (b, N / 2, N);
 
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       if (DEBUG)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
index 29a16739aa4b706616367bfd1832f28ebd07993e..bfdc730fe5f7b38117854cffbf2e450dad7c3b5a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
@@ -30,6 +30,7 @@ int main ()
   fun1 (a, N / 2, N);
   fun2 (b, N / 2, N);
 
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       if (DEBUG)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
index 6abf76392c8df94765c63c248fbd7045dc24aab1..6456b3aad8666888fe15061b2be98047c28ffed2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
index 4bfd1630c4e9927d89bf23ddc90716e0cc249813..d5613e55eb20731070eabeee8fe49c9e61d8be50 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
index 3bdf9efe9472342359b64d51ef308a4d4f8f9a79..239ddb0b444163803c310e4e9910cfe4e4c44be7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
@@ -48,12 +48,14 @@ int main ()
 
   foo(0, 0);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out_max[k] != check_max[k] || out_min[k] != 0)
       abort ();
 
   foo(100, 45);
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out_min[k] != check_min[k] || out_max[k] != 100)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
index e5937705400c7c015513abc513a8629c6d66d140..5344c80741091e4e69b41ce056b9541b75215df2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
index 079704cee81cc17b882b476c42cbeee0280369cf..7465eae1c4762d39c14048077cd4786ffb8e4848 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
@@ -43,6 +43,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
index 1d9dcdab5e9c09514a8427cd65c419e74962c9de..a032e33993970e65e9e8a90cca4d23a9ff97f1e8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
@@ -49,6 +49,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
index 85aec1bf609582988f06826afb6b7ce77d6d83de..d1d1faf7c3add6ce2c3378d4d094bf0fc2aba046 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
@@ -38,6 +38,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
index c3145a2ad029f92e96995f59e9be9823e016ec11..1ef7a2d19c8b6ee96280aee0e9d69b441b597a89 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
@@ -52,6 +52,7 @@ int main ()
 
   foo();
 
+#pragma GCC novector
   for (k = 0; k < K; k++)
     if (out[k] != check_result[k])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
index 76b436948c185ca73e21203ef68b0a9d4da03408..603f48167d10fe41143f329cd50ca7f6c8e9a154 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
@@ -21,6 +21,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (da[i] != (double) fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
index 8b82c82f1cdd1078898847c31c6c06371f4232f6..9f404f0e36e10ebf61b44e95d6771d26a25faea8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) db[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
index fc5081b7e8e143893009b60147d667855efa12ad..f80da6a7ca7f0de224d88860a48f24b4fd8c2ad8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
@@ -20,6 +20,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) fb[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
index 64fab3876310d60ca016b78938e449201c80997d..dc038857a42813e665591c10eb3ab7f744d691ad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
@@ -19,6 +19,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != (int) db[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
index 6b6b4f726e9476ac6a90984e15fdd0839dff8885..27d206d9fa0601812b09a3ead2ee9730623e97e4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
@@ -22,6 +22,7 @@
 #define TEST(INV)					\
   {							\
     f_##INV (a, b, c, d);				\
+    _Pragma("GCC novector")				\
     for (int i = 0; i < N; ++i)				\
       {							\
 	double mb = (INV & 1 ? -b[i] : b[i]);		\
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
index 4cee73fc7752681c2f677d3e6fddf7daf6e183eb..e3bbf5c0bf8db8cb258d8d05591c246d80c5e755 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
@@ -50,6 +50,7 @@ main (void)
   check_vect ();
 
   f (y, x, indices);
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
     if (y[i] != expected[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
index 738bd3f3106948754e38ffa93fec5097560511d3..adfef3bf407fb46ef7a2ad01c495e44456b37b7b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
@@ -54,6 +54,7 @@ main (void)
   check_vect ();
 
   f (y, x, indices);
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
     if (y[i] != expected[i])
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
index 7e323693087598942f57aa8b7cf3686dde4a52c9..04d5fd07723e851442e1dc496fdf004d9196caa2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
@@ -26,6 +26,7 @@ int main ()
   check_vect ();
   foo ();
   /* check results:  */
+#pragma GCC novector
   for (int i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
index 56a8e01993d1a0701998e377fb7fac4fa2119aed..0f752b716ca811de093373cce75d948923386653 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] != MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
index 962be1c0230cca6bef2c097b35833ddd6c270875..8b028d7f75f1de1c8d10376e4f0ce14b60dffc70 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] == MAX ? 0 : MAX);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
index 6406875951bd52c3a5c3691eb2bc062e5525a4a1..10145d049083b541c95b813f2fd12d3d62041f53 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] >= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
index d55440c9fa421719cb03a30baac5d58ca1ac2fb6..4964343c0ac80abf707fe11cacf473232689123e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] > MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
index 5cef85175131bd6b2e08d7801966f5526ededf8e..63f53a4c4eef6e1397d67c7ce5570dfec3160e83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] <= MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
index 3118e2d5a5536e175838284d367a8f2eedf8eb86..38b014336482dc22ecedaed81b79f8e7d5913d1e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] < MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
index 272fde09429b6a46ee4a081b49736613136cc328..56e0f71bc799d16725e589a53c99abebe5dca40a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] != MAX ? MAX : 0); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
index c0c7a3cdb2baafa5702a7fcf80b7198175ecc4f2..879d88a5ce9239bf872cc0ee1b4eb921b95235d0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
@@ -22,6 +22,7 @@ int main ()
     A[i] = ( A[i] == MAX ? 0 : MAX); 
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
index e6446a765c0298857f71b80ffcaefdf77e4f5ce3..bbeccae0f228ad3fc7478c879ae4a741ae6fe7a3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
@@ -27,6 +27,7 @@ int main ()
   check_vect ();
   foo ();
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
index bef3efa5658ae6d91010d286967e319906f9aeb5..f75c0f5a1a6645fdee6a8a04ffc55bd67cb7ac43 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) ib[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
index 666ee34a4a753ff1d0e33012d95a77496f1986fa..32df21fb52a0b9f16aff7340eee21e76e832cceb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (float_arr[i] != (float) int_arr[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
index 78fc3da481c6693611b45d3939fe03d23e84f8f7..db33a84b54d70c9355079adf2ee163c904c68e57 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (da[i] != (double) ib[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
index af8fe89a7b02b555acc64b578a07c735f5ef45eb..6fc23bb4621eea594a0e70347a8007a85fb53db8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) sb[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
index 49c83182026b91c7b52667fec7a44554e3aff638..b570db5dc96db9c6e95b0e4dbebe1dae19c5ba7c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
@@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (fa[i] != (float) usb[i]) 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
index 90163c440d34bcd70a7024b83f70abb7b83f8077..e6dcf29ebe0d2b2dc6695e754c4a1043f743dd58 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
@@ -22,6 +22,7 @@ __attribute__ ((noinline)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
index 195474b56441bee9b20f373a6aa991610a551e10..83bc7805c3de27ef3dd697d593ee86c1662e742c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
@@ -17,6 +17,7 @@ int main1 ()
   }
 
   /* check results:  */
+#pragma GCC novector
   for (j=0,i=N;  j<N,i>0;  i--,j++) {
       if (ia[j] != i)
         abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
index 73e30ee9bac6857b545242136d9c1408f6bfe60e..d85bb3436b2e0abcc4d0d0a7b480f4f267b4898c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
index f8ca94dd17db81d8be824dfb2f023517f05d7c04..c0738ebc469f1780eb8ce90e89caa222df0e1fba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
@@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
index dfe5bc14458c856122f48bd6bc6a50092d7729e1..2dd8ae30513260c858504f8dc0e8c7b6fd3ea59b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
@@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
index 2015385fbf5fac1349124dd35d57b26c49af6346..c3c4735f03432f9be07ed2fb14c94234ee8f4e52 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
@@ -20,6 +20,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != 1.0 + 2.0*i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
index ccd7458a98f1d3833b19c838a27e9f582631e89c..4c9d9f19b45825a210ea3fa26160a306facdfea5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
@@ -28,6 +28,7 @@ __attribute__ ((noinline)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr1[i+1] != X+6*i+2
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
index 24b59fe55c498bf21d107bef72bdc93690229c20..f6d93360d8dda6f9380425b5518ea5904f938322 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
@@ -22,6 +22,7 @@ __attribute__ ((noinline, noclone)) int main1 (int X)
    } while (i < N);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (arr[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
index 45d82c65e2f85b7b470a22748dacc78a63c3bd3e..26e8c499ce50cc91116c558a2425a47ebe21cdf7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
index dd37d250e91c3839c21fb3c22dc895be367cdcec..b4bb29d88003d2bbc0e90377351cb46d1ff72b55 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
index 63b6b6e893f7a55a56aef89331610fd76d2c1c42..dceae27bbbee36a13af8055785dd4258b03e3dba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
index 1f8fedf2716745d469771cfce2629dd05478bce8..dfe3a27f024031427344f337d490d4c75d8a04be 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
@@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
   }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
index f628c5d3998930ea3e0cee271c20ff3eb17edf62..e4a6433a89961b008a2b766f6669e16f378ca01e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
@@ -38,6 +38,7 @@ main (void)
   if (ret != MAX + START)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
index 19d8c22859e0804ccab9d25ba69f22e50d635ebb..dae36e9ed67c8f6f5adf735345b817d59a3741f4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
@@ -48,6 +48,7 @@ main (void)
   if (ret != MAX - 1)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
index 8f5ccb27365dea5e8cd8561d3c8a406e47469ebe..1f6b3ea0faf047715484ee64c1a49ef74dc1850e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
@@ -45,6 +45,7 @@ main (void)
   if (ret != (MAX - 1) * 3)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-4.c b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
index 553ffcd49f744cabd6bdd42e6aca8c12d15ceb01..170927802d2d8f1c42890f3c82f9dabd18eb2f38 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
@@ -42,6 +42,7 @@ main (void)
   if (ret != MAX + 4)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-5.c b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
index 7cde1db534bb1201e106ba34c9e8716c1f0445a1..9897552c25ce64130645887439c9d1f0763ed399 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
@@ -39,6 +39,7 @@ main (void)
   if (ret != 99)
     abort ();
 
+#pragma GCC novector
   for (i=0; i<MAX; i++)
     {
       __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
index 965437c8f03eaa707add3577c6c19e9ec4c50302..6270c11e025ed6e181c7a607da7b1b4fbe82b325 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
@@ -51,6 +51,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<4; i++)
     {
       __asm__ volatile ("");
@@ -60,6 +61,7 @@ main (void)
       if (ret != (MAX * 4) - 4 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*4; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
index 0d2f17f9003178d65c3dc5358e13c45f8ac980e3..c9987018e88b04f5f0ff195baaf528ad86722714 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
@@ -45,6 +45,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<2; i++)
     {
       __asm__ volatile ("");
@@ -54,6 +55,7 @@ main (void)
       if (ret != (MAX * 2) - 2 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*2; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
index a3f60f6ce6d24fa35e94d95f2dea4bfd14bfdc74..e37822406751b99b3e5e7b33722dcb1912483345 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
@@ -52,6 +52,7 @@ main (void)
       a[i] = i;
     }
 
+#pragma GCC novector
   for (i=0; i<4; i++)
     {
       __asm__ volatile ("");
@@ -61,6 +62,7 @@ main (void)
       if (ret != (MAX * 4) - 4 + i)
 	abort ();
 
+#pragma GCC novector
       for (i=0; i<MAX*4; i++)
 	{
 	  __asm__ volatile ("");
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
index 992cbda2e91628cd145d28c8fdabdb7a4d63ee68..91d4d40a86013dca896913d082773e20113a17e2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
@@ -36,6 +36,7 @@ main ()
       asm ("");
     }
   foo (a, b);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != ((i & 1)
 		 ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
index 7d9dc5addf54264bf2fd0c733ccfb83bb1c8f20d..76f72597589c6032d298adbc8e687ea4808e9cd4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
@@ -36,6 +36,7 @@ main ()
       asm ("");
     }
   foo (a, b, c);
+#pragma GCC novector
   for (i = 0; i < 1024; i++)
     if (a[i] != ((i & 1) ? -i : i)
 	|| b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
index 8e46ff6b01fe765f597add737e0b64ec5b505dd1..4df0581efe08333df976dfc9c52eaab310d5a1cc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, N);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != HRS(BASE1 * BASE2 + i * i * (CONST1 * CONST2)
 		    + i * (BASE1 * CONST2 + BASE2 * CONST1)))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
index b63e9a8a6d9d0c396c3843069d100fbb9d5fa913..1e90d19a684eb0eebf223f85c4ea2b2fd93aa0c5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -27,6 +27,7 @@ main (void)
     }
 
   foo (data);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (data[i] / 123 != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
index a8253837c3863f5bc5bfea1d188a5588aea501c6..f19829b55a96227f0157527b015291da6abd54bf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -26,6 +26,7 @@ main (void)
     }
 
   foo (data);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (data[i] / -19594LL != i)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
index 378a5fe642ac415cd20f45e88f06e8d7b9040c98..06dbb427ea11e14879d1856c379934ebdbe50e04 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
@@ -39,6 +39,7 @@ __attribute__ ((noinline)) int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i + NSHORTS - 1] != sb[i] || ia[i + NINTS - 1] != ib[i + 1])
@@ -69,6 +70,7 @@ __attribute__ ((noinline)) int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i + NINTS - 1] != sb[i + 1] || ia[i + NINTS - 1] != ib[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
index 891ba6d8e7169c67e840733402e953eea919274e..c47cf8c11d9ade3c4053f3fcf18bf719fe58c971 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
@@ -48,6 +48,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned short)uY[i])
       abort ();
@@ -55,6 +56,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (short)Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
index c58391f495eb8d19aec9054f4d324a1bdf4461a4..29d178cf88d8df72b546772047b1e99a1a74043b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
@@ -30,6 +30,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
index 4782d3f7d1066e1dcf5c3c1004d055eb56bd3aec..dd5fffaed8e714114dcf964ffc6b5419fba1aa9f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
@@ -31,6 +31,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
index 2b185bb1f86ede937842596cec86f285a7c40d27..5bf796388f9c41083a69f3d6be3f5a334e9410a1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned int)uX[i])
       abort ();
@@ -51,6 +52,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (int)X[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
index ff5f8e989b2ea57fb265e8fca3a39366afb06aaa..6f9b81d1c01ab831a79608074f060b3b231f177d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
index cf45703e01867a7954325f6f8642594e31da9744..a61f1a9a2215e238f6c67e229f642db6ec07a00c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
@@ -26,6 +26,7 @@ int main (void)
 
   foo (N,z+2);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (z[i+2] != x[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
index 79ad80e3013e189c0efb9425de2b507cf486f39a..d2eff3a20986593a5185e981ae642fcad9a57a29 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
@@ -30,6 +30,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
index 7f93938349f91c0490dad8ea2de3aec780c30b2b..069ef44154effb38f74792e1a00dc3ee236ee6db 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
@@ -26,6 +26,7 @@ __attribute__ ((noinline)) int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
index 1f82121df06181ad27478378a2323dbf478eacbe..04b144c869fc2a8f8be91a8252387e09d7fca2f2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
@@ -39,6 +39,7 @@ int main1 (int n, int * __restrict__ pib,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (ia[i] != pib[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
index b0f74083f2ba992620ebdf3a3874f6c5fa29f84d..18ab9538675b3fd227ae57fafc1bfd1e840b8607 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
@@ -41,6 +41,7 @@ int main1 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
@@ -75,6 +76,7 @@ int main2 (int n)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
index ad11d78a548735a67f76b3aa7f98731d88868b56..7c54479db1f684b9661d59816a3cd9b0e5f35619 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
@@ -30,6 +30,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (ia[i] != ib[i] + ic[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
index 864b17ac640577753d8164f1ae3ea84181a553c1..73d3b30384ebc4f15b853a140512d004262db3ef 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
@@ -46,6 +46,7 @@ int main1 (int n,
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     {
       if (ia[i] != pib[i] + pic[i] 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
index 315c9aba731ac28189cd5f463262fc973d52abe2..001671ebdc699ca950f6fd157bd93dea0871c5ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresultX[i] != uX[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
index 8c5c611947f720c9ef744c33bdd09a78967d4a4c..3e599b3462d13a8afcad22144100f8efa58ac921 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
@@ -44,6 +44,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (uresult[i] != (unsigned short)uX[i])
       abort ();
@@ -51,6 +52,7 @@ int main (void)
   
   foo2 (N);
   
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != (short)X[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
index 75b210c1a21c56c114f25b354fb368bdbe9462d5..357d006177f60a5376597929846efbfaa787f90b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
@@ -20,6 +20,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (31);
+#pragma GCC novector
   for (i = 0; i < 31; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
index 229ce987db5f5a5b48177d0c9d74e416e417d3f6..dc4c7a64aee4f800997d62550f891b3b35f7b633 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
@@ -23,6 +23,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (32);
+#pragma GCC novector
   for (i = 0; i < 32; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
index 16665265c4062c0a3acb31e01a1473dea3125685..268e65458bf839e2403a7ae3e4c679e7df6dcac7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
@@ -22,6 +22,7 @@ int main (int argc, const char **argv)
   int i;
   check_vect ();
   foo (33);
+#pragma GCC novector
   for (i = 0; i < 33; i++)
     if (ii[i] != i)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
index fca8ee0963860fa0a938db41c865e8225bf554c3..aa6e403b51ce8e9a29ddd39da5d252c9238ca7eb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
@@ -28,10 +28,12 @@ int main (void)
    
   test1 (x + 16);
   
+#pragma GCC novector
   for (i = 0; i < 128; i++)
    if (x[i + 16] != 1234)
      abort ();
   
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (x[i] != 5678
        || x[i + 144] != 5678)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
index c924b12b02fd438d039d0de6b6639813047839e7..95b16196007488f52b2ec9a2dfb5a4f24ab49bba 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
@@ -28,10 +28,12 @@ int main (void)
    
   test1 (x + 16, 1234);
   
+#pragma GCC novector
   for (i = 0; i < 128; i++)
    if (x[i + 16] != 1234)
      abort ();
   
+#pragma GCC novector
   for (i = 0; i < 16; i++)
     if (x[i] != 5678
        || x[i + 144] != 5678)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
index f52f30aa24e83768f9beb03fb2ac7b17f37e0b77..129dab2ba1cfe8175644e0a2330349974efca679 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
@@ -28,6 +28,7 @@ foo ()
       out[i] = res;
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)  
     if (out[i] != check_res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
index 5aa977df633c9a5d24e248b0c02ec21751f78241..26ad6fa65c6d1489aa1b1ce9ae09ea6f81ad44d2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
@@ -27,6 +27,7 @@ foo ()
       out[i] = res;
     }
 
+#pragma GCC novector
   for (i = 0; i < N; i++)  
     if (out[i] != check_res[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
index f2ab30c63b2e28fbd453af68628d3491d6b4d034..4e3b8343ff7b4b1f43397fe2e71a8de1e89e9a74 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
@@ -27,6 +27,7 @@ main1 ()
     }
 
   /* Check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != DIFF)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
index 02f01cfb5791319d766f61465c2d1b64718674de..32c40fb76e325571347993571547fa12dd6255aa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
@@ -28,6 +28,7 @@ int main (void)
   foo ();
 
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[j][i] != j+i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
index 55023d594dd2e0cb18c3c9dc838ac831ede938da..a0a419c1547fc451b948628dafeb48ef2f836daa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
@@ -28,6 +28,7 @@ int main (void)
   foo ();
 
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[j][i] != j+i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
index 6b9fefedf3a5c9ee43c9201039987468710df62d..5ca835a2dda468bab1cbba969278a74beff0de32 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
index 3a4dafee0720bd1a5e532eb2c0062c5eb78556b6..f9924fcb2b40531e8e7a4536d787b5d1b6e2b4ee 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
index bb4e74b7b333ce036159db4cbf5aaa7107dc35d9..218df61cf4b18709cb891969ae53977081a86f1d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
@@ -27,6 +27,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j++) {
       if (image[k+i][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
index 6adde9769215e8c98132ec91ab015e56b710c47a..36c9681201532960b3eecda2b252ebe83036a95a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j+=2) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
index bf6abfef01fa96904adbf350935de3609550f2af..678d7e46a5513e0bdeaf0ec24f2469d58df2cbc5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
@@ -28,6 +28,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < N; j+=2) {
       if (image[k][j][i] != j+i+k)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
index b75281bc3187f84824e1360ba92a18f627686aa5..81a4fc407086372c901b1ff34c75cada3e8efb8a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
@@ -27,6 +27,7 @@ int main (void)
 
  for (k=0; k<N; k++) {
   for (i = 0; i < N; i++) {
+#pragma GCC novector
     for (j = 0; j < i+1; j++) {
       if (image[k][j][i] != j+i+k)
        abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
index fdc8a0544dd941f28a97a22e706bd3f5c3c9d2a3..231989917d7c4d5ff02b4f13a36d32c543114c37 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
index 921db48a0f76763bc724d41f90c74472da8e25fb..c51787fe5753f4317b8c1e82c413b009e865ad11 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
index fd841b182e3c81eed43a249fe401c6213814ea36..7ae931e39be5a4e6da45242b415459e073f1384a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
index d26440d1a64e887aa2cd6ccf1330cb34d244ef12..bfadac0c5e70b61b23b15afa9271ac9070c267c1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
index b915c4e370c55293ec00665ddd344b9ddafec3b4..1e2bbf1e7bac29563a530c4bbcd637d8541ddfca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N/2; i++) {
     diff = 0;
     for (j = 0; j < N; j++) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
index 091c6826f66acb07dbc412ae687d72c84800146d..952bba4d911956c49a515276c536a87a68433d40 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
@@ -36,6 +36,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < N; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
index 9614b777aded3c9d2f5229d27ce8e5cfbce0c7d2..8a803cd330f25324669467a595534100878f3ddc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
index b656064697c93177cb9cd9aae8f9f278b9af40b0..587eabaf004705fb6d89882a43a628921361c30e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
index 443a00d49e19dae2a0dd32d6e9e28d2bf5972201..0c9115f60a681f48125dfb2a6428202cc1ec7557 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     diff = 0;
     for (j = 0; j < M; j+=4) {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
index 10b558fd20905d2c8b9915d44a41e89b406028d9..67be075278847ea09e309c5d2ae2b4cf8c51b736 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
@@ -38,6 +38,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N-20; i++)
     {
       s = 0;
@@ -57,6 +58,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < 4; i++)
     {
       if (B[i] != E[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
index 201ca8424828d6dabe1c6d90dff8396438a71ff4..13a5496f70c069f790d24d036642e0715a133b3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
@@ -48,6 +48,7 @@ int main ()
   main1();
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       s = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
index 6299e9fed4233b3ec2c0b9892afdca42edf0bee0..8114934ed03332aaa682c6d4b5a7f62dfc33a51e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
@@ -62,6 +62,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
index d575229f2fb3bb6ece1fbc013019ebb0fbaa505e..9c4be4b9f658f7abd1e65b7b5a9124a5670f7ab9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
@@ -66,6 +66,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
index 9414e82f3edb1ea00587b916bfaf66847ac07574..4f1ccfccfa229105eb4e8a5c96a5ebfb13384c5d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
@@ -66,6 +66,7 @@ int main (void)
   foo ();
   fir ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
index 0d181dfec24a212d430a1cac493ee914ebe25325..1c68c6738580d8670b7b108c52987d576efee4ac 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
@@ -62,6 +62,7 @@ int main (void)
   foo ();
   fir ();
   
+#pragma GCC novector
   for (i = 0; i < N; i++) {
     if (out[i] != fir_out[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
index 217b32cd2bde12247f94f36787ccdf67bb014ba2..795bff5f3d5f1629b75cdc7fefdc48ff4c05ad8a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
@@ -66,6 +66,7 @@ int main()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 ();  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();	
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
index 3ae1020936f960a5e46d6c74bee80d3b52df6db5..ead8d6f8e79187f0054d874b1d6e5fe3c273b5ca 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
@@ -67,6 +67,7 @@ int main ()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 (n);  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
index 59e54db075254498b34f673198f8f4f373b728a5..a102ddd7d8d4d9182436646e1ca4d0bd1dd86479 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
@@ -70,6 +70,7 @@ int main()
       t2[i] = z1[i]; z1[i] = 1.0f;
     }
   foo2 ();  /* scalar variant.  */
+#pragma GCC novector
   for (i=0; i<N; i++)
     if (x1[i] != t1[i] || z1[i] != t2[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
index ec1e1036f57022716361977fb419b0806e55123d..0e5388b46ce80b610d75e18c725b8f05881c244b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
@@ -28,6 +28,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (int i = 0; i < 20; i++)
     {
       double suma = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
index 53865d4737b1333c3eb49723d35d2f0e385049a3..3dce51426b5b83d85bc93aaaa67bca3e4c29bc44 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
@@ -35,6 +35,7 @@ int main ()
 
   foo ();
 
+#pragma GCC novector
   for (int i = 0; i < 20; i++)
     {
       double suma = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
index 9a5141ee6ecce133ce85edcf75603e0b3ce41f04..a7ce95bcdcefc1b71d84426290a72e8891d8775b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
index f2d284ca9bee4af23c25726a54866bfaf054c46c..21fbcf4ed70716b47da6cbd268f041965584d08b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
@@ -30,6 +30,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
index 222d854b2d6e3b606e83131862c2d23a56f11829..1e48dab5ccb4b13c82800d890cdd5a5a5d6dd295 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
@@ -43,6 +43,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int res = BASE_B + BASE_C + i * 9;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
index b25db881afbc668bb163915a893bfb8b83243f32..08a65ea551812ba48298884ec32c6c7c5e46bdd2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
@@ -36,6 +36,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
index d31050ee926ac7e12c8bce99bf3edc26a1b11fbe..bd7acbb613f47fd61f85b4af777387ae88d4580a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
index 333d74ac3e7bdf99cc22b8fc7e919e39af7d2ca4..53fcfd0c06c14e5d9ddc06cdb3c36e2add364d3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
@@ -33,6 +33,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
index ecb74d7793eeaa80b0d48479b2be6c68e64c61b0..aa58cd1c95789ad4f17317c5fa501385a185edc9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
@@ -35,6 +35,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
index 11546fe1a502a02c750ea955f483bc3a8b3a0ac7..c93cd4d09af5fc602b5019352073404bb1f5d127 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
index 82aec9f26517b2e00568f3240ff88d954af29bea..4bbb30ac8aca529d062e0daacfe539177ab92224 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (int *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
index 0bcbd4f1b5315ec84e4aa3bd92e058b6ca9ea0ec..ad423f133c0bc25dfad42e30c34eceb5a8b852ab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (int *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
index 47f970d231ee61c74c7c4d5b3f9e9bab0673cfe2..81292d42f0d695f98b62607053daf8a5c94d98d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
index 6e13f260f1124ef221bae41b31f8f52ae35162d3..361f77081a6d0a1d30051107f37aa4a4b764af4f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, e);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != i * 2 + 3
 	|| b[i] != i + 100
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
index 187bdf159feaa770b8497c020bd3bc82becdea15..830f221019871a3df26925026b7b8c506da097db 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, d, 0x73);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (b[i] != ((i * 2 + 3) ^ 0x73)
 	|| a[i] != ((i * 11) | b[i]))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
index 6f89aacbebf5094c7b1081b12c7fcce1b97d536b..55de14161d85db871ae253b86086a1341eba275c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
index a1e1182c6067db47445ad07b77e5c6e067858488..3d833561972da4a128c1bc01eff277564f084f14 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
@@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
index 03a6e6795ec68e5f9a35da93ca7a8d50a3012a21..6b3a2b88abfb6a5cd4587d766b889825c2d53d60 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
@@ -28,6 +28,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
index 0ef377f1f58a6f6466380a59c381333dbc4805df..60c9c2cc1ec272b46b7bb9a5cf856a57591425b0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
@@ -32,6 +32,7 @@ foo (unsigned char *src, unsigned char *dst)
 
   s = src;
   d = (unsigned short *)dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       const int b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
index 269df5387d20c859806da03aed91d77955fa651a..c2ab11a9d325c1e636003e61bdae1bab63e4cf85 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
@@ -37,6 +37,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
index 314a6828c53c161d2e63b88bdecf0cee9070a794..1d55e13fb1fbc4273d3a64da20dc1e80fb760296 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
@@ -39,6 +39,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c, D);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
       __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
index 5baba09a575f0f316aac1a967e145dbbbdade5b4..36bfc68e05357359b8d9bdfe818910a3d0ddcb5a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
@@ -40,6 +40,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     {
       int res = BASE_B + BASE_C + i * 9;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
index 7980d4dd6438d9a063051c78608f73f1cea1c740..717850a166b2b811797cf9cdd0753afea676bf74 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
@@ -21,6 +21,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i+2] + ib[i+6])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
index f6fc134c8705567a628dcd62c053ad6f2ca2904d..5e5a358d34bece8bbe5092bf2d617c0995388634 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
@@ -22,6 +22,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i+2] + ib[i+6])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
index 33088fb090271c3b97fae2300e5d7fc86242e246..1b85f14351242304af71564660de7db757294400 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
@@ -18,6 +18,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 1; i <= N; i++)
     {
       if (ia[i] != ib[i] + ib[i+5])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
index 64de22a1db4d7a8b354ad3755685171308a79a00..698ca5bf0672d3bfce0121bd2eae27abb2f75ca2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
@@ -28,6 +28,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
index 086b48d9087c02ccbc0aaf36f575a3174f2916af..777051ee4a16a47f20339f97e13ad396837dea9a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
@@ -29,6 +29,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
index 3389736ead98df2207a89de3ecb34a4a95faa6f5..aeb7da3877d7e0df77d6fee1a379f352ae2a5750 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
@@ -29,6 +29,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 1; i < 64; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
index c0b73cd8f3322ae01b7a1889657bc92d38fa4af6..f4ab59671b7934e3e6f5d893159a3618f4aa3898 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
@@ -31,6 +31,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 2; i < 64; i+=2)
     if (b[i] != a[i] - a[i-2]
 	|| b[i+1] != a[i+1] - a[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
index 7327883cc31ae4a37e5e4597b44b35e6376b4ed2..2fed60df68cdfbdc3ebf420db51d132ed335dc14 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
@@ -32,6 +32,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c);
+#pragma GCC novector
   for (int i = 2; i < 64; i+=2)
     if (b[i] != a[i] - a[i-2]
 	|| b[i+1] != a[i+1] - a[i-1])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
index f678b326f1043d2bce51b1d652de5ee2b55d6d0f..c170f4c345cdee1d5078452f9e301e6ef6dff398 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
@@ -28,6 +28,7 @@ main ()
     }
   int c = 7;
   foo (a, b, &c, 63);
+#pragma GCC novector
   for (int i = 1; i < 63; ++i)
     if (b[i] != a[i] - a[i-1])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
index 484efb1e8c826a8dafb43ed18e25794951418a9c..49ecbe216f2740329d5cd2169527a9aeb7ab844c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
@@ -70,6 +70,7 @@ main (void)
       fns[i].div (b, a, N);
       fns[i].mod (c, a, N);
 
+#pragma GCC novector
       for (int j = 0; j < N; j++)
 	if (a[j] != (b[j] * p + c[j]))
           __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
index dfd8ebace5610b22cc0da33647953ae33e084a42..0c4025abceb0e36092f5f7be1f813e4a6ebeda15 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
@@ -88,6 +88,7 @@ main ()
   f4 (4095);
   if (a[0] != (-2048 << 8))
     abort ();
+#pragma GCC novector
   for (i = 1; i < 4096; i++)
     if (a[i] != ((1 + ((i - 2048) % 16)) << 8))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
index 0c3086b1d683441e9b7d0096d4edce37e86d3cc1..d5fc4748758cea2762efc1977126d48df265f1c3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
@@ -21,6 +21,7 @@ int main ()
     A[i] = A[i] >> 3;
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (A[i] != B[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
index a1b4b0752291e64d51206fca644e241c8e0063a9..0a9d562feb56ec69e944d0a3581853249d9642ae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
@@ -26,6 +26,7 @@ int main()
 
   array_shift ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (dst[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
index 09f6e5a9584099b34e539b72dbe95e33da83cd20..d53faa52ee88b00d09eeefa504c9938084fa6230 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
@@ -26,6 +26,7 @@ int main()
 
   array_shift ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (dst[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
index 7c3feeeffae363b8ad42989a3569ca394519a414..09722ae090d0edb875cb91f5b20da71074aee7d3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
@@ -44,19 +44,23 @@ main ()
 {
   check_vect ();
   foo ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 1)
       abort ();
   x = 1;
   foo ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 2)
       abort ();
   baz ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 3)
       abort ();
   qux ();
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != 4)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
index e49566a3847a97dee412148bed63a4b69af8dd1b..af0999a726288890a525fe18966331e0cb5c0cad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
@@ -76,6 +76,7 @@ main ()
   if (r * 16384.0f != 0.125f)
     abort ();
   float m = -175.25f;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s *= a[i];
@@ -91,6 +92,7 @@ main ()
   if (bar () != 592.0f)
     abort ();
   s = FLT_MIN_VALUE;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (s < a[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
index e7d8aa0eb03879fcf0a77a512afc3281fbeabe76..2620dfebbc0dde80d219660dcead43ae01c7c41f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
@@ -109,6 +109,7 @@ main ()
       || r2 != (unsigned short) r
       || r3 != (unsigned char) r)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -129,6 +130,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -152,6 +154,7 @@ main ()
       || r3 != (unsigned char) r)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -174,6 +177,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
index cdfec81a6e6d761b6959fd434fc3367ad01d7026..45b55384006b1674c36a89f4539d2ffee2e4236e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
@@ -77,6 +77,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
index aee5244d85e18e707163a34cb93a9cd5b1317fc3..3ef4aa9a991c0b6259f3b3057616c1aa298663d9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
@@ -79,6 +79,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -90,6 +91,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -114,6 +117,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
index 9e73792ed7c36030b2f6885e1257a66991cdc4d1..c8a38f85ad4f29c9bbc664a368e23254effdd976 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
@@ -76,6 +76,7 @@ main ()
   if (r * 16384.0f != 0.125f)
     abort ();
   float m = -175.25f;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
@@ -89,6 +90,7 @@ main ()
   if (bar () != 592.0f)
     abort ();
   s = FLT_MIN_VALUE;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
index 91e34cd6428c4b841ab55226e49a5fc10444df57..6982a59da78276bad2779827ee0b8c1e1691e2e3 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
@@ -109,6 +109,7 @@ main ()
       || r2 != (unsigned short) r
       || r3 != (unsigned char) r)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -129,6 +130,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -152,6 +154,7 @@ main ()
       || r3 != (unsigned char) r)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
@@ -174,6 +177,7 @@ main ()
       || s3 != (unsigned char) (1024 * 1023))
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       if (b[i] != s
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
index ee4459a9341815c7ac4a5f6be4b9ca7679f13022..1ac13a5c5b4f568afa448af8d294d114533c061b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
@@ -41,12 +41,14 @@ main ()
   check_vect ();
   if (foo (a) != 64)
     abort ();
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i)
       abort ();
     else
       a[i] = -8;
   bar (a);
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i + 1)
       abort ();
@@ -54,6 +56,7 @@ main ()
       a[i] = -8;
   if (baz (a) != 64)
     abort ();
+#pragma GCC novector
   for (i = 0; i < 64; ++i)
     if (a[i] != i + 2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
index 951ba3afd9e332d7cd22addd273adf733e0fb71a..79b3602a6c08969a84856bf98ba59c18b45d5b11 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
@@ -52,12 +52,14 @@ doit (void)
   if (i != 11 || j != 101 || x != 10340 || niters != 550 || err)
     abort ();
   for (i = 1; i <= 10; i++)
+#pragma GCC novector
     for (j = 1; j <= 10 * i; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -101,12 +103,14 @@ doit (void)
   if (i != 10 || j != 90 || x != 9305 || niters != 450 || err)
     abort ();
   for (i = 0; i < 10; i++)
+#pragma GCC novector
     for (j = 0; j < 10 * i; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -156,6 +160,7 @@ doit (void)
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -199,12 +204,14 @@ doit (void)
   if (i != 11 || j != 10 || x != 9225 || niters != 25 || err)
     abort ();
   for (i = 1; i < 10; i += 2)
+#pragma GCC novector
     for (j = 1; j < i + 1; j++)
       if (k[i][j] == 3)
 	k[i][j] = 0;
       else
 	abort ();
   for (i = 0; i < 11; i++)
+#pragma GCC novector
     for (j = 0; j < 101; j++)
       if (k[i][j] != 0)
 	abort ();
@@ -244,11 +251,13 @@ doit (void)
       }
   if (i != 16 || j != 4 || x != 5109 || niters != 3 || err)
     abort ();
+#pragma GCC novector
   for (j = -11; j >= -41; j -= 15)
     if (k[0][-j] == 3)
       k[0][-j] = 0;
     else
       abort ();
+#pragma GCC novector
   for (j = -11; j >= -41; j--)
     if (k[0][-j] != 0)
       abort ();
@@ -288,6 +297,7 @@ doit (void)
       }
   if (/*i != 11 || j != 2 || */x != -12295 || niters != 28 || err)
     abort ();
+#pragma GCC novector
   for (j = -34; j <= -7; j++)
     if (k[0][-j] == 3)
       k[0][-j] = 0;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
index cca350f5c21125fa4380611a1ba42be317fd9d85..e454abe88009a7572cfad1397bbd5770c7086a6b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
@@ -25,12 +25,14 @@ main ()
   int i, r;
   check_vect ();
   r = foo (78, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 78; i++)
     if (p[i] != 78 * i)
       abort ();
   if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
     abort ();
   r = foo (87, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 87; i++)
     if (p[i] != 87 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
index 67e25c0e07eeff8e3453a8a3b5e4df54b16f3f30..4d25b43f5dca9df6562a146e12e1c3542d094602 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
@@ -25,12 +25,14 @@ main ()
   int i, r;
   check_vect ();
   r = foo (78, 0, 10000, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 78; i++)
     if (p[i] != 78 * i)
       abort ();
   if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
     abort ();
   r = foo (87, 0, 10000, p);
+#pragma GCC novector
   for (i = 0; i < 10000 / 87; i++)
     if (p[i] != 87 * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
index 57217c8a6ba4c15095f777cfa64aee9ffbe3e459..9ba7c3ce956a613e175ee6bd1f04b0531e6a79bd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
@@ -27,6 +27,7 @@ main ()
   check_vect ();
   r = foo (78, 0, 10000, p);
   for (j = 0; j < 7; j++)
+#pragma GCC novector
     for (i = 0; i < 10000 / 78; i++)
       if (p[j * (10000 / 78 + 1) + i] != 78 * i)
 	abort ();
@@ -34,6 +35,7 @@ main ()
     abort ();
   r = foo (87, 0, 10000, p);
   for (j = 0; j < 7; j++)
+#pragma GCC novector
     for (i = 0; i < 10000 / 87; i++)
       if (p[j * (10000 / 87 + 1) + i] != 87 * i)
 	abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
index 5d10ad90501835bf6cac2c2d81ee98bc6ce6db5b..a3c2decee2e36949950ca87a0a9942bc303ee633 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
@@ -77,6 +77,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -88,6 +89,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -101,6 +103,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -112,6 +115,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
index 52eb24f680f1362ee93b7a22de5fd46d37119216..b652759e5ad5ec723a644cf9c6cb31677d120e2d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
@@ -79,6 +79,7 @@ main ()
   foo (a, b);
   if (r != 1024 * 1023 / 2)
     abort ();
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -90,6 +91,7 @@ main ()
   if (bar () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
@@ -103,6 +105,7 @@ main ()
   if (r != 1024 * 1023 / 2)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += i;
@@ -114,6 +117,7 @@ main ()
   if (qux () != 1024 * 1023)
     abort ();
   s = 0;
+#pragma GCC novector
   for (int i = 0; i < 1024; ++i)
     {
       s += 2 * i;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
index cd65fc343f1893accb6f25a6222a22f64a8b4b2e..c44bfe511a5743198a647247c691075951f2258d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
@@ -46,10 +46,12 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != (i < 30 ? 5 : i * 4 + 123))
       abort ();
   baz ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != (i < 30 ? 5 : i * 8 + 123))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
index 03acd375e089c3a430adbed8d71197f39d7c512b..ed63ff59cc05e5f0a240376c4ca0985213a7eb48 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
@@ -65,6 +65,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -72,6 +73,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
index 29acde22f1783e8b11376d1ae2e702e09182350c..4974e5cc0ccdc5e01bf7a61a022bae9c2a6a048b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
@@ -44,19 +44,23 @@ main ()
   if (sizeof (int) * __CHAR_BIT__ < 32)
     return 0;
   bar (a + 7);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
   bar (a);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
 #if 0
   baz (a + 7);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
   baz (a);
+#pragma GCC novector
   for (i = 0; i < N / 2; i++)
     if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
index 675ac7026b67edda2e573367643eb68063559bc2..866f1000f34098fb578001395f4a35e29cc8c0af 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
@@ -32,6 +32,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != ((i >> 1) + (-3 * i)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
index ffcbf9380d609d7a3ed7420a38df5c11f632b46a..feab989cfd595f9fdb839aa8bd3e8486751abf2f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
@@ -44,6 +44,7 @@ main ()
   check_vect ();
   baz ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (array[i] != 5 * (i & 7) * i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
index 18d68779cc5dd8faec77a71a8f1cfa9785ff36ed..fef48c5066918a42fa80f1e14f9800e28ddb2c96 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
@@ -37,6 +37,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
index e9af0b83162e5bbd40e6a54df7d656ad956a8fd8..42414671c254ffcd93169849d7a982861aa5ac0b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
@@ -40,6 +40,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
index 46da496524d99ff70e3673682040c0d5067afe03..620cec36e4c023e1f52160327a3d5ba21540ad3b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
@@ -35,6 +35,7 @@ main ()
   int i;
   check_vect ();
   bar ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (d[i] != i * 4 + 123 || e[i] != i)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
index 6143a91eaf078d5b73e608bcfa080b70a5896f3d..440091d70e83be80574a6fcf9e034c53aed15786 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
@@ -57,6 +57,7 @@ main ()
   check_vect ();
   baz ();
   bar (0);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 2 * i || b[i] != 6 - 7 * i
 	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
@@ -64,6 +65,7 @@ main ()
     else
       a[i] = c[i];
   bar (17);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 6 - 5 * i + ((i & 31) << 4)
 	|| b[i] != 6 - 7 * i
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
index a0316e9e5813ac4c9076aaf5f762b9cc5dc98b1e..62246e28837272ef1e18860912643422f6dce018 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
@@ -57,6 +57,7 @@ main ()
   check_vect ();
   baz ();
   bar (0);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 2 * i || b[i] != 6 - 7 * i
 	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
@@ -64,6 +65,7 @@ main ()
     else
       a[i] = c[i];
   bar (17);
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != 6 - 5 * i + ((i & 31) << 4)
 	|| b[i] != 6 - 7 * i
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
index f414285a170c7e3469fdad07256ef09e1b46e17b..11ea2132689137cfb7175b176e39539b9197a330 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
@@ -76,6 +76,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -83,6 +84,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
index a968b9ce91a17c454f66aa76ec8b094e011e1c74..0112e553f8f130b06ee23a8c269a78d7764dcfff 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
@@ -76,6 +76,7 @@ main ()
   check_vect ();
   fn3 ();
   fn1 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
@@ -83,6 +84,7 @@ main ()
       abort ();
   fn3 ();
   fn2 ();
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
 	|| b[i] != 17 + (i % 37)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
index da47a824cb6046dcd9808bd7bd80161dbc0531b5..1531553651ceb6185ce16ab49f447496ad923408 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
@@ -46,6 +46,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
index d53b7669a6b50d6bc27e646d08af98ca6fd093e3..b8d094723f9035083a244cfcee98d3de46512206 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
@@ -33,6 +33,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
index 37ff3abe97d60d9b968addaee7812cb0b05b6f44..0f1344c42017fc2a5bfda3a9c17d46fbdd523127 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
@@ -44,6 +44,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
index 9237a9074deeb72c4d724771d5397d36593ced7c..b0d36486714159c88419ce9e793c27a398ddcbcb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
@@ -39,6 +39,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
index 62a8a51e2034b1065a4438a712a80e0a7c149985..1c9906fa65237a7b9e0bbd2162e9c56b6e86074f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
@@ -39,6 +39,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i] != arr[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
index f64a1347350a465b9e7a0c123fe2b5bcbc2bf860..dc9ad168c7161c15f6de4a57d53e301e6754e525 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
@@ -33,6 +33,7 @@ main1 ()
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].a
@@ -49,6 +50,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
index 2add5b489915cffda25f3c59b41bd1c44edf16ce..d35e427996f472ce9fffdf9570fb6685c3115037 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
@@ -62,6 +62,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
index 2b7a1a4bb77f4dce44958c50864a0a6ecac90c53..a9524a9d8e5cb152ec879db68f316d5568161ec1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
@@ -51,6 +51,7 @@ main1 ()
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
index e487de8b4e7d8e092054a73b337a345ba00e4e02..95ff41930d3f1ab95f0a20947e0527f39c78e715 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
@@ -71,6 +71,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
index 0f3347e8bb2200f48927b21938e7ebd348a73ada..b2dd1aee116d212bda7df0b0b1ca5470bd35ab83 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
@@ -56,6 +56,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
index 6d6bfae7bc5ce4cbcaeaadc07856773e6d77bdb4..716cce3eecbec0390f85f393e9cc714bd1a1faae 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
@@ -22,6 +22,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i*2] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
index 82727e595c166a52c8a1060339259ec7c39b594f..59008499192388c618f3eb38d91d9dcb5e47e3d9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
@@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
index 0fac615011601d45c64e83be1a6ec1e1af407192..350223fa23ace9253e8e56bbbbd065e575639b19 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
@@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].b != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
index 8c560480bc4eac50c381ed51cfbc6ccc696d0424..e988c5c846911a875a188cbb6ec8a4e4b80b787a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
@@ -35,6 +35,7 @@ main1 (s * __restrict__  pIn, s* __restrict__ pOut)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (q->a != p->a + 5
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
index dcae9c26d8621dd393f00d81257262a27913d7a8..37b8eb80ce0ce0dfe1ce5f9e5c13618bffbe41ff 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -24,6 +24,7 @@ main (int argc, char **argv)
     }
   loop ();
   __asm__ volatile ("" : : : "memory");
+#pragma GCC novector
   for (int i = 0; i < N; i++)
     {
       if (out[i] != i*2 + 7)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
index 6be939eea167992aade397ada0ee50d4daa43066..a55cd32e5896be4c1592e4e815baccede0f30e82 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
@@ -38,6 +38,7 @@ main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != a[i] + 3
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
index 9d1ad579e6a607f34ec953395f741f180474a77a..170f23472b967cedec88c1fa82dfb898014a6d09 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
@@ -34,6 +34,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != a[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
index a081d4e396e36a4633eb224d927543c7379d3108..11c2f2c4df60d8238830c188c3400a324444ab4d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
@@ -22,6 +22,7 @@ main1 (void)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N/2; i++)
     {
       if (a[i*2] != b[i] + c[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
index e8303b63bd4812e0643dc96888eeee2ea8ca082a..dfdafe8e8b46ea33e3c9ed759687788784a22607 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
@@ -19,12 +19,14 @@ int main()
   float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
   float dest[64];
   check_vect ();
+#pragma GCC novector
   for (stride = 0; stride < 8; stride++)
     {
       sumit (dest, src, src, stride, 8);
       if (!stride && dest[0] != 16)
 	abort();
       else if (stride)
+#pragma GCC novector
 	for (i = 0; i < 8; i++)
 	  if (2*src[i] != dest[i*stride])
 	    abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
index 7d264f39c60d668927232a75fe3843dbee087aa5..004db4e1f84735d8857c5591453158c96f213246 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
@@ -25,6 +25,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
index 63a4da797cbeb70bde0b1329fe39f510c24a990c..5d94e8f49bc41431df9de2b809c65e48cc269fa0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
@@ -18,6 +18,7 @@ check1 (s *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i].a != C (i)
 	|| res[i].b != A (i)
@@ -30,6 +31,7 @@ check2 (unsigned short *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != (unsigned short) (A (i) + B (i) + C (i)))
       abort ();
@@ -40,6 +42,7 @@ check3 (s *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i].a != i
 	|| res[i].b != i
@@ -52,6 +55,7 @@ check4 (unsigned short *res)
 {
   int i;
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (res[i] != (unsigned short) (A (i) + B (i)))
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
index ee8ea0d666db4b7671cd3f788fc7f6056189f3da..547ad9b9ee3d35802d3f8d7b9c43d578fb14f828 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
@@ -34,6 +34,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
index fe41dbd9cf452b9452084e988d48ede232f548bf..8f58e24c4a8b8be2da0a6c136924a370b9952691 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
@@ -29,6 +29,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
index a88c0f08456cf278c4fa5a5b9b0a06900cb7c9be..edb13d1b26f5963113917e8882f199c7dd4d8de7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
@@ -37,6 +37,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
index cddd0c88f42a99f526362ca117e9386c013c768d..0c2bd9d8cbde5e789474595db519d603b374e74c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
@@ -29,6 +29,7 @@ main1 (unsigned short *arr, ii *iarr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i] != arr[i]
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
index ab841205e4f5b3c0aea29f60045934e84644a6a7..fd7920031dcf6df98114cfde9a56037d655bb74d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
@@ -25,6 +25,7 @@ main1 (s *arr)
     }
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b
@@ -41,6 +42,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
index 0afd50db0b8de7758faf7f2bff14247a27a7ee38..ae2345a9787804af0edc45d93f18e75d159326b0 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
@@ -24,6 +24,7 @@ main1 (s *arr)
       ptr++;
     }
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b - arr[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
index ef532251465e5b1eb16e820fc30844a7995b82a9..c7a1da534baea886fe14add1220c105153d6bb80 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
index 04f18fbb591d9dc50d56b20bce99cb79903e5e27..2a068d821aebee8ab646ff1b4c33209dc5b2fcbf 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
@@ -37,6 +37,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
index b5eb87f4b96e1a577930654f4b1709024256e90e..ac7bf000196b3671044de57d88dd3a32080b68a8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
@@ -41,6 +41,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
@@ -64,6 +65,7 @@ main1 (s *arr)
     }
 
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
index 69b970ef33b9dd8834b10baf7085b88a0c441a46..0a6050ae462332b8d74043fce094776892a80386 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
@@ -53,6 +53,7 @@ main1 (s *arr, int n)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < n; i++)
     { 
       if (res[i].c != arr[i].b + arr[i].c
@@ -67,6 +68,7 @@ main1 (s *arr, int n)
    }
 
   /* Check also that we don't do more iterations than needed.  */
+#pragma GCC novector
   for (i = n; i < N; i++)
     {
       if (res[i].c == arr[i].b + arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
index f1d05a5aaf9f6885b921c5ae3370d9c17795ff82..9ead5a776d0b1a69bec804615ffe7639f61f993f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
@@ -39,6 +39,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b + arr[i].c
@@ -62,6 +63,7 @@ main1 (s *arr)
     }
   
   /* Check results.  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != arr[i].b 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
index b703e636b49f8c7995c4c463b38b585f79acbdf2..176c6a784bc73e0300e3114a74aba05dc8185cac 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
@@ -44,6 +44,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (res[i].a != check_res[i].a
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
index 764f10d0adaca01e664bb45dd4da59a0c3f8a2af..cef88f6bf8246a98933ff84103c090664398cedd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
@@ -42,6 +42,7 @@ main1 (s *arr)
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
index 35bab79ce826ac663eabb1a1036ed7afd6d33e8b..c29c3ff6cdc304e5447f0e12aac00cd0fcd7b61e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
@@ -44,6 +44,7 @@ main1 (s *arr)
     } 
    
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     { 
       if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
index ea35835465c8ed18be1a0c9c4f226f078a51acaa..2d5c10a878c7145972aeaa678e0e11c1cf1b79dd 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
@@ -27,6 +27,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
index df6b999c5a4d88c8b106829f6f9df8edbe00f86f..4848215a7a8f5fea569c0bfaf5909ac68a81bbf2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
@@ -32,6 +32,7 @@ main (void)
   foo (X, Y, Z);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i] != resultY[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
index 36861a059e03b1103adc2dca32409878ca95611e..2a94c73907e813019fcfbc912a1599f7423e2a47 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
@@ -40,6 +40,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i].a != result[i].a)  
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
index bfbb48b21ee632243f2f5ba63d7eeec0f687daef..b0e9d6f90391cfc05911f7cc709df199d7fbbdf1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
@@ -26,6 +26,7 @@ main (void)
   foo (X, &X[2]);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N+2; i++)
     {
       if (X[i] != result[i])
diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
index d775f320e0c1e2c6de2e77a1d8df621971fc3d2d..27d762490908829d54cdbb81247926c2f677fe36 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
@@ -40,6 +40,7 @@ main (void)
   foo (X, Y);
   
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (Y[i].a != result[i].a)
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
index 0d6e64081a17fed8d9b9239f9ba02ffa1b7a758d..f3abc9407f52784e391c495152e617b1f0753e92 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
@@ -38,6 +38,7 @@ main (void)
       asm volatile ("" ::: "memory");
     }
   f (a, b, c);
+#pragma GCC novector
   for (int i = 0; i < N; ++i)
     if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
 				      * (BASE + OFFSET + i * 4)))
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
index 4c95dd2017922904122aee2925491e9b9b48fe8e..dfbb2171c004565045d91605354b5d6e7219ab19 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
@@ -17,6 +17,7 @@ foo (int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 2333)
       abort ();
@@ -32,6 +33,7 @@ bar (int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * (short) 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * (short) 2333)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
index 4075f815cea0ffbad1e05e0ac8b9b232bf3efe61..c2ad58f69e7fe5b62a9fbc55dd5dab43ba785104 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
@@ -17,6 +17,7 @@ foo (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 2333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 2333)
       abort ();
@@ -32,6 +33,7 @@ bar (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = (unsigned short) 2333 * b[i];
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * (unsigned short) 2333)
       abort ();
@@ -47,6 +49,7 @@ baz (unsigned int *__restrict a,
   for (i = 0; i < n; i++)
     a[i] = b[i] * 233333333;
 
+#pragma GCC novector
   for (i = 0; i < n; i++)
     if (a[i] != b[i] * 233333333)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
index c4ac88e186dbc1a8f36f4d7567a9983446557eea..bfdcbaa09fbd42a16197023b09087cee6642105a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
@@ -43,12 +43,14 @@ int main (void)
 
   foo ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF)
       abort ();
 
   bar ();
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
index ebbf4f5e841b75cb1f5171ddedec85cd327f385e..e46b0cc3135fd982b07e0824955654f0ebc59506 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
@@ -38,6 +38,7 @@ int main (void)
 
   foo (COEF2);
 
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
index 91a8a290263c9630610a48bce3829de753a4b320..6b094868064e9b86c40018363564f356220125a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
@@ -33,6 +33,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
index 7e1f7457f1096d4661dcc724a59a0511555ec0e3..444d41169b5c198c6fa146c3bb71336b0f6b0432 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
@@ -33,6 +33,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
index 2e28baae0b804cf76ad74926c35126df98857482..14411ef43eda2ff348de9c9c1540e1359f20f55b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
index d277f0b2b9492db77237a489cc8bea4749d8d719..f40def5dddf58f6a6661d9c286b774f954126840 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
@@ -35,6 +35,7 @@ int main (void)
 
   foo (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
index f50358802587d32c1d6e73c0f6e06bd8ff837fc2..63866390835c55e53b6f90f305a71bbdbff85afa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
@@ -34,6 +34,7 @@ int main (void)
 
   foo (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
index 03d1379410eb927a3ef705afc6523230eb9fb58b..78ad74b5d499c23256e4ca38a82fefde8720e4e9 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
@@ -34,6 +34,7 @@ int main (void)
 
   foo1 (N);
 
+#pragma GCC novector
   for (i=0; i<N; i++) {
     if (result[i] != X[i] * Y[i])
       abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
index 5f6c047849b8625f908bc7432b803dff5e671cd3..26d5310807781eb5a7935c51e813bc88892f747c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
@@ -32,6 +32,7 @@ foo (short *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
@@ -60,6 +61,7 @@ foo (short *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N/4; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
index 46512f2c69ba50521d6c7519a1c3d073e90b7436..7450d2aef75d755db558e471b807bfefb777f472 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
@@ -23,6 +23,7 @@ foo (char *src, int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
index 212b5dbea18a91bd59d2caf9dc4f4cc3fe531762..ae086b88e7e83f2864d6e74fa94301f7f8ab62f6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
@@ -23,6 +23,7 @@ foo (unsigned short *src, unsigned int *dst)
 
   s = src;
   d = dst;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
index 844e5df3269d0a774d2ab8a88de11f17271d6f60..a8e536adee0f04611115e97725608d0e82e9893c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
@@ -27,6 +27,7 @@ foo (unsigned char *src, unsigned int *dst1, unsigned int *dst2)
   s = src;
   d1 = dst1;
   d2 = dst2;
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       b = *s++;
diff --git a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
index c4d2de1a64e2ebc151c4ade2327c8fceb7ba04e4..414bd9d3e1279db574d860b7a721e4310d4972da 100644
--- a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
+++ b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
@@ -19,6 +19,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sb[i] != 5)
@@ -31,6 +32,7 @@ int main1 ()
     }
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     {
       if (sa[i] != 105)




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (2 preceding siblings ...)
  2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
@ 2023-06-28 13:43 ` Tamar Christina
  2023-07-04 11:52   ` Richard Biener
  2023-06-28 13:43 ` [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds Tamar Christina
                   ` (38 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 2237 bytes --]

Hi All,

There's an existing bug in loop frequency scaling where the if statement checks
to see if there's a single exit, and records a dump file note, but then
continues anyway.

It then tries to dereference the null exit edge, which of course crashes.

For loops with multiple exits it's not really clear how to scale the exit
probabilities, as it's unknown which exit is the most probable.

For that reason I ignore the exit edges during scaling but still adjust the
loop body.
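For illustration (a hypothetical sketch, not the original reproducer), the kind
of loop involved is one where `single_exit ()` returns NULL because there is
more than one way out of the loop:

```c
/* Illustrative sketch only: a loop with two exits.  For such loops
   single_exit () returns NULL, which scale_loop_profile previously
   noted in the dump file and then dereferenced anyway.  */
int
first_match (int *a, int n)
{
  for (int i = 0; i < n; i++)   /* exit 1: the latch test */
    if (a[i] == 42)             /* exit 2: the early break */
      return i;
  return -1;
}
```

With the fix, the exit-edge adjustments are simply skipped when no single exit
edge exists, while the loop body counts are still scaled.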

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
	(scale_loop_profile): Don't access null pointer.

--- inline copy of patch -- 
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
 /* Scale profile in LOOP by P.
    If ITERATION_BOUND is non-zero, scale even further if loop is predicted
    to iterate too many times.
-   Before caling this function, preheader block profile should be already
+   Before calling this function, preheader block profile should be already
    scaled to final count.  This is necessary because loop iterations are
    determined by comparing header edge count to latch ege count and thus
    they need to be scaled synchronously.  */
@@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability p,
       /* If latch exists, change its count, since we changed
 	 probability of exit.  Theoretically we should update everything from
 	 source of exit edge to latch, but for vectorizer this is enough.  */
-      if (loop->latch && loop->latch != e->src)
+      if (e && loop->latch && loop->latch != e->src)
 	loop->latch->count += count_delta;
 
       /* Scale the probabilities.  */
       scale_loop_frequencies (loop, p);
 
       /* Change latch's count back.  */
-      if (loop->latch && loop->latch != e->src)
+      if (e && loop->latch && loop->latch != e->src)
 	loop->latch->count -= count_delta;
 
       if (dump_file && (dump_flags & TDF_DETAILS))




-- 

[-- Attachment #2: rb17498.patch --]
[-- Type: text/plain, Size: 1523 bytes --]

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
 /* Scale profile in LOOP by P.
    If ITERATION_BOUND is non-zero, scale even further if loop is predicted
    to iterate too many times.
-   Before caling this function, preheader block profile should be already
+   Before calling this function, preheader block profile should be already
    scaled to final count.  This is necessary because loop iterations are
    determined by comparing header edge count to latch ege count and thus
    they need to be scaled synchronously.  */
@@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability p,
       /* If latch exists, change its count, since we changed
 	 probability of exit.  Theoretically we should update everything from
 	 source of exit edge to latch, but for vectorizer this is enough.  */
-      if (loop->latch && loop->latch != e->src)
+      if (e && loop->latch && loop->latch != e->src)
 	loop->latch->count += count_delta;
 
       /* Scale the probabilities.  */
       scale_loop_frequencies (loop, p);
 
       /* Change latch's count back.  */
-      if (loop->latch && loop->latch != e->src)
+      if (e && loop->latch && loop->latch != e->src)
 	loop->latch->count -= count_delta;
 
       if (dump_file && (dump_flags & TDF_DETAILS))




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (3 preceding siblings ...)
  2023-06-28 13:43 ` [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Tamar Christina
@ 2023-06-28 13:43 ` Tamar Christina
  2023-07-04 12:05   ` Richard Biener
  2023-06-28 13:44 ` [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant Tamar Christina
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 5048 bytes --]

Hi All,

The bitfield vectorization support does not currently recognize bitfields inside
gconds.  This means they can't be used as conditions for early break
vectorization, which is functionality we require.

This adds support for them by explicitly matching and handling gcond as a
source.

Testcases are added in the testsuite update patch, as the only way to reach
this code is through early break vectorization.  See tests:

  - vect-early-break_20.c
  - vect-early-break_21.c
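For illustration (a hypothetical sketch, not the contents of those testcases),
the shape of loop this enables is a bitfield read feeding the condition of an
early break:

```c
struct S { unsigned int a : 4; unsigned int b : 4; };

/* Hypothetical sketch: the BIT_FIELD_REF feeding the gcond is what the
   pattern now matches, so the loop condition can be lowered and the loop
   vectorized with an early break.  */
int
find_b (struct S *arr, int n, unsigned int val)
{
  for (int i = 0; i < n; i++)
    if (arr[i].b == val)   /* bitfield compare inside a gcond */
      return i;
  return -1;
}
```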

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
	from original statement.
	(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
     = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
@@ -2488,27 +2489,37 @@ static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 				 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
 
-  if (!first_stmt)
-    return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree cond_cst = NULL_TREE;
+  if (cond_stmt)
     {
-      gimple *second_stmt
-	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
-      bf_stmt = dyn_cast <gassign *> (second_stmt);
-      if (!bf_stmt
-	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+      tree op = gimple_cond_lhs (cond_stmt);
+      if (TREE_CODE (op) != SSA_NAME)
+	return NULL;
+      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+      cond_cst = gimple_cond_rhs (cond_stmt);
+      if (TREE_CODE (cond_cst) != INTEGER_CST)
 	return NULL;
     }
-  else
+  else if (conv_stmt
+	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
+	   && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+    }
+
+  if (!bf_stmt
+      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
     return NULL;
 
   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  tree ret_type = cond_cst ? TREE_TYPE (container)
+			   : TREE_TYPE (gimple_assign_lhs (conv_stmt));
 
   if (!bit_field_offset (bf_ref).is_constant ()
       || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (conv_stmt
+      && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
 	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 			       NOP_EXPR, result);
     }
 
-  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  if (cond_cst)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_cond (gimple_cond_code (cond_stmt),
+			     gimple_get_lhs (pattern_stmt),
+			     fold_convert (ret_type, cond_cst),
+			     gimple_cond_true_label (cond_stmt),
+			     gimple_cond_false_label (cond_stmt));
+      *type_out = STMT_VINFO_VECTYPE (stmt_info);
+    }
+  else
+    *type_out
+      = get_vectype_for_scalar_type (vinfo,
+				     TREE_TYPE (gimple_get_lhs (pattern_stmt)));
   vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
 
   return pattern_stmt;




-- 

[-- Attachment #2: rb17499.patch --]
[-- Type: text/plain, Size: 4197 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
     = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
@@ -2488,27 +2489,37 @@ static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 				 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
 
-  if (!first_stmt)
-    return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree cond_cst = NULL_TREE;
+  if (cond_stmt)
     {
-      gimple *second_stmt
-	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
-      bf_stmt = dyn_cast <gassign *> (second_stmt);
-      if (!bf_stmt
-	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+      tree op = gimple_cond_lhs (cond_stmt);
+      if (TREE_CODE (op) != SSA_NAME)
+	return NULL;
+      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+      cond_cst = gimple_cond_rhs (cond_stmt);
+      if (TREE_CODE (cond_cst) != INTEGER_CST)
 	return NULL;
     }
-  else
+  else if (conv_stmt
+	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
+	   && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+    }
+
+  if (!bf_stmt
+      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
     return NULL;
 
   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  tree ret_type = cond_cst ? TREE_TYPE (container)
+			   : TREE_TYPE (gimple_assign_lhs (conv_stmt));
 
   if (!bit_field_offset (bf_ref).is_constant ()
       || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (conv_stmt
+      && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
 	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 			       NOP_EXPR, result);
     }
 
-  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  if (cond_cst)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_cond (gimple_cond_code (cond_stmt),
+			     gimple_get_lhs (pattern_stmt),
+			     fold_convert (ret_type, cond_cst),
+			     gimple_cond_true_label (cond_stmt),
+			     gimple_cond_false_label (cond_stmt));
+      *type_out = STMT_VINFO_VECTYPE (stmt_info);
+    }
+  else
+    *type_out
+      = get_vectype_for_scalar_type (vinfo,
+				     TREE_TYPE (gimple_get_lhs (pattern_stmt)));
   vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
 
   return pattern_stmt;




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (4 preceding siblings ...)
  2023-06-28 13:43 ` [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds Tamar Christina
@ 2023-06-28 13:44 ` Tamar Christina
  2023-07-04 12:10   ` Richard Biener
  2023-06-28 13:44 ` [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
                   ` (36 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Hi All,

expand_vector_piecewise does not support VLA expansion as it has a hard assert
on the type not being VLA.

Instead of simply failing to expand, and so having the call marked as unsupported,
we ICE.  This patch adjusts it so that we don't ICE and can gracefully handle the
expansion in the support checks.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if not
	constant.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
 	    }
 	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
 	}
-      else
+      else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
 	t = expand_vector_piecewise (gsi, do_compare, type,
 				     TREE_TYPE (TREE_TYPE (op0)), op0, op1,
 				     code, false);




-- 

[-- Attachment #2: rb17500.patch --]
[-- Type: text/plain, Size: 590 bytes --]

diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
 	    }
 	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
 	}
-      else
+      else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
 	t = expand_vector_piecewise (gsi, do_compare, type,
 				     TREE_TYPE (TREE_TYPE (op0)), op0, op1,
 				     code, false);




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (5 preceding siblings ...)
  2023-06-28 13:44 ` [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant Tamar Christina
@ 2023-06-28 13:44 ` Tamar Christina
  2023-07-13 11:32   ` Richard Biener
  2023-06-28 13:45 ` [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits Tamar Christina
                   ` (35 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 23889 bytes --]

Hi All,

This patch splits the vectorizer's understanding of the main loop exit off
from the normal loop infrastructure.

Essentially we're relaxing the use of single_exit() in the vectorizer, as we will
no longer have a single exit and need a well-defined split between the main
and secondary exits of loops for vectorization.

These new values were added to the loop class, even though they're only used by
the vectorizer, for a couple of reasons:
  - We need access to them in places where we have no loop_vinfo.
  - We only have a single loop_vinfo for each loop under consideration, however
    that same loop can have different copies, e.g. peeled/versioned copies or
    the scalar variant of the loop.  For each of these we still need to be able
    to have a coherent exit definition.

For these reasons, placement in the loop class was the only way to keep the
bookkeeping together with the loops and avoid possibly expensive lookups.

For this version of the patch the `main` exit of a loop is defined as the exit
that is closest to the loop latch. This is stored in vec_loop_iv.  The remaining
exits which are relevant for the vectorizer are stored inside
vec_loop_alt_exits.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cfgloop.cc (alloc_loop): Initialize vec_loop_iv.
	* cfgloop.h (class loop): Add vec_loop_iv and vec_loop_alt_exits.
	* doc/loop.texi: Document get_edge_condition.
	* tree-loop-distribution.cc (loop_distribution::distribute_loop):
	Initialize vec_loop_iv since loop distributions calls loop peeling which
	only understands vec_loop_iv now.
	* tree-scalar-evolution.cc (get_edge_condition): New.
	(get_loop_exit_condition): Refactor into get_edge_condition.
	* tree-scalar-evolution.h (get_edge_condition): New.
	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Update use
	of single_exit.
	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_normal, vect_set_loop_condition,
	slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_can_duplicate_loop_p,
	find_loop_location, vect_update_ivs_after_vectorizer,
	vect_gen_vector_loop_niters_mult_vf, find_guard_arg, vect_do_peeling):
	Replace usages of single_exit.
	(vec_init_exit_info): New.
	* tree-vect-loop.cc (vect_analyze_loop_form,
	vect_create_epilog_for_reduction, vectorizable_live_operation,
	scale_profile_for_vect_loop, vect_transform_loop): New.
	* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_ALT_EXITS,
	vec_init_exit_info): New.

--- inline copy of patch -- 
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -272,6 +272,14 @@ public:
      the basic-block from being collected but its index can still be
      reused.  */
   basic_block former_header;
+
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge  GTY ((skip (""))) vec_loop_iv;
+
+  /* If the loop has multiple exits this structure contains the alternate
+     exits of the loop which are relevant for vectorization.  */
+  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
 };
 
 /* Set if the loop is known to be infinite.  */
diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -355,6 +355,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->vec_loop_iv = NULL;
   return loop;
 }
 
diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -212,6 +212,7 @@ relation, and breath-first search order, respectively.
 @code{NULL} if the loop has more than one exit.  You can only use this
 function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
+@item @code{get_edge_condition}: Get the condition belonging to an exit edge.
 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 is executed exactly once during each iteration of a loop (that is, it
 does not belong to a sub-loop, and it dominates the latch of the loop).
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop,
       return 0;
     }
 
+  /* Loop distribution only does prologue peeling but we still need to
+     initialize loop exit information.  However we only support single exits at
+     the moment.  As such, should exit information not have been provided and we
+     have more than one exit, bail out.  */
+  if (!(loop->vec_loop_iv = single_exit (loop)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Loop %d not distributed: too many exits.\n",
+		 loop->num);
+
+      free_rdg (rdg);
+      loop_nest.release ();
+      free_data_refs (datarefs_vec);
+      delete ddrs_table;
+      return 0;
+    }
+
   data_reference_p dref;
   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
     dref->aux = (void *) (uintptr_t) i;
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index c58a8a16e81573aada38e912b7c58b3e1b23b66d..2e83836911ec8e968e90cf9b489dc7fe121ff80e 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern tree number_of_latch_executions (class loop *);
 extern gcond *get_loop_exit_condition (const class loop *);
+extern gcond *get_edge_condition (edge);
 
 extern void scev_initialize (void);
 extern bool scev_initialized_p (void);
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index ba47a684f4b373fb4f2dc16ddb8edb0ef39da6ed..af8be618b0748258132ccbef2d387bfddbe3c16b 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -1293,8 +1293,15 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
 gcond *
 get_loop_exit_condition (const class loop *loop)
 {
+  return get_edge_condition (single_exit (loop));
+}
+
+/* If the statement just before the EXIT_EDGE contains a condition then
+   return the condition, otherwise NULL. */
+
+gcond *
+get_edge_condition (edge exit_edge){
   gcond *res = NULL;
-  edge exit_edge = single_exit (loop);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(get_loop_exit_condition \n  ");
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e89120eab2fdac0fc30fe35c0356a2..fcc950f528b2d1e044be12424c2df11f692ee8ba 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2070,7 +2070,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || !slpeel_can_duplicate_loop_p (loop_vinfo,
+				       LOOP_VINFO_IV_EXIT (loop_vinfo))
       || loop->inner)
     do_peeling = false;
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 20f570e4a0d64610d7b63fe492eba5254ab5dc2c..299dfb75e3372b6a91637101b4bab0e82eb560ad 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -904,7 +904,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   add_header_seq (loop, header_seq);
 
   /* Get a boolean result that tells us whether to iterate.  */
-  edge exit_edge = single_exit (loop);
+  edge exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
   gcond *cond_stmt;
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
       && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
@@ -935,7 +935,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (final_iv)
     {
       gassign *assign = gimple_build_assign (final_iv, orig_niters);
-      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+      gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
   return cond_stmt;
@@ -1183,7 +1183,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
+				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
 {
@@ -1191,13 +1192,13 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   gcond *cond_stmt;
   gcond *orig_cond;
   edge pe = loop_preheader_edge (loop);
-  edge exit_edge = single_exit (loop);
+  edge exit_edge = loop->vec_loop_iv;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   enum tree_code code;
   tree niters_type = TREE_TYPE (niters);
 
-  orig_cond = get_loop_exit_condition (loop);
+  orig_cond = get_edge_condition (exit_edge);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
@@ -1305,7 +1306,7 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   if (final_iv)
     {
       gassign *assign;
-      edge exit = single_exit (loop);
+      edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
       gcc_assert (single_pred_p (exit->dest));
       tree phi_dest
 	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
@@ -1353,7 +1354,7 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
 			 bool niters_maybe_zero)
 {
   gcond *cond_stmt;
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_edge_condition (loop->vec_loop_iv);
   gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
 
   if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
@@ -1370,7 +1371,8 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
 							     loop_cond_gsi);
     }
   else
-    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
+    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop, niters,
+						step, final_iv,
 						niters_maybe_zero,
 						loop_cond_gsi);
 
@@ -1439,6 +1441,69 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
+/* When copies of the same loop are created the copies won't have any SCEV
+   information and so we can't determine what their exits are.  However since
+   they are copies of an original loop the exits should be the same.
+
+   I don't really like this, and think we need a different way, but I don't
+   know what.  So sending this up so Richi can comment.  */
+
+void
+vec_init_exit_info (class loop *loop)
+{
+  if (loop->vec_loop_iv)
+    return;
+
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  if (exits.is_empty ())
+    return;
+
+  if ((loop->vec_loop_iv = single_exit (loop)))
+    return;
+
+  loop->vec_loop_alt_exits.create (exits.length () - 1);
+
+  /* The main IV is to be determined by the block that's the first reachable
+     block from the latch.  We cannot rely on the order the loop analysis
+     returns and we don't have any SCEV analysis on the loop.  */
+  auto_vec <edge> workset;
+  workset.safe_push (loop_latch_edge (loop));
+  hash_set <edge> visited;
+
+  while (!workset.is_empty ())
+    {
+      edge e = workset.pop ();
+      if (visited.contains (e))
+	continue;
+
+      bool found_p = false;
+      for (edge ex : e->src->succs)
+	{
+	  if (exits.contains (ex))
+	    {
+	      found_p = true;
+	      e = ex;
+	      break;
+	    }
+	}
+
+      if (found_p)
+	{
+	  loop->vec_loop_iv = e;
+	  for (edge ex : exits)
+	    if (e != ex)
+	      loop->vec_loop_alt_exits.safe_push (ex);
+	  return;
+	}
+      else
+	{
+	  for (edge ex : e->src->preds)
+	    workset.safe_insert (0, ex);
+	}
+      visited.add (e);
+    }
+  gcc_unreachable ();
+}
 
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
@@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   edge exit, new_exit;
   bool duplicate_outer_loop = false;
 
-  exit = single_exit (loop);
+  exit = loop->vec_loop_iv;
   at_exit = (e == exit);
   if (!at_exit && e != loop_preheader_edge (loop))
     return NULL;
 
   if (scalar_loop == NULL)
     scalar_loop = loop;
+  else
+    vec_init_exit_info (scalar_loop);
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
+  exit = scalar_loop->vec_loop_iv;
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
 	    &exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
-  exit = single_exit (loop);
+  exit = loop->vec_loop_iv;
   basic_block new_preheader = new_bbs[0];
 
+  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
+     so we must initialize the exit information.  */
+  vec_init_exit_info (new_loop);
+
   /* Before installing PHI arguments make sure that the edges
      into them match that of the scalar loop we analyzed.  This
      makes sure the SLP tree matches up between the main vectorized
@@ -1537,7 +1608,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
+      slpeel_duplicate_current_defs_from_edges (scalar_loop->vec_loop_iv,
 						exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
@@ -1696,11 +1767,12 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
  */
 
 bool
-slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
+slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
 {
-  edge exit_e = single_exit (loop);
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   edge entry_e = loop_preheader_edge (loop);
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_edge_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
@@ -1709,7 +1781,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   if (!loop_outer (loop)
       || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
-      || !single_exit (loop)
+      || !LOOP_VINFO_IV_EXIT (loop_vinfo)
       /* Verify that new loop exit condition can be trivially modified.  */
       || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
       || (e != exit_e && e != entry_e))
@@ -1722,7 +1794,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   return ret;
 }
 
-/* Function vect_get_loop_location.
+/* Function find_loop_location.
 
    Extract the location of the loop in the source code.
    If the loop is not well formed for vectorization, an estimated
@@ -1739,11 +1811,19 @@ find_loop_location (class loop *loop)
   if (!loop)
     return dump_user_location_t ();
 
-  stmt = get_loop_exit_condition (loop);
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+    {
+      /* We only care about the loop location, so use any exit with location
+	 information.  */
+      for (edge e : get_loop_exit_edges (loop))
+	{
+	  stmt = get_edge_condition (e);
 
-  if (stmt
-      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
-    return stmt;
+	  if (stmt
+	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
+	    return stmt;
+	}
+    }
 
   /* If we got here the loop is probably not "well formed",
      try to estimate the loop location */
@@ -1962,7 +2042,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-  basic_block exit_bb = single_exit (loop)->dest;
+
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
@@ -2529,10 +2610,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 {
   /* We should be using a step_vector of VF if VF is variable.  */
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
-  basic_block exit_bb = single_exit (loop)->dest;
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   gcc_assert (niters_vector_mult_vf_ptr != NULL);
   tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
@@ -2559,7 +2639,7 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
 		gphi *lcssa_phi)
 {
   gphi_iterator gsi;
-  edge e = single_exit (loop);
+  edge e = loop->vec_loop_iv;
 
   gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
@@ -3328,8 +3408,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   if (epilog_peeling)
     {
-      e = single_exit (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
 
       /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
 	 said epilog then we should use a copy of the main loop as a starting
@@ -3419,8 +3499,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 				    niters, niters_vector_mult_vf);
-	  guard_bb = single_exit (loop)->dest;
-	  guard_to = split_edge (single_exit (epilog));
+	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	  guard_to = split_edge (epilog->vec_loop_iv);
 	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
 					   skip_vector ? anchor : guard_bb,
 					   prob_epilog.invert (),
@@ -3428,7 +3508,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
 	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
-					      single_exit (epilog));
+					      epilog->vec_loop_iv);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0a03f56aae7b51fb4c5ce0e49d96888bae634ef7..0bca5932d237cf1cfbbb48271db3f4430672b5dc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1641,6 +1641,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 {
   DUMP_VECT_SCOPE ("vect_analyze_loop_form");
 
+  vec_init_exit_info (loop);
+  if (!loop->vec_loop_iv)
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " could not determine main exit from"
+				   " loop with multiple exits.\n");
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -3025,9 +3032,8 @@ start_over:
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
       if (!vect_can_advance_ivs_p (loop_vinfo)
-	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
-					   single_exit (LOOP_VINFO_LOOP
-							 (loop_vinfo))))
+	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
+					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
         {
 	  ok = opt_result::failure_at (vect_location,
 				       "not vectorized: can't create required "
@@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = single_exit (loop)->dest;
+  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
@@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
 	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info *vinfo,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = single_exit (loop)->dest;
+      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf)
       scale_loop_frequencies (loop, p);
     }
 
-  edge exit_e = single_exit (loop);
+  edge exit_e = loop->vec_loop_iv;
+
   exit_e->probability = profile_probability::always () / (new_est_niter + 1);
 
   edge exit_l = single_pred_edge (loop->latch);
@@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
-  edge e = single_exit (loop);
+  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   if (! single_pred_p (e->dest))
     {
       split_loop_exit_edge (e, true);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036cd14609ebe091799320bf 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -917,6 +917,8 @@ public:
 
 /* Access Functions.  */
 #define LOOP_VINFO_LOOP(L)                 (L)->loop
+#define LOOP_VINFO_IV_EXIT(L)              (L)->loop->vec_loop_iv
+#define LOOP_VINFO_ALT_EXITS(L)            (L)->loop->vec_loop_alt_exits
 #define LOOP_VINFO_BBS(L)                  (L)->bbs
 #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
 #define LOOP_VINFO_NITERS(L)               (L)->num_iters
@@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
+extern void vec_init_exit_info (class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,




-- 

[-- Attachment #2: rb17501.patch --]
[-- Type: text/plain, Size: 21302 bytes --]

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -272,6 +272,14 @@ public:
      the basic-block from being collected but its index can still be
      reused.  */
   basic_block former_header;
+
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+     controls the natural exits of the loop.  */
+  edge  GTY ((skip (""))) vec_loop_iv;
+
+  /* If the loop has multiple exits this structure contains the alternate
+     exits of the loop which are relevant for vectorization.  */
+  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
 };
 
 /* Set if the loop is known to be infinite.  */
diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -355,6 +355,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->vec_loop_iv = NULL;
   return loop;
 }
 
diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -212,6 +212,7 @@ relation, and breath-first search order, respectively.
 @code{NULL} if the loop has more than one exit.  You can only use this
 function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
+@item @code{get_edge_condition}: Get the condition belonging to an exit edge.
 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 is executed exactly once during each iteration of a loop (that is, it
 does not belong to a sub-loop, and it dominates the latch of the loop).
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop,
       return 0;
     }
 
+  /* Loop distribution only does prologue peeling but we still need to
+     initialize loop exit information.  However we only support single exits at
+     the moment.  As such, should exit information not have been provided and we
+     have more than one exit, bail out.  */
+  if (!(loop->vec_loop_iv = single_exit (loop)))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Loop %d not distributed: too many exits.\n",
+		 loop->num);
+
+      free_rdg (rdg);
+      loop_nest.release ();
+      free_data_refs (datarefs_vec);
+      delete ddrs_table;
+      return 0;
+    }
+
   data_reference_p dref;
   for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
     dref->aux = (void *) (uintptr_t) i;
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index c58a8a16e81573aada38e912b7c58b3e1b23b66d..2e83836911ec8e968e90cf9b489dc7fe121ff80e 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern tree number_of_latch_executions (class loop *);
 extern gcond *get_loop_exit_condition (const class loop *);
+extern gcond *get_edge_condition (edge);
 
 extern void scev_initialize (void);
 extern bool scev_initialized_p (void);
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index ba47a684f4b373fb4f2dc16ddb8edb0ef39da6ed..af8be618b0748258132ccbef2d387bfddbe3c16b 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -1293,8 +1293,16 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
 gcond *
 get_loop_exit_condition (const class loop *loop)
 {
+  return get_edge_condition (single_exit (loop));
+}
+
+/* If the statement just before the EXIT_EDGE contains a condition then
+   return the condition, otherwise NULL.  */
+
+gcond *
+get_edge_condition (edge exit_edge)
+{
   gcond *res = NULL;
-  edge exit_edge = single_exit (loop);
 
   if (dump_file && (dump_flags & TDF_SCEV))
     fprintf (dump_file, "(get_loop_exit_condition \n  ");
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ebe93832b1e89120eab2fdac0fc30fe35c0356a2..fcc950f528b2d1e044be12424c2df11f692ee8ba 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2070,7 +2070,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || !slpeel_can_duplicate_loop_p (loop_vinfo,
+				       LOOP_VINFO_IV_EXIT (loop_vinfo))
       || loop->inner)
     do_peeling = false;
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 20f570e4a0d64610d7b63fe492eba5254ab5dc2c..299dfb75e3372b6a91637101b4bab0e82eb560ad 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -904,7 +904,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   add_header_seq (loop, header_seq);
 
   /* Get a boolean result that tells us whether to iterate.  */
-  edge exit_edge = single_exit (loop);
+  edge exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
   gcond *cond_stmt;
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
       && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
@@ -935,7 +935,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (final_iv)
     {
       gassign *assign = gimple_build_assign (final_iv, orig_niters);
-      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+      gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
   return cond_stmt;
@@ -1183,7 +1183,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
+				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
 {
@@ -1191,13 +1192,13 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   gcond *cond_stmt;
   gcond *orig_cond;
   edge pe = loop_preheader_edge (loop);
-  edge exit_edge = single_exit (loop);
+  edge exit_edge = loop->vec_loop_iv;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   enum tree_code code;
   tree niters_type = TREE_TYPE (niters);
 
-  orig_cond = get_loop_exit_condition (loop);
+  orig_cond = get_edge_condition (exit_edge);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
@@ -1305,7 +1306,7 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
   if (final_iv)
     {
       gassign *assign;
-      edge exit = single_exit (loop);
+      edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
       gcc_assert (single_pred_p (exit->dest));
       tree phi_dest
 	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
@@ -1353,7 +1354,7 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
 			 bool niters_maybe_zero)
 {
   gcond *cond_stmt;
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_edge_condition (loop->vec_loop_iv);
   gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
 
   if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
@@ -1370,7 +1371,8 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
 							     loop_cond_gsi);
     }
   else
-    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
+    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop, niters,
+						step, final_iv,
 						niters_maybe_zero,
 						loop_cond_gsi);
 
@@ -1439,6 +1441,69 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
+/* When copies of the same loop are created the copies won't have any SCEV
+   information and so we can't determine what their exits are.  However since
+   they are copies of an original loop the exits should be the same.
+
+   I don't really like this, and think we need a different way, but I don't
+   know what.  So sending this up so Richi can comment.  */
+
+void
+vec_init_exit_info (class loop *loop)
+{
+  if (loop->vec_loop_iv)
+    return;
+
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  if (exits.is_empty ())
+    return;
+
+  if ((loop->vec_loop_iv = single_exit (loop)))
+    return;
+
+  loop->vec_loop_alt_exits.create (exits.length () - 1);
+
+  /* The main IV is to be determined by the block that's the first reachable
+     block from the latch.  We cannot rely on the order the loop analysis
+     returns and we don't have any SCEV analysis on the loop.  */
+  auto_vec <edge> workset;
+  workset.safe_push (loop_latch_edge (loop));
+  hash_set <edge> visited;
+
+  while (!workset.is_empty ())
+    {
+      edge e = workset.pop ();
+      if (visited.contains (e))
+	continue;
+
+      bool found_p = false;
+      for (edge ex : e->src->succs)
+	{
+	  if (exits.contains (ex))
+	    {
+	      found_p = true;
+	      e = ex;
+	      break;
+	    }
+	}
+
+      if (found_p)
+	{
+	  loop->vec_loop_iv = e;
+	  for (edge ex : exits)
+	    if (e != ex)
+	      loop->vec_loop_alt_exits.safe_push (ex);
+	  return;
+	}
+      else
+	{
+	  for (edge ex : e->src->preds)
+	    workset.safe_insert (0, ex);
+	}
+      visited.add (e);
+    }
+  gcc_unreachable ();
+}
 
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
@@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   edge exit, new_exit;
   bool duplicate_outer_loop = false;
 
-  exit = single_exit (loop);
+  exit = loop->vec_loop_iv;
   at_exit = (e == exit);
   if (!at_exit && e != loop_preheader_edge (loop))
     return NULL;
 
   if (scalar_loop == NULL)
     scalar_loop = loop;
+  else
+    vec_init_exit_info (scalar_loop);
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   pbbs = bbs + 1;
@@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   bbs[0] = preheader;
   new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
 
-  exit = single_exit (scalar_loop);
+  exit = scalar_loop->vec_loop_iv;
   copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
 	    &exit, 1, &new_exit, NULL,
 	    at_exit ? loop->latch : e->src, true);
-  exit = single_exit (loop);
+  exit = loop->vec_loop_iv;
   basic_block new_preheader = new_bbs[0];
 
+  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
+     so we must initialize the exit information.  */
+  vec_init_exit_info (new_loop);
+
   /* Before installing PHI arguments make sure that the edges
      into them match that of the scalar loop we analyzed.  This
      makes sure the SLP tree matches up between the main vectorized
@@ -1537,7 +1608,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
 	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
 	 header) to have current_def set, so copy them over.  */
-      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
+      slpeel_duplicate_current_defs_from_edges (scalar_loop->vec_loop_iv,
 						exit);
       slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
 							   0),
@@ -1696,11 +1767,12 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
  */
 
 bool
-slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
+slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
 {
-  edge exit_e = single_exit (loop);
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   edge entry_e = loop_preheader_edge (loop);
-  gcond *orig_cond = get_loop_exit_condition (loop);
+  gcond *orig_cond = get_edge_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
@@ -1709,7 +1781,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   if (!loop_outer (loop)
       || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
-      || !single_exit (loop)
+      || !LOOP_VINFO_IV_EXIT (loop_vinfo)
       /* Verify that new loop exit condition can be trivially modified.  */
       || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
       || (e != exit_e && e != entry_e))
@@ -1722,7 +1794,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
   return ret;
 }
 
-/* Function vect_get_loop_location.
+/* Function find_loop_location.
 
    Extract the location of the loop in the source code.
    If the loop is not well formed for vectorization, an estimated
@@ -1739,11 +1811,19 @@ find_loop_location (class loop *loop)
   if (!loop)
     return dump_user_location_t ();
 
-  stmt = get_loop_exit_condition (loop);
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+    {
+      /* We only care about the loop location, so use any exit with location
+	 information.  */
+      for (edge e : get_loop_exit_edges (loop))
+	{
+	  stmt = get_edge_condition (e);
 
-  if (stmt
-      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
-    return stmt;
+	  if (stmt
+	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
+	    return stmt;
+	}
+    }
 
   /* If we got here the loop is probably not "well formed",
      try to estimate the loop location */
@@ -1962,7 +2042,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-  basic_block exit_bb = single_exit (loop)->dest;
+
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
@@ -2529,10 +2610,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 {
   /* We should be using a step_vector of VF if VF is variable.  */
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
-  basic_block exit_bb = single_exit (loop)->dest;
+  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   gcc_assert (niters_vector_mult_vf_ptr != NULL);
   tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
@@ -2559,7 +2639,7 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
 		gphi *lcssa_phi)
 {
   gphi_iterator gsi;
-  edge e = single_exit (loop);
+  edge e = loop->vec_loop_iv;
 
   gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
@@ -3328,8 +3408,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   if (epilog_peeling)
     {
-      e = single_exit (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
+      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
 
       /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
 	 said epilog then we should use a copy of the main loop as a starting
@@ -3419,8 +3499,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 				    niters, niters_vector_mult_vf);
-	  guard_bb = single_exit (loop)->dest;
-	  guard_to = split_edge (single_exit (epilog));
+	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	  guard_to = split_edge (epilog->vec_loop_iv);
 	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
 					   skip_vector ? anchor : guard_bb,
 					   prob_epilog.invert (),
@@ -3428,7 +3508,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
 	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
-					      single_exit (epilog));
+					      epilog->vec_loop_iv);
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0a03f56aae7b51fb4c5ce0e49d96888bae634ef7..0bca5932d237cf1cfbbb48271db3f4430672b5dc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1641,6 +1641,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 {
   DUMP_VECT_SCOPE ("vect_analyze_loop_form");
 
+  vec_init_exit_info (loop);
+  if (!loop->vec_loop_iv)
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " could not determine main exit from"
+				   " loop with multiple exits.\n");
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -3025,9 +3032,8 @@ start_over:
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
       if (!vect_can_advance_ivs_p (loop_vinfo)
-	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
-					   single_exit (LOOP_VINFO_LOOP
-							 (loop_vinfo))))
+	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
+					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
         {
 	  ok = opt_result::failure_at (vect_location,
 				       "not vectorized: can't create required "
@@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = single_exit (loop)->dest;
+  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
   for (unsigned i = 0; i < vec_num; i++)
@@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
 	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
+	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info *vinfo,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = single_exit (loop)->dest;
+      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf)
       scale_loop_frequencies (loop, p);
     }
 
-  edge exit_e = single_exit (loop);
+  edge exit_e = loop->vec_loop_iv;
+
   exit_e->probability = profile_probability::always () / (new_est_niter + 1);
 
   edge exit_l = single_pred_edge (loop->latch);
@@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
-  edge e = single_exit (loop);
+  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
   if (! single_pred_p (e->dest))
     {
       split_loop_exit_edge (e, true);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036cd14609ebe091799320bf 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -917,6 +917,8 @@ public:
 
 /* Access Functions.  */
 #define LOOP_VINFO_LOOP(L)                 (L)->loop
+#define LOOP_VINFO_IV_EXIT(L)              (L)->loop->vec_loop_iv
+#define LOOP_VINFO_ALT_EXITS(L)            (L)->loop->vec_loop_alt_exits
 #define LOOP_VINFO_BBS(L)                  (L)->bbs
 #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
 #define LOOP_VINFO_NITERS(L)               (L)->num_iters
@@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
+extern void vec_init_exit_info (class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (6 preceding siblings ...)
  2023-06-28 13:44 ` [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
@ 2023-06-28 13:45 ` Tamar Christina
  2023-07-13 11:49   ` Richard Biener
  2023-06-28 13:45 ` [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable Tamar Christina
                   ` (34 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 10563 bytes --]

Hi All,

For early break vectorization we have to update the niters analysis to record
and analyze all exits of the loop, and therefore all of their conds.

The niters of the loop is still determined by the main/natural exit of the loop,
as that exit provides the O(n) bound.  For now we don't do much with the
secondary conds, but their assumptions can be used to generate versioning checks
later.
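
To make the distinction concrete, here is a small self-contained C sketch of
the loop shape involved (an illustrative example of my own, not taken from the
patch or its testsuite): the counted exit still determines the niters, while
the early return only contributes a secondary cond.

```c
#define N 1024
static int a[N];

/* The i < N exit is the main/natural IV exit whose tree_niter_desc
   determines the loop's niters; the early return is a secondary exit
   whose condition is merely recorded for now.  */
static int
find_first_negative (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] < 0)
      return i;
  return -1;
}
```

With the rest of the series applied this is the kind of loop that would be
handled, the i < N cond being the one the niters analysis keys off.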

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
	all gconds.
	(vect_analyze_loop_form): Update code checking for conds.
	(vect_create_loop_vinfo): Handle having multiple conds.
	(vect_analyze_loop): Release extra loop conds structures.
	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
	LOOP_VINFO_LOOP_IV_COND): New.
	(struct vect_loop_form_info): Add conds, loop_iv_cond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
    in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
    niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
+static vec<gcond *>
 vect_get_loop_niters (class loop *loop, tree *assumptions,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-    return cond;
+  if (exits.is_empty ())
+    return conds;
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-      || chrec_contains_undetermined (niter_desc.niter))
-    return cond;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+		     exits.length ());
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+    {
+      gcond *cond = get_edge_condition (exit);
+      if (cond)
+	conds.safe_push (cond);
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-    may_be_zero = NULL_TREE;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  if (may_be_zero)
-    {
-      if (COMPARISON_CLASS_P (may_be_zero))
+      may_be_zero = NULL_TREE;
+      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+          || chrec_contains_undetermined (niter_desc.niter))
+	continue;
+
+      niter_assumptions = niter_desc.assumptions;
+      may_be_zero = niter_desc.may_be_zero;
+      niter = niter_desc.niter;
+
+      if (may_be_zero && integer_zerop (may_be_zero))
+	may_be_zero = NULL_TREE;
+
+      if (may_be_zero)
 	{
-	  /* Try to combine may_be_zero with assumptions, this can simplify
-	     computation of niter expression.  */
-	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-					     niter_assumptions,
-					     fold_build1 (TRUTH_NOT_EXPR,
-							  boolean_type_node,
-							  may_be_zero));
+	  if (COMPARISON_CLASS_P (may_be_zero))
+	    {
+	      /* Try to combine may_be_zero with assumptions, this can simplify
+		 computation of niter expression.  */
+	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+						 niter_assumptions,
+						 fold_build1 (TRUTH_NOT_EXPR,
+							      boolean_type_node,
+							      may_be_zero));
+	      else
+		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+				     build_int_cst (TREE_TYPE (niter), 0),
+				     rewrite_to_non_trapping_overflow (niter));
+
+	      may_be_zero = NULL_TREE;
+	    }
+	  else if (integer_nonzerop (may_be_zero) && exit == loop->vec_loop_iv)
+	    {
+	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
+	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
+	      continue;
+	    }
 	  else
-	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
-				 build_int_cst (TREE_TYPE (niter), 0),
-				 rewrite_to_non_trapping_overflow (niter));
+	    continue;
+	}
 
-	  may_be_zero = NULL_TREE;
-	}
-      else if (integer_nonzerop (may_be_zero))
+      /* Loop assumptions are based off the normal exit.  */
+      if (exit == loop->vec_loop_iv)
 	{
-	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
-	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
-	  return cond;
+	  *assumptions = niter_assumptions;
+	  *number_of_iterationsm1 = niter;
+
+	  /* We want the number of loop header executions which is the number
+	     of latch executions plus one.
+	     ???  For UINT_MAX latch executions this number overflows to zero
+	     for loops like do { n++; } while (n != 0);  */
+	  if (niter && !chrec_contains_undetermined (niter))
+	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
+				 unshare_expr (niter),
+				 build_int_cst (TREE_TYPE (niter), 1));
+	  *number_of_iterations = niter;
 	}
-      else
-	return cond;
     }
 
-  *assumptions = niter_assumptions;
-  *number_of_iterationsm1 = niter;
-
-  /* We want the number of loop header executions which is the number
-     of latch executions plus one.
-     ???  For UINT_MAX latch executions this number overflows to zero
-     for loops like do { n++; } while (n != 0);  */
-  if (niter && !chrec_contains_undetermined (niter))
-    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
-			  build_int_cst (TREE_TYPE (niter), 1));
-  *number_of_iterations = niter;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
 
-  return cond;
+  return conds;
 }
 
 /* Function bb_in_loop_p
@@ -1768,15 +1794,26 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized:"
 				   " abnormal loop exit edge.\n");
 
-  info->loop_cond
+  info->conds
     = vect_get_loop_niters (loop, &info->assumptions,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
-  if (!info->loop_cond)
+
+  if (info->conds.is_empty ())
     return opt_result::failure_at
       (vect_location,
        "not vectorized: complicated exit condition.\n");
 
+  /* Determine what the primary and alternate exit conds are.  */
+  info->alt_loop_conds.create (info->conds.length () - 1);
+  for (gcond *cond : info->conds)
+    {
+      if (loop->vec_loop_iv->src != gimple_bb (cond))
+	info->alt_loop_conds.quick_push (cond);
+      else
+	info->loop_cond = cond;
+    }
+
   if (integer_zerop (info->assumptions)
       || !info->number_of_iterations
       || chrec_contains_undetermined (info->number_of_iterations))
@@ -1821,8 +1858,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   if (!integer_onep (info->assumptions) && !main_loop_info)
     LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
 
-  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
-  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+  for (gcond *cond : info->alt_loop_conds)
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
+      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+    }
+  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
+  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3520,6 +3563,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 		     "***** Choosing vector mode %s\n",
 		     GET_MODE_NAME (first_loop_vinfo->vector_mode));
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
      enabled, SIMDUID is not set, it is the innermost loop and we have
      either already found the loop's SIMDLEN or there was no SIMDLEN to
@@ -3631,6 +3677,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
     }
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   return first_loop_vinfo;
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index bd5eceb5da7a45ef036cd14609ebe091799320bf..1cc003c12e2447eca878f56cb019236f56e96f85 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -876,6 +876,12 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* List of loop additional IV conditionals found in the loop.  */
+  auto_vec<gcond *> conds;
+
+  /* Main loop IV cond.  */
+  gcond *loop_iv_cond;
+
   /* True if there are no loop carried data dependencies in the loop.
      If loop->safelen <= 1, then this is always true, either the loop
      didn't have any loop carried data dependencies, or the loop is being
@@ -966,6 +972,8 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
+#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
@@ -2353,7 +2361,9 @@ struct vect_loop_form_info
   tree number_of_iterations;
   tree number_of_iterationsm1;
   tree assumptions;
+  vec<gcond *> conds;
   gcond *loop_cond;
+  vec<gcond *> alt_loop_conds;
   gcond *inner_loop_cond;
 };
 extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);




-- 

[-- Attachment #2: rb17502.patch --]
[-- Type: text/plain, Size: 9652 bytes --]

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
    in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
    niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
+static vec<gcond *>
 vect_get_loop_niters (class loop *loop, tree *assumptions,
 		      tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-    return cond;
+  if (exits.is_empty ())
+    return conds;
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-      || chrec_contains_undetermined (niter_desc.niter))
-    return cond;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+		     exits.length ());
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+    {
+      gcond *cond = get_edge_condition (exit);
+      if (cond)
+	conds.safe_push (cond);
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-    may_be_zero = NULL_TREE;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  if (may_be_zero)
-    {
-      if (COMPARISON_CLASS_P (may_be_zero))
+      may_be_zero = NULL_TREE;
+      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+          || chrec_contains_undetermined (niter_desc.niter))
+	continue;
+
+      niter_assumptions = niter_desc.assumptions;
+      may_be_zero = niter_desc.may_be_zero;
+      niter = niter_desc.niter;
+
+      if (may_be_zero && integer_zerop (may_be_zero))
+	may_be_zero = NULL_TREE;
+
+      if (may_be_zero)
 	{
-	  /* Try to combine may_be_zero with assumptions, this can simplify
-	     computation of niter expression.  */
-	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-					     niter_assumptions,
-					     fold_build1 (TRUTH_NOT_EXPR,
-							  boolean_type_node,
-							  may_be_zero));
+	  if (COMPARISON_CLASS_P (may_be_zero))
+	    {
+	      /* Try to combine may_be_zero with assumptions, this can simplify
+		 computation of niter expression.  */
+	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+						 niter_assumptions,
+						 fold_build1 (TRUTH_NOT_EXPR,
+							      boolean_type_node,
+							      may_be_zero));
+	      else
+		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+				     build_int_cst (TREE_TYPE (niter), 0),
+				     rewrite_to_non_trapping_overflow (niter));
+
+	      may_be_zero = NULL_TREE;
+	    }
+	  else if (integer_nonzerop (may_be_zero) && exit == loop->vec_loop_iv)
+	    {
+	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
+	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
+	      continue;
+	    }
 	  else
-	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
-				 build_int_cst (TREE_TYPE (niter), 0),
-				 rewrite_to_non_trapping_overflow (niter));
+	    continue;
+	}
 
-	  may_be_zero = NULL_TREE;
-	}
-      else if (integer_nonzerop (may_be_zero))
+      /* Loop assumptions are based off the normal exit.  */
+      if (exit == loop->vec_loop_iv)
 	{
-	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
-	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
-	  return cond;
+	  *assumptions = niter_assumptions;
+	  *number_of_iterationsm1 = niter;
+
+	  /* We want the number of loop header executions which is the number
+	     of latch executions plus one.
+	     ???  For UINT_MAX latch executions this number overflows to zero
+	     for loops like do { n++; } while (n != 0);  */
+	  if (niter && !chrec_contains_undetermined (niter))
+	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
+				 unshare_expr (niter),
+				 build_int_cst (TREE_TYPE (niter), 1));
+	  *number_of_iterations = niter;
 	}
-      else
-	return cond;
     }
 
-  *assumptions = niter_assumptions;
-  *number_of_iterationsm1 = niter;
-
-  /* We want the number of loop header executions which is the number
-     of latch executions plus one.
-     ???  For UINT_MAX latch executions this number overflows to zero
-     for loops like do { n++; } while (n != 0);  */
-  if (niter && !chrec_contains_undetermined (niter))
-    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
-			  build_int_cst (TREE_TYPE (niter), 1));
-  *number_of_iterations = niter;
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
 
-  return cond;
+  return conds;
 }
 
 /* Function bb_in_loop_p
@@ -1768,15 +1794,26 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized:"
 				   " abnormal loop exit edge.\n");
 
-  info->loop_cond
+  info->conds
     = vect_get_loop_niters (loop, &info->assumptions,
 			    &info->number_of_iterations,
 			    &info->number_of_iterationsm1);
-  if (!info->loop_cond)
+
+  if (info->conds.is_empty ())
     return opt_result::failure_at
       (vect_location,
        "not vectorized: complicated exit condition.\n");
 
+  /* Determine what the primary and alternate exit conds are.  */
+  info->alt_loop_conds.create (info->conds.length () - 1);
+  for (gcond *cond : info->conds)
+    {
+      if (loop->vec_loop_iv->src != gimple_bb (cond))
+	info->alt_loop_conds.quick_push (cond);
+      else
+	info->loop_cond = cond;
+    }
+
   if (integer_zerop (info->assumptions)
       || !info->number_of_iterations
       || chrec_contains_undetermined (info->number_of_iterations))
@@ -1821,8 +1858,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   if (!integer_onep (info->assumptions) && !main_loop_info)
     LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
 
-  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
-  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+  for (gcond *cond : info->alt_loop_conds)
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
+      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
+    }
+  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
+  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3520,6 +3563,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 		     "***** Choosing vector mode %s\n",
 		     GET_MODE_NAME (first_loop_vinfo->vector_mode));
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
      enabled, SIMDUID is not set, it is the innermost loop and we have
      either already found the loop's SIMDLEN or there was no SIMDLEN to
@@ -3631,6 +3677,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
     }
 
+  loop_form_info.conds.release ();
+  loop_form_info.alt_loop_conds.release ();
+
   return first_loop_vinfo;
 }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index bd5eceb5da7a45ef036cd14609ebe091799320bf..1cc003c12e2447eca878f56cb019236f56e96f85 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -876,6 +876,12 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* List of additional loop IV conditionals found in the loop.  */
+  auto_vec<gcond *> conds;
+
+  /* Main loop IV cond.  */
+  gcond* loop_iv_cond;
+
   /* True if there are no loop carried data dependencies in the loop.
      If loop->safelen <= 1, then this is always true, either the loop
      didn't have any loop carried data dependencies, or the loop is being
@@ -966,6 +972,8 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
+#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
 #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
 #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
@@ -2353,7 +2361,9 @@ struct vect_loop_form_info
   tree number_of_iterations;
   tree number_of_iterationsm1;
   tree assumptions;
+  vec<gcond *> conds;
   gcond *loop_cond;
+  vec<gcond *> alt_loop_conds;
   gcond *inner_loop_cond;
 };
 extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (7 preceding siblings ...)
  2023-06-28 13:45 ` [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits Tamar Christina
@ 2023-06-28 13:45 ` Tamar Christina
  2023-06-28 13:55   ` [PATCH 9/19] " Tamar Christina
  2023-06-28 13:46 ` [PATCH 10/19]middle-end: implement vectorizable_early_break Tamar Christina
                   ` (33 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:45 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 4605 bytes --]

Hi All,

Vectorization of a gcond starts off essentially the same as vectorizing a
comparison, with the only difference being how the operands are extracted.

This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break.  The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
	to ...
	(vectorizable_comparison_1): ...This.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca81acd197693fc3457c31 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
 
 /* vectorizable_comparison.
 
-   Check if STMT_INFO is comparison expression that can be vectorized.
+/* Helper of vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression CODE that can be vectorized.
    If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
    comparison, put it in VEC_STMT, and insert it at GSI.
 
    Return true if STMT_INFO is vectorizable in this way.  */
 
 static bool
-vectorizable_comparison (vec_info *vinfo,
-			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
-			 gimple **vec_stmt,
-			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
+			   stmt_vec_info stmt_info, tree_code code,
+			   gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			   slp_tree slp_node, stmt_vector_for_cost *cost_vec)
 {
   tree lhs, rhs1, rhs2;
   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
   tree new_temp;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
@@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
   int ndts = 2;
   poly_uint64 nunits;
   int ncopies;
-  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
+  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   int i;
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   vec<tree> vec_oprnds0 = vNULL;
@@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
     ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
   gcc_assert (ncopies >= 1);
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
-    return false;
-
-  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!stmt)
-    return false;
-
-  code = gimple_assign_rhs_code (stmt);
 
   if (TREE_CODE_CLASS (code) != tcc_comparison)
     return false;
@@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
 	  return false;
 	}
 
-      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
       vect_model_simple_cost (vinfo, stmt_info,
 			      ncopies * (1 + (bitop2 != NOP_EXPR)),
 			      dts, ndts, slp_node, cost_vec);
@@ -11565,6 +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return true if STMT_INFO is vectorizable in this way.  */
+
+static bool
+vectorizable_comparison (vec_info *vinfo,
+			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+			 gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+    return false;
+
+  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!stmt)
+    return false;
+
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    return false;
+
+  if (!vec_stmt)
+    STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.




-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 10/19]middle-end: implement vectorizable_early_break.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (8 preceding siblings ...)
  2023-06-28 13:45 ` [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable Tamar Christina
@ 2023-06-28 13:46 ` Tamar Christina
  2023-06-28 13:46 ` [PATCH 11/19]middle-end: implement code motion for early break Tamar Christina
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 11010 bytes --]

Hi All,

This implements vectorizable_early_exit, which is used as the codegen part of
vectorizing a gcond.

It shares the majority of the code with vectorizable_comparison, with the
addition that it needs to be able to reduce multiple resulting statements into
a single one for use in the gcond, and it also needs to be able to perform
masking on the comparisons.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f3e33cd4ed125b9564ca81acd197693fc3457c31..87c4353fa5180fcb7f60b192897456cf24f3fdbe 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11330,8 +11330,186 @@ vectorizable_condition (vec_info *vinfo,
   return true;
 }
 
-/* vectorizable_comparison.
+static bool
+vectorizable_comparison_1 (vec_info *, tree, stmt_vec_info, tree_code,
+			   gimple_stmt_iterator *, gimple **, slp_tree,
+			   stmt_vector_for_cost *);
+
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+    gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  stmt_vec_info operand0_info
+    = loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (op.ops[0]));
+  if (!operand0_info)
+    return false;
+  /* If we're in a pattern get the type of the original statement.  */
+  if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+    operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+  tree vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector "
+			       "comparisons for type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
 
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  vec<gimple *> stmts;
+
+  if (slp_node)
+    stmts = SLP_TREE_VEC_STMTS (slp_node);
+  else
+    stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<gimple *> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
+	  gimple *arg0 = workset.pop ();
+	  gimple *arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR,
+					  gimple_assign_lhs (arg0),
+					  gimple_assign_lhs (arg1));
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	  else
+	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	  workset.quick_insert (0, new_stmt);
+	}
+    }
+  else
+    new_stmt = stmts[0];
+
+  gcc_assert (new_stmt);
+
+  tree cond = gimple_assign_lhs (new_stmt);
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  if (is_pattern_stmt_p (stmt_info))
+    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+			build_zero_cst (truth_type));
+  t = canonicalize_cond_expr_cond (t);
+  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_STMTS (slp_node).quick_push (stmt);
+   else
+    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
 /* Helper of vectorizable_comparison.
 
    Check if STMT_INFO is comparison expression CODE that can be vectorized.
@@ -11501,8 +11679,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (stmt);
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -11516,7 +11695,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -11816,7 +11998,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -11839,7 +12023,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -11997,6 +12184,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -12395,6 +12588,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
@@ -12511,6 +12707,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
 	  else
 	    *op = gimple_op (ass, operand + 1);
 	}
+      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+	{
+	  gimple_match_op m_op;
+	  if (!gimple_extract_op (cond, &m_op))
+	    return false;
+	  gcc_assert (m_op.code.is_tree_code ());
+	  *op = m_op.ops[operand];
+	}
       else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
 	*op = gimple_call_arg (call, operand);
       else
@@ -13121,6 +13325,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+      /* Allow vector conditionals through here.  */
+      && !is_ctrl_stmt (stmt)
       /* MASK_STORE has no lhs, but is ok.  */
       && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
     {
@@ -13137,7 +13343,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return opt_result::failure_at (stmt,
-				     "not vectorized: irregular stmt.%G", stmt);
+				     "not vectorized: irregular stmt: %G", stmt);
     }
 
   tree vectype;
@@ -13166,6 +13372,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if (is_ctrl_stmt (stmt))
+	{
+	  gcond *cond = dyn_cast <gcond *> (stmt);
+	  if (!cond)
+	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
+					   " control flow statement.\n");
+	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 




-- 

[-- Attachment #2: rb17504.patch --]
[-- Type: text/plain, Size: 10257 bytes --]

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f3e33cd4ed125b9564ca81acd197693fc3457c31..87c4353fa5180fcb7f60b192897456cf24f3fdbe 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11330,8 +11330,186 @@ vectorizable_condition (vec_info *vinfo,
   return true;
 }
 
-/* vectorizable_comparison.
+static bool
+vectorizable_comparison_1 (vec_info *, tree, stmt_vec_info, tree_code,
+			   gimple_stmt_iterator *, gimple **, slp_tree,
+			   stmt_vector_for_cost *);
+
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+    gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  stmt_vec_info operand0_info
+    = loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (op.ops[0]));
+  if (!operand0_info)
+    return false;
+  /* If we're in a pattern get the type of the original statement.  */
+  if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+    operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+  tree vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector "
+			       "comparisons for type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
 
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  vec<gimple *> stmts;
+
+  if (slp_node)
+    stmts = SLP_TREE_VEC_STMTS (slp_node);
+  else
+    stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<gimple *> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
+	  gimple *arg0 = workset.pop ();
+	  gimple *arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR,
+					  gimple_assign_lhs (arg0),
+					  gimple_assign_lhs (arg1));
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	  else
+	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	  workset.quick_insert (0, new_stmt);
+	}
+    }
+  else
+    new_stmt = stmts[0];
+
+  gcc_assert (new_stmt);
+
+  tree cond = gimple_assign_lhs (new_stmt);
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  if (is_pattern_stmt_p (stmt_info))
+    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+			build_zero_cst (truth_type));
+  t = canonicalize_cond_expr_cond (t);
+  gimple_cond_set_condition_from_tree (as_a <gcond *> (stmt), t);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_STMTS (slp_node).quick_push (stmt);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
 /* Helper of vectorizable_comparison.
 
    Check if STMT_INFO is comparison expression CODE that can be vectorized.
@@ -11501,8 +11679,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (stmt);
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -11516,7 +11695,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -11816,7 +11998,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -11839,7 +12023,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -11997,6 +12184,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -12395,6 +12588,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
@@ -12511,6 +12707,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
 	  else
 	    *op = gimple_op (ass, operand + 1);
 	}
+      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+	{
+	  gimple_match_op m_op;
+	  if (!gimple_extract_op (cond, &m_op))
+	    return false;
+	  gcc_assert (m_op.code.is_tree_code ());
+	  *op = m_op.ops[operand];
+	}
       else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
 	*op = gimple_call_arg (call, operand);
       else
@@ -13121,6 +13325,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+      /* Allow vector conditionals through here.  */
+      && !is_ctrl_stmt (stmt)
       /* MASK_STORE has no lhs, but is ok.  */
       && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
     {
@@ -13137,7 +13343,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return opt_result::failure_at (stmt,
-				     "not vectorized: irregular stmt.%G", stmt);
+				     "not vectorized: irregular stmt: %G", stmt);
     }
 
   tree vectype;
@@ -13166,6 +13372,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if (is_ctrl_stmt (stmt))
+	{
+	  gcond *cond = dyn_cast <gcond *> (stmt);
+	  if (!cond)
+	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
+					   " control flow statement.\n");
+	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 11/19]middle-end: implement code motion for early break.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (9 preceding siblings ...)
  2023-06-28 13:46 ` [PATCH 10/19]middle-end: implement vectorizable_early_break Tamar Christina
@ 2023-06-28 13:46 ` Tamar Christina
  2023-06-28 13:47 ` [PATCH 12/19]middle-end: implement loop peeling and IV updates " Tamar Christina
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 17501 bytes --]

Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
    * values that are live must be reachable through all exits
    * values that aren't single use and are shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In the case where we have
    multiple early exit statements this should be the one closest to the loop
    latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two, during data ref analysis we determine
validity of the operation and generate a worklist of actions to perform if we
vectorize.

After peeling and just before statement transformation we replay this worklist,
which moves the statements and updates bookkeeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
	(vect_analyze_data_ref_dependences): Use it.
	* tree-vect-loop.cc (move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS,
	LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index fcc950f528b2d1e044be12424c2df11f692ee8ba..240bd7a86233f6b907816f812681e4cd778ecaae 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -568,6 +568,278 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is allowed, which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved; they are reachable
+	      from one or more cond conditions.  If this set overlaps with CHAIN
+	      then FIXED takes precedence.  This deals with non-single-use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.
+     - CURRENT_VDEF: The last VDEF we've seen.  These are updated in
+		      pre-order and corrected in post-order after moving the
+		      instruction.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   tree *current_vdef, gimple_stmt_iterator *gstmt,
+			   hash_map<tree, tree> *renames)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  if (gimple_has_ops (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	{
+	  if (is_gimple_assign (stmt))
+	    dest = gimple_assign_lhs (stmt);
+	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	    dest = gimple_call_lhs (call);
+	}
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks only supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currently only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the size of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks only supported,"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any write starts a new chain.  */
+		move = true;
+	     }
+	}
+
+      /* If a statement is live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* This statement is to be moved.  */
+	  LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     usages of it.   */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      *current_vdef = vdef;
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, current_vdef, gstmt, renames))
+    return false;
+
+  if (gimple_vuse (stmt)
+      && reaching_vuse && *reaching_vuse
+      && gimple_vuse (stmt) == *current_vdef)
+    {
+      tree new_vuse = *reaching_vuse;
+      tree *renamed = renames->get (new_vuse);
+      if (renamed)
+        new_vuse = *renamed;
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push ({stmt, new_vuse});
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "current_use: %T, new_use: %T, mem_ref: %G",
+			   *current_vdef, new_vuse, stmt);
+
+      if (!renamed)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "stored: %T -> %T\n", *current_vdef, new_vuse);
+
+	  renames->put (*current_vdef, new_vuse);
+	}
+    }
+
+  return true;
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -612,6 +884,84 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      hash_set<tree> chain, fixed;
+      auto_vec<tree> loads;
+      auto_vec<data_reference *> bases;
+      hash_map<tree, tree> renames;
+      basic_block dest_bb = NULL;
+      tree vdef = NULL;
+      tree vuse = NULL;
+
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "loop contains multiple exits, analyzing"
+			   " statement dependencies.\n");
+
+      for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+	{
+	  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+	  if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	    continue;
+
+	  gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+	  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+	  /* Initialize the vuse chain with the one at the early break.  */
+	  if (!vuse)
+	    vuse = gimple_vuse (c);
+
+	  if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+					  &bases, &vuse, &vdef, &gsi, &renames))
+	    return opt_result::failure_at (stmt,
+					   "can't safely apply code motion to "
+					   "dependencies of %G to vectorize "
+					   "the early exit.\n", stmt);
+
+	  /* Save destination as we go; BBs are visited in order and the last one
+	     is where statements should be moved to.  */
+	  if (!dest_bb)
+	    dest_bb = gimple_bb (c);
+	  else
+	    {
+	      basic_block curr_bb = gimple_bb (c);
+	      if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+		dest_bb = curr_bb;
+	    }
+	}
+
+      dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+      gcc_assert (dest_bb);
+      LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+      /* Do some renaming to update the uses chain.  */
+      for (unsigned i = 0; i < LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).length (); i++)
+	{
+	  auto g = LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i];
+	  tree *tmp = renames.get (g.second);
+	  if (tmp)
+	    LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i]
+	      = std::make_pair (g.first, *tmp);
+	}
+
+      /* TODO: Remove?  It's a useful debug statement but may be too much.  */
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "overrode use: %T, mem_ref: %G",
+			     g.second, g.first);
+	}
+
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "recorded statements to be moved to BB %d\n",
+			   LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+    }
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9065811b3b9c2a550baf44768603172b9e26b94b..b4a98de80aa39057fc9b17977dd0e347b4f0fb5d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11192,6 +11192,45 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/* When vectorizing early break statements, instructions that happen before
+   the early break in the current BB need to be moved to after the early
+   break.  This function deals with that and assumes that any validity
+   checks have already been performed.
+
+   While moving the instructions, if a VUSE or VDEF is encountered the
+   VUSES are corrected as the statements are moved along.  The destination
+   block LOOP_VINFO_EARLY_BRK_DEST_BB gives the insertion point.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse %G", p.first);
+      unlink_stmt_vdef (p.first);
+      gimple_set_vuse (p.first, p.second);
+      update_stmt (p.first);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11330,6 +11369,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1cc003c12e2447eca878f56cb019236f56e96f85..ec65b65b5910e9cbad0a8c7e83c950b6168b98bf 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,6 +919,19 @@ public:
      analysis.  */
   vec<_loop_vec_info *> epilogue_vinfos;
 
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<std::pair<gimple*, tree>> early_break_vuses;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -972,6 +985,9 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies




-- 

[-- Attachment #2: rb17505.patch --]
[-- Type: text/plain, Size: 15529 bytes --]

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index fcc950f528b2d1e044be12424c2df11f692ee8ba..240bd7a86233f6b907816f812681e4cd778ecaae 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -568,6 +568,278 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence. Returns True i
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implemementation is very conservative. Any overlappig loads/stores
+     that take place before the early break statement gets rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from
+	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
+	      takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.
+     - CURRENT_VDEF: The last VDEF we've seen.  These are updated in
+		      pre-order and updated in post-order after moving the
+		      instruction.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   tree *current_vdef, gimple_stmt_iterator *gstmt,
+			   hash_map<tree, tree> *renames)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  if (gimple_has_ops (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	{
+	  if (is_gimple_assign (stmt))
+	    dest = gimple_assign_lhs (stmt);
+	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	    dest = gimple_call_lhs (call);
+	}
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks only supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currenly only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the isize of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks only supported,"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any writes starts a new chain. */
+		move = true;
+	     }
+	}
+
+      /* If a statement if live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of a something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* This statement is to be moved.  */
+	  LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     usages of it.  */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      *current_vdef = vdef;
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, current_vdef, gstmt, renames))
+    return false;
+
+  if (gimple_vuse (stmt)
+      && reaching_vuse && *reaching_vuse
+      && gimple_vuse (stmt) == *current_vdef)
+    {
+      tree new_vuse = *reaching_vuse;
+      tree *renamed = renames->get (new_vuse);
+      if (renamed)
+        new_vuse = *renamed;
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push ({stmt, new_vuse});
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "current_use: %T, new_use: %T, mem_ref: %G",
+			   *current_vdef, new_vuse, stmt);
+
+      if (!renamed)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "stored: %T -> %T\n", *current_vdef, new_vuse);
+
+	  renames->put (*current_vdef, new_vuse);
+	}
+    }
+
+  return true;
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -612,6 +884,84 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      hash_set<tree> chain, fixed;
+      auto_vec<tree> loads;
+      auto_vec<data_reference *> bases;
+      hash_map<tree, tree> renames;
+      basic_block dest_bb = NULL;
+      tree vdef = NULL;
+      tree vuse = NULL;
+
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "loop contains multiple exits, analyzing"
+			   " statement dependencies.\n");
+
+      for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+	{
+	  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+	  if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	    continue;
+
+	  gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+	  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+	  /* Initialize the vuse chain with the one at the early break.  */
+	  if (!vuse)
+	    vuse = gimple_vuse (c);
+
+	  if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+					  &bases, &vuse, &vdef, &gsi, &renames))
+	    return opt_result::failure_at (stmt,
+					   "can't safely apply code motion to "
+					   "dependencies of %G to vectorize "
+					   "the early exit.\n", stmt);
+
+	  /* Save destination as we go, BBs are visited in order and the last one
+	     is where statements should be moved to.  */
+	  if (!dest_bb)
+	    dest_bb = gimple_bb (c);
+	  else
+	    {
+	      basic_block curr_bb = gimple_bb (c);
+	      if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+		dest_bb = curr_bb;
+	    }
+	}
+
+      dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+      gcc_assert (dest_bb);
+      LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+      /* Do some renaming to update the uses chain.  */
+      for (unsigned i = 0; i < LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).length (); i++)
+	{
+	  auto g = LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i];
+	  tree *tmp = renames.get (g.second);
+	  if (tmp)
+	    LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i]
+	      = std::make_pair (g.first, *tmp);
+	}
+
+      /* TODO: Remove?  It's a useful debug statement but may be too much.  */
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "overrode use: %T, mem_ref: %G",
+			     g.second, g.first);
+	}
+
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "recorded statements to be moved to BB %d\n",
+			   LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+    }
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9065811b3b9c2a550baf44768603172b9e26b94b..b4a98de80aa39057fc9b17977dd0e347b4f0fb5d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11192,6 +11192,45 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements, instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks have already been performed.
+
+    While moving the instructions, if it encounters a VUSE or VDEF it
+    corrects the VUSEs as it moves the statements along.  The statements are
+    inserted at the start of LOOP_VINFO_EARLY_BRK_DEST_BB.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse %G", p.first);
+      unlink_stmt_vdef (p.first);
+      gimple_set_vuse (p.first, p.second);
+      update_stmt (p.first);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11330,6 +11369,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1cc003c12e2447eca878f56cb019236f56e96f85..ec65b65b5910e9cbad0a8c7e83c950b6168b98bf 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -919,6 +919,19 @@ public:
      analysis.  */
   vec<_loop_vec_info *> epilogue_vinfos;
 
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<std::pair<gimple*, tree>> early_break_vuses;
+
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -972,6 +985,9 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (10 preceding siblings ...)
  2023-06-28 13:46 ` [PATCH 11/19]middle-end: implement code motion for early break Tamar Christina
@ 2023-06-28 13:47 ` Tamar Christina
  2023-07-13 17:31   ` Richard Biener
  2023-06-28 13:47 ` [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
                   ` (30 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 56269 bytes --]

Hi All,

This patch updates the peeling code to maintain LCSSA during peeling.
The rewrite also naturally takes into account multiple exits and so it didn't
make sense to split them off.

For the purposes of peeling the only change for multiple exits is that the
secondary exits are all wired to the start of the new loop preheader when doing
epilogue peeling.

When doing prologue peeling the CFG is kept intact.

For both epilogue and prologue peeling we wire through between the two loops any
PHI nodes that escape the first loop into the second loop if flow_loops is
specified.  The reason for this conditionality is that
slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
  - prologue peeling
  - epilogue peeling
  - loop distribution

for the last case the loops should remain independent, and so not be connected.
Because only the used PHI nodes are propagated, get_current_def can be used
to easily find the previous definitions.  However, live statements that are
not used inside the loop itself are not propagated (since, if unused, the
moment we add the guard in between the two loops the value across the bypass
edge can be wrong if the loop has been peeled).

This is dealt with easily enough in find_guard_arg.

For multiple exits, while we are in LCSSA form, and have a correct DOM tree, the
moment we add the guard block we will change the dominators again.  To deal with
this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks to
update without having to recompute the list of blocks to update again.

When multiple exits and doing epilogue peeling we will also temporarily have an
incorrect VUSES chain for the secondary exits as it anticipates the final result
after the VDEFs have been moved.  This will thus be corrected once the code
motion is applied.

Lastly by doing things this way we can remove the helper functions that
previously did lock step iterations to update things as it went along.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
	* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
	assert.
	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
	exits.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
	(slpeel_can_duplicate_loop_p): Likewise.
	(vect_update_ivs_after_vectorizer): Don't enter this...
	(vect_update_ivs_after_early_break): ...but instead enter here.
	(find_guard_arg): Update for new peeling code.
	(slpeel_update_phi_nodes_for_loops): Remove.
	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
	(slpeel_update_phi_nodes_for_lcssa): Remove.
	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	non_break_control_flow and early_breaks.
	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
	multiple exits and VLA.
	(vect_analyze_loop_form): Support inner loop multiple exits.
	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
	(vect_create_epilog_for_reduction):  Update live phi nodes.
	(vectorizable_live_operation): Ignore live operations in vector loop
	when multiple exits.
	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
	analyze gcond params.
	(vect_analyze_stmt): Support gcond.
	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
	in RPO pass.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
	(loop_vec_info_for_loop): Change to const and static.
	(is_loop_header_bb_p): Drop assert.
	(slpeel_can_duplicate_loop_p): Update prototype.
	(class loop): Add early_breaks and non_break_control_flow.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
   gimple_stmt_iterator bsi;
   unsigned i;
 
-  if (exit != single_exit (loop))
+  /* We need to check for alternative exits since exit can be NULL.  */
+  auto exits = get_loop_exit_edges (loop);
+  if (exits.length () != 1)
+    return false;
+
+  if (exit != exits[0])
     return false;
 
   for (i = 0; i < loop->num_nodes; i++)
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
 {
   tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
 
+  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
+	      || orig_def != new_def);
+
   SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
@@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* Record the number of latch iterations.  */
-  if (limit == niters)
+  if (limit == niters
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     /* Case A: the loop iterates NITERS times.  Subtract one to get the
        latch count.  */
     loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
@@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
     loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
 				       limit, step);
 
-  if (final_iv)
+  /* For multiple exits we've already maintained LCSSA form and handled
+     the scalar iteration update in the code that deals with the merge
+     block and its updated guard.  I could move that code here instead
+     of in vect_update_ivs_after_early_break but I have to still deal
+     with the updates to the counter `i`.  So for now I'll keep them
+     together.  */
+  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       gassign *assign;
       edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
@@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
    basic blocks from SCALAR_LOOP instead of LOOP, but to either the
-   entry or exit of LOOP.  */
+   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
+   continuation.  This is correct for cases where one loop continues from the
+   other like in the vectorizer, but not true for uses in e.g. loop distribution
+   where the loop is duplicated and then modified.
+
+   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks
+   whose dominators were updated during the peeling.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
-					class loop *scalar_loop, edge e)
+					class loop *scalar_loop, edge e,
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
     rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
+  /* Rename the exit uses.  */
+  for (edge exit : get_loop_exit_edges (new_loop))
+    for (auto gsi = gsi_start_phis (exit->dest);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
+	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
+	if (MAY_HAVE_DEBUG_BIND_STMTS)
+	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
+      }
+
+  /* This condition happens when the loop has been versioned, e.g. due to
+     ifcvt versioning the loop.  */
   if (scalar_loop != loop)
     {
       /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
@@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 						EDGE_SUCC (loop->latch, 0));
     }
 
+  vec<edge> alt_exits = loop->vec_loop_alt_exits;
+  bool multiple_exits_p = !alt_exits.is_empty ();
+  auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
+
   if (at_exit) /* Add the loop copy at exit.  */
     {
-      if (scalar_loop != loop)
+      if (scalar_loop != loop && new_exit->dest != exit_dest)
 	{
-	  gphi_iterator gsi;
 	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+	  flush_pending_stmts (new_exit);
+	}
 
-	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
-	       gsi_next (&gsi))
+      auto loop_exits = get_loop_exit_edges (loop);
+      for (edge exit : loop_exits)
+	redirect_edge_and_branch (exit, new_preheader);
+
+
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge latch_new = single_succ_edge (new_preheader);
+      edge latch_old = loop_latch_edge (loop);
+      hash_set <tree> lcssa_vars;
+      for (auto gsi_from = gsi_start_phis (latch_old->dest),
+	   gsi_to = gsi_start_phis (latch_new->dest);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
+	  /* In all cases, even in early break situations we're only
+	     interested in the number of fully executed loop iters.  As such
+	     we discard any partially done iteration.  So we simply propagate
+	     the phi nodes from the latch to the merge block.  */
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+
+	  lcssa_vars.add (new_arg);
+
+	  /* Main loop exit should use the final iter value.  */
+	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv, UNKNOWN_LOCATION);
+
+	  /* All other exits use the previous iters.  */
+	  for (edge e : alt_exits)
+	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
+			 UNKNOWN_LOCATION);
+
+	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
+	}
+
+      /* Copy over any live SSA vars that may not have been materialized in the
+	 loops themselves but would be in the exit block.  However, when the live
+	 value is not used inside the loop we don't need to do this; if we do,
+	 then when we split the guard block the branch edge can end up containing
+	 the wrong reference, particularly if it shares an edge with something
+	 that has bypassed the loop.  This is not something peeling can check so
+	 we need to anticipate the usage of the live variable here.  */
+      auto exit_map = redirect_edge_var_map_vector (exit);
+      if (exit_map)
+        for (auto vm : exit_map)
+	{
+	  if (lcssa_vars.contains (vm.def)
+	      || TREE_CODE (vm.def) != SSA_NAME)
+	    continue;
+
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  bool use_in_loop = false;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
 	    {
-	      gphi *phi = gsi.phi ();
-	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
-	      location_t orig_locus
-		= gimple_phi_arg_location_from_edge (phi, e);
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      if (flow_bb_inside_loop_p (loop, bb)
+		  && !gimple_vuse (USE_STMT (use_p)))
+		{
+		  use_in_loop = true;
+		  break;
+		}
+	    }
 
-	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+	  if (!use_in_loop)
+	    {
+	       /* Do a final check to see if it's perhaps defined in the loop.  This
+		  mirrors the relevancy analysis's used_outside_scope.  */
+	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
+	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+		continue;
 	    }
+
+	  tree new_res = copy_ssa_name (vm.result);
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+	  for (edge exit : loop_exits)
+	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
 	}
-      redirect_edge_and_branch_force (e, new_preheader);
-      flush_pending_stmts (e);
+
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom || duplicate_outer_loop)
+
+      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  update_loop = new_loop;
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge old_latch_loop = loop_latch_edge (loop);
+      edge old_latch_init = loop_preheader_edge (loop);
+      edge new_latch_loop = loop_latch_edge (new_loop);
+      edge new_latch_init = loop_preheader_edge (new_loop);
+      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
+	   gsi_to = gsi_start_phis (old_latch_loop->dest);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop);
+	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
+	}
+
       if (scalar_loop != loop)
 	{
 	  /* Remove the non-necessary forwarder of scalar_loop again.  */
@@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
     }
 
-  if (scalar_loop != loop)
+  if (multiple_exits_p)
     {
-      /* Update new_loop->header PHIs, so that on the preheader
-	 edge they are the ones from loop rather than scalar_loop.  */
-      gphi_iterator gsi_orig, gsi_new;
-      edge orig_e = loop_preheader_edge (loop);
-      edge new_e = loop_preheader_edge (new_loop);
-
-      for (gsi_orig = gsi_start_phis (loop->header),
-	   gsi_new = gsi_start_phis (new_loop->header);
-	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
-	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
+      for (edge e : get_loop_exit_edges (update_loop))
 	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  gphi *new_phi = gsi_new.phi ();
-	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
-	  location_t orig_locus
-	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
-
-	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      while ((ex->flags & EDGE_FALLTHRU)
+		     && single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
 	}
-    }
 
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
+    }
   free (new_bbs);
   free (bbs);
 
@@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
+
   /* All loops have an outer scope; the only case loop->outer is NULL is for
      the function itself.  */
   if (!loop_outer (loop)
@@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
 
+  /* For early exits we'll update the IVs in
+     vect_update_ivs_after_early_break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return;
+
   basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
@@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Fix phi expressions in the successor bb.  */
       adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
     }
+  return;
+}
+
+/*   Function vect_update_ivs_after_early_break.
+
+     "Advance" the induction variables of LOOP to the value they should take
+     after the execution of LOOP.  This is currently necessary because the
+     vectorizer does not handle induction variables that are used after the
+     loop.  Such a situation occurs when the last iterations of LOOP are
+     peeled, because of the early exit.  With an early exit we always peel the
+     loop.
+
+     Input:
+     - LOOP_VINFO - a loop info structure for the loop that is going to be
+		    vectorized. The last few iterations of LOOP were peeled.
+     - LOOP - a loop that is going to be vectorized. The last few iterations
+	      of LOOP were peeled.
+     - VF - The loop vectorization factor.
+     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
+		     vectorized). i.e, the number of times the ivs should be
+		     bumped.
+     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
+     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
+		  coming out from LOOP on which there are uses of the LOOP ivs
+		  (this is the path from LOOP->exit to epilog_loop->preheader).
+
+		  The new definitions of the ivs are placed in LOOP->exit.
+		  The phi args associated with the edge UPDATE_E in the bb
+		  UPDATE_E->dest are updated accordingly.
+
+     Output:
+       - If available, the LCSSA phi node for the loop IV temp.
+
+     Assumption 1: Like the rest of the vectorizer, this function assumes
+     a single loop exit that has a single predecessor.
+
+     Assumption 2: The phi nodes in the LOOP header and in update_bb are
+     organized in the same order.
+
+     Assumption 3: The access function of the ivs is simple enough (see
+     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
+
+     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
+     coming out of LOOP on which the ivs of LOOP are used (this is the path
+     that leads to the epilog loop; other paths skip the epilog loop).  This
+     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
+     needs to have its phis updated.
+ */
+
+static tree
+vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop * epilog,
+				   poly_int64 vf, tree niters_orig,
+				   tree niters_vector, edge update_e)
+{
+  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return NULL;
+
+  gphi_iterator gsi, gsi1;
+  tree ni_name, ivtmp = NULL;
+  basic_block update_bb = update_e->dest;
+  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
+  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  basic_block exit_bb = loop_iv->dest;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
+
+  gcc_assert (cond);
+
+  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
+       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
+       gsi_next (&gsi), gsi_next (&gsi1))
+    {
+      tree init_expr, final_expr, step_expr;
+      tree type;
+      tree var, ni, off;
+      gimple_stmt_iterator last_gsi;
+
+      gphi *phi = gsi1.phi ();
+      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
+      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
+      if (!phi1)
+	continue;
+      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vect_update_ivs_after_early_break: phi: %G",
+			 (gimple *)phi);
+
+      /* Skip reduction and virtual phis.  */
+      if (!iv_phi_p (phi_info))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "reduc or virtual phi. skip.\n");
+	  continue;
+	}
+
+      /* For loops with multiple exits where we handle early exits we need to
+	 carry on with the previous IV, as the final iteration was not done
+	 because we exited early.  As such just grab the original IV.  */
+      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));
+      if (gimple_cond_lhs (cond) != phi_ssa
+	  && gimple_cond_rhs (cond) != phi_ssa)
+	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  step_expr = unshare_expr (step_expr);
+
+	  /* We previously generated the new merged phi in the same BB as the
+	     guard.  So perform the scaling on that rather than on the normal
+	     loop phi, which doesn't take the early breaks into account.  */
+	  final_expr = gimple_phi_result (phi1);
+	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop));
+
+	  tree stype = TREE_TYPE (step_expr);
+	  /* For early break the final loop IV is:
+	     init + (final - init) * vf, which takes into account peeling
+	     values and non-single steps.  */
+	  off = fold_build2 (MINUS_EXPR, stype,
+			     fold_convert (stype, final_expr),
+			     fold_convert (stype, init_expr));
+	  /* Now adjust for VF to get the final iteration value.  */
+	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
+
+	  /* Adjust the value with the offset.  */
+	  if (POINTER_TYPE_P (type))
+	    ni = fold_build_pointer_plus (init_expr, off);
+	  else
+	    ni = fold_convert (type,
+			       fold_build2 (PLUS_EXPR, stype,
+					    fold_convert (stype, init_expr),
+					    off));
+	  var = create_tmp_var (type, "tmp");
+
+	  last_gsi = gsi_last_bb (exit_bb);
+	  gimple_seq new_stmts = NULL;
+	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+	  /* Exit_bb shouldn't be empty.  */
+	  if (!gsi_end_p (last_gsi))
+	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
+	  else
+	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
+
+	  /* Fix phi expressions in the successor bb.  */
+	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
+	}
+      else
+	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  step_expr = unshare_expr (step_expr);
+
+	  /* We previously generated the new merged phi in the same BB as the
+	     guard.  So perform the scaling on that rather than on the normal
+	     loop phi, which doesn't take the early breaks into account.  */
+	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
+	  tree stype = TREE_TYPE (step_expr);
+
+	  if (vf.is_constant ())
+	    {
+	      ni = fold_build2 (MULT_EXPR, stype,
+				fold_convert (stype,
+					      niters_vector),
+				build_int_cst (stype, vf));
+
+	      ni = fold_build2 (MINUS_EXPR, stype,
+				fold_convert (stype,
+					      niters_orig),
+				fold_convert (stype, ni));
+	    }
+	  else
+	    /* If the loop's VF isn't constant then the loop must have been
+	       masked, so at the end of the loop we know we have finished
+	       the entire loop and found nothing.  */
+	    ni = build_zero_cst (stype);
+
+	  ni = fold_convert (type, ni);
+	  /* We don't support variable n in this version yet.  */
+	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
+
+	  var = create_tmp_var (type, "tmp");
+
+	  last_gsi = gsi_last_bb (exit_bb);
+	  gimple_seq new_stmts = NULL;
+	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+	  /* Exit_bb shouldn't be empty.  */
+	  if (!gsi_end_p (last_gsi))
+	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
+	  else
+	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
+
+	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
+
+	  for (edge exit : alt_exits)
+	    adjust_phi_and_debug_stmts (phi1, exit,
+					build_int_cst (TREE_TYPE (step_expr),
+						       vf));
+	  ivtmp = gimple_phi_result (phi1);
+	}
+    }
+
+  return ivtmp;
 }
 
 /* Return a gimple value containing the misalignment (measured in vector
@@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 
 /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
    this function searches for the corresponding lcssa phi node in exit
-   bb of LOOP.  If it is found, return the phi result; otherwise return
-   NULL.  */
+   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
+   return the phi result; otherwise return NULL.  */
 
 static tree
 find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
-		gphi *lcssa_phi)
+		gphi *lcssa_phi, int lcssa_edge = 0)
 {
   gphi_iterator gsi;
   edge e = loop->vec_loop_iv;
 
-  gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *phi = gsi.phi ();
-      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
-			   PHI_ARG_DEF (lcssa_phi, 0), 0))
-	return PHI_RESULT (phi);
-    }
-  return NULL_TREE;
-}
-
-/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
-   from SECOND/FIRST and puts it at the original loop's preheader/exit
-   edge, the two loops are arranged as below:
-
-       preheader_a:
-     first_loop:
-       header_a:
-	 i_1 = PHI<i_0, i_2>;
-	 ...
-	 i_2 = i_1 + 1;
-	 if (cond_a)
-	   goto latch_a;
-	 else
-	   goto between_bb;
-       latch_a:
-	 goto header_a;
-
-       between_bb:
-	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
-
-     second_loop:
-       header_b:
-	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
-				 or with i_2 if no LCSSA phi is created
-				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
-	 ...
-	 i_4 = i_3 + 1;
-	 if (cond_b)
-	   goto latch_b;
-	 else
-	   goto exit_bb;
-       latch_b:
-	 goto header_b;
-
-       exit_bb:
-
-   This function creates loop closed SSA for the first loop; update the
-   second loop's PHI nodes by replacing argument on incoming edge with the
-   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
-   is false, Loop closed ssa phis will only be created for non-iv phis for
-   the first loop.
-
-   This function assumes exit bb of the first loop is preheader bb of the
-   second loop, i.e, between_bb in the example code.  With PHIs updated,
-   the second loop will execute rest iterations of the first.  */
-
-static void
-slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, class loop *second,
-				   bool create_lcssa_for_iv_phis)
-{
-  gphi_iterator gsi_update, gsi_orig;
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  edge first_latch_e = EDGE_SUCC (first->latch, 0);
-  edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = single_exit (first)->dest;
-
-  gcc_assert (between_bb == second_preheader_e->src);
-  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
-  /* Either the first loop or the second is the loop to be vectorized.  */
-  gcc_assert (loop == first || loop == second);
-
-  for (gsi_orig = gsi_start_phis (first->header),
-       gsi_update = gsi_start_phis (second->header);
-       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
-       gsi_next (&gsi_orig), gsi_next (&gsi_update))
-    {
-      gphi *orig_phi = gsi_orig.phi ();
-      gphi *update_phi = gsi_update.phi ();
-
-      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
-      /* Generate lcssa PHI node for the first loop.  */
-      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
-      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
-      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
+      /* Nested loops with multiple exits can have a different number of phi
+	 node arguments between the main loop and the epilog, as the epilog
+	 falls through to the second loop.  */
+      if (gimple_phi_num_args (phi) > e->dest_idx)
 	{
-	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
-	  arg = new_res;
-	}
-
-      /* Update PHI node in the second loop by replacing arg on the loop's
-	 incoming edge.  */
-      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
-    }
-
-  /* For epilogue peeling we have to make sure to copy all LC PHIs
-     for correct vectorization of live stmts.  */
-  if (loop == first)
-    {
-      basic_block orig_exit = single_exit (second)->dest;
-      for (gsi_orig = gsi_start_phis (orig_exit);
-	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
-	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
-	    continue;
-
-	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, orig_phi))
+	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
+	  if (TREE_CODE (var) != SSA_NAME)
 	    continue;
 
-	  tree new_res = copy_ssa_name (orig_arg);
-	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
+	  if (operand_equal_p (get_current_def (var),
+			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	    return PHI_RESULT (phi);
 	}
     }
+  return NULL_TREE;
 }
 
 /* Function slpeel_add_loop_guard adds guard skipping from the beginning
@@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
   gcc_assert (single_succ_p (merge_bb));
   edge e = single_succ_edge (merge_bb);
   basic_block exit_bb = e->dest;
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
 
   for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *update_phi = gsi.phi ();
-      tree old_arg = PHI_ARG_DEF (update_phi, 0);
+      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
 
       tree merge_arg = NULL_TREE;
 
@@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
+      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
     }
 }
 
-/* EPILOG loop is duplicated from the original loop for vectorizing,
-   the arg of its loop closed ssa PHI needs to be updated.  */
-
-static void
-slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
-{
-  gphi_iterator gsi;
-  basic_block exit_bb = single_exit (epilog)->dest;
-
-  gcc_assert (single_pred_p (exit_bb));
-  edge e = EDGE_PRED (exit_bb, 0);
-  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
-}
-
 /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
    iterate exactly CONST_NITERS times.  Make a final decision about
    whether the epilogue loop should be used, returning true if so.  */
@@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     bound_epilog += 1;
+  /* For early breaks the scalar loop needs to execute at most VF times
+     to find the element that caused the break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      bound_epilog = vf;
+      /* Force a scalar epilogue as we can't vectorize the index finding.  */
+      vect_epilogues = false;
+    }
   bool epilog_peeling = maybe_ne (bound_epilog, 0U);
   poly_uint64 bound_scalar = bound_epilog;
 
@@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 				  bound_prolog + bound_epilog)
 		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
 			 || vect_epilogues));
+
+  /* We only support early break vectorization on known bounds at this time.
+     This means that if the vector loop can't be entered then we won't generate
+     it at all.  So for now force skip_vector off because the additional control
+     flow messes with the BB exits and we've already analyzed them.  */
+  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
+
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
 		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
 		      || !vf.is_constant ());
-  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
+     loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     skip_epilog = false;
-
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
   auto_vec<profile_count> original_counts;
   basic_block *original_bbs = NULL;
@@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (prolog_peeling)
     {
       e = loop_preheader_edge (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
-
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
       /* Peel prolog and put it on preheader edge of loop.  */
-      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
+						       true);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+
       first_loop = prolog;
       reset_original_copy_tables ();
 
@@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 as the transformations mentioned above make less or no sense when not
 	 vectorizing.  */
       epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
-      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
+      auto_vec<basic_block> doms;
+      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
+						       &doms);
       gcc_assert (epilog);
 
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
 
       /* Scalar version loop may be preferred.  In this case, add guard
 	 and skip to epilog.  Note this only happens when the number of
@@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
 					update_e);
 
+      /* For early breaks we must create a guard to check how many iterations
+	 of the scalar loop are yet to be performed.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  tree ivtmp =
+	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
+					       *niters_vector, update_e);
+
+	  gcc_assert (ivtmp);
+	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+					 fold_convert (TREE_TYPE (niters),
+						       ivtmp),
+					 build_zero_cst (TREE_TYPE (niters)));
+	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+
+	  /* If we had a fallthrough edge, the guard will be threaded through
+	     and so we may need to find the actual final edge.  */
+	  edge final_edge = epilog->vec_loop_iv;
+	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
+	     between the guard and the exit edge.  It only adds new nodes and
+	     doesn't update existing ones in the current scheme.  */
+	  basic_block guard_to = split_edge (final_edge);
+	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
+						guard_bb, prob_epilog.invert (),
+						irred_flag);
+	  doms.safe_push (guard_bb);
+
+	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+
+	  /* We must update all the edges from the new guard_bb.  */
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
+					      final_edge);
+
+	  /* If the loop was versioned we'll have an intermediate BB between
+	     the guard and the exit.  This intermediate block is required
+	     because in the current scheme of things the guard block phi
+	     updating can only maintain LCSSA by creating new blocks.  In this
+	     case we just need to update the uses in this block as well.  */
+	  if (loop != scalar_loop)
+	    {
+	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
+		   !gsi_end_p (gsi); gsi_next (&gsi))
+		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
+	    }
+
+	  flush_pending_stmts (guard_e);
+	}
+
       if (skip_epilog)
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
@@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  scale_loop_profile (epilog, prob_epilog, 0);
 	}
-      else
-	slpeel_update_phi_nodes_for_lcssa (epilog);
 
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
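To see why the peeling change above sets bound_epilog to VF for early-break loops: the vector loop only learns that *some* lane in the current vector iteration triggered the break, not which one, so the scalar epilogue resumes from the start of that vector iteration and executes at most VF scalar iterations to locate the exact element. A minimal standalone C sketch of that contract follows; this is illustrative code, not part of the patch, and `VF` and the function names are made up:

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vector factor, for illustration only */

/* Scalar reference: index of first element equal to KEY, or N.  */
static size_t
find_scalar (const int *a, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] == key)
      return i;
  return n;
}

/* Early-break "vectorized" search: the main loop tests VF lanes at a
   time and only learns whether any lane matched; the scalar epilogue
   then re-scans at most VF elements (plus the tail) to find which.  */
static size_t
find_with_epilogue (const int *a, size_t n, int key)
{
  size_t i = 0;
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (int l = 0; l < VF; l++)  /* stands in for a vector compare */
        any |= (a[i + l] == key);
      if (any)
        break;                      /* early break: exact lane unknown */
    }
  /* Scalar epilogue: rerun from the start of the last vector step.  */
  for (; i < n; i++)
    if (a[i] == key)
      return i;
  return n;
}
```

Both functions return the same index; the epilogue merely redoes at most VF iterations, which is the cost the `bound_epilog = vf` line accounts for.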
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
+    non_break_control_flow (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
     th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
 					  (loop_vinfo));
 
+  /* When we have multiple exits and VF is unknown, we must require partial
+     vectors because the loop bound is not a minimum but a maximum.  That is
+     to say we cannot unpredicate the main loop unless we peel or use partial
+     vectors in the epilogue.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+    return true;
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
     {
@@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
-
 /* Function vect_analyze_loop_form.
 
    Verify that certain CFG restrictions hold, including:
    - the loop has a pre-header
-   - the loop has a single entry and exit
+   - the loop has a single entry
+   - nested loops can have only a single exit.
    - the loop exit condition is simple enough
    - the number of iterations can be analyzed, i.e, a countable loop.  The
      niter could be analyzed under some assumptions.  */
@@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
                            |
                         (exit-bb)  */
 
-      if (loop->num_nodes != 2)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       if (empty_block_p (loop->header))
 	return opt_result::failure_at (vect_location,
 				       "not vectorized: empty loop.\n");
@@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "Considering outer-loop vectorization.\n");
       info->inner_loop_cond = inner.loop_cond;
+
+      if (!single_exit (loop))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized: multiple exits.\n");
+
     }
 
-  if (!single_exit (loop))
-    return opt_result::failure_at (vect_location,
-				   "not vectorized: multiple exits.\n");
   if (EDGE_COUNT (loop->header->preds) != 2)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"
@@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  edge e = single_exit (loop);
-  if (e->flags & EDGE_ABNORMAL)
-    return opt_result::failure_at (vect_location,
-				   "not vectorized:"
-				   " abnormal loop exit edge.\n");
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  edge nexit = loop->vec_loop_iv;
+  for (edge e : exits)
+    {
+      if (e->flags & EDGE_ABNORMAL)
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge.\n");
+      /* The early break BB must be after the main exit BB.  In theory we
+	 should be able to vectorize the inverse order, but the current flow
+	 in the vectorizer always assumes you update successor PHI nodes,
+	 not preds.  */
+      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge order.\n");
+    }
+
+  /* We currently only support early exit loops with known bounds.   */
+  if (exits.length () > 1)
+    {
+      class tree_niter_desc niter;
+      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
+	  || chrec_contains_undetermined (niter.niter)
+	  || !evolution_function_is_constant_p (niter.niter))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " early breaks only supported on loops"
+				       " with known iteration bounds.\n");
+    }
 
   info->conds
     = vect_get_loop_niters (loop, &info->assumptions,
@@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
   LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
 
+  /* Check to see if we're vectorizing multiple exits.  */
+  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3070,7 +3106,8 @@ start_over:
 
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
@@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   basic_block exit_bb;
   tree scalar_dest;
   tree scalar_type;
-  gimple *new_phi = NULL, *phi;
+  gimple *new_phi = NULL, *phi = NULL;
   gimple_stmt_iterator exit_gsi;
   tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
@@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
+
+	/* Update the other exits.  */
+	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	  {
+	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
+	    gphi_iterator gsi, gsi1;
+	    for (edge exit : alt_exits)
+	      {
+		/* Find the phi node to propagate into the exit block for each
+		   exit edge.  */
+		for (gsi = gsi_start_phis (exit_bb),
+		     gsi1 = gsi_start_phis (exit->src);
+		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
+		     gsi_next (&gsi), gsi_next (&gsi1))
+		  {
+		    /* There really should be a function to just get the number
+		       of phis inside a bb.  */
+		    if (phi && phi == gsi.phi ())
+		      {
+			gphi *phi1 = gsi1.phi ();
+			SET_PHI_ARG_DEF (phi, exit->dest_idx,
+					 PHI_RESULT (phi1));
+			break;
+		      }
+		  }
+	      }
+	  }
       gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
     }
 
@@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
 	   new_tree = lane_extract <vec_lhs', ...>;
 	   lhs' = new_tree;  */
 
+      /* When vectorizing an early break, any live statements that are used
+	 outside of the loop are dead.  The loop will never get to them.
+	 We could change the liveness value during analysis instead, but since
+	 the below code is invalid anyway just ignore it during codegen.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	return true;
+
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
       basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
@@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
   edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-  if (! single_pred_p (e->dest))
+  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       split_loop_exit_edge (e, true);
       if (dump_enabled_p ())
@@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
       e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
-      if (! single_pred_p (e->dest))
+      if (e && ! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
 	  if (dump_enabled_p ())
@@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Loops vectorized with a variable factor won't benefit from
      unrolling/peeling.  */
-  if (!vf.is_constant ())
+  if (!vf.is_constant ()
+      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       loop->unroll = 1;
       if (dump_enabled_p ())
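For reference, the loop shape that the relaxed form analysis above now accepts matches the cover letter's description: statically allocated buffers of known size, a known trip count, and an early break (or return) in the body. A hedged standalone example of a loop this series aims to vectorize; the names, sizes, and body are illustrative only, not taken from the patch's testsuite:

```c
#include <assert.h>

#define N 803                    /* known, fixed iteration count */
static int a[N], b[N];           /* statically allocated, known sizes */

/* An early-break loop of the supported form: <statements1>, then a
   <condition> whose <action> is an early return.  */
static int
find_key (int key)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];              /* <statements1> */
      if (a[i] == key)           /* <condition> */
        return i;                /* <action>: early return */
    }
  return -1;
}
```

By contrast, a loop over a pointer with a runtime length would be rejected under the "known iteration bounds" restriction, since for VLA SVE the cross-page iteration checks are not yet implemented (pending First-Faulting load support).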
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
-    *relevant = vect_used_in_scope;
+  if (is_ctrl_stmt (stmt_info->stmt))
+    {
+      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
+	 it looks like loop_manip doesn't do that.  So we have to do it
+	 the hard way.  */
+      basic_block bb = gimple_bb (stmt_info->stmt);
+      bool exit_bb = false, early_exit = false;
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        if (!flow_bb_inside_loop_p (loop, e->dest))
+	  {
+	    exit_bb = true;
+	    early_exit = loop->vec_loop_iv->src != bb;
+	    break;
+	  }
+
+      /* We should have processed any exit edge, so an edge that is not an
+	 early break must be a loop IV edge.  We need to distinguish between
+	 the two as we don't want to generate code for the main loop IV.  */
+      if (exit_bb)
+	{
+	  if (early_exit)
+	    *relevant = vect_used_in_scope;
+	}
+      else if (bb->loop_father == loop)
+	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
+    }
 
   /* changing memory.  */
   if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
@@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	*relevant = vect_used_in_scope;
       }
 
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  auto_bitmap exit_bbs;
+  for (edge exit : exits)
+    bitmap_set_bit (exit_bbs, exit->dest->index);
+
   /* uses outside the loop.  */
   FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
     {
@@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	      /* We expect all such uses to be in the loop exit phis
 		 (because of loop closed form)   */
 	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
-	      gcc_assert (bb == single_exit (loop)->dest);
+	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
 
               *live_p = true;
 	    }
@@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 	}
     }
 
+  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
+     the conds yet at that point and there's no quick way to retrieve them.  */
+  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " unsupported control flow in loop.\n");
+
   /* 2. Process_worklist */
   while (worklist.length () > 0)
     {
@@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 			return res;
 		    }
                  }
+	    }
+	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+	    {
+	      enum tree_code rhs_code = gimple_cond_code (cond);
+	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+	      opt_result res
+		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
+			       loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
+	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+				loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
             }
 	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
 	    {
@@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
 			     node_instance, cost_vec);
       if (!res)
 	return res;
-   }
+    }
+
+  if (is_ctrl_stmt (stmt_info->stmt))
+    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
     {
       case vect_internal_def:
+      case vect_early_exit_def:
         break;
 
       case vect_reduction_def:
@@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
     {
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
       *need_to_vectorize = true;
     }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -63,6 +63,7 @@ enum vect_def_type {
   vect_internal_def,
   vect_induction_def,
   vect_reduction_def,
+  vect_early_exit_def,
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
@@ -876,6 +877,13 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize, we need to peel
+     the loop for the break-finding scalar loop.  */
+  bool early_breaks;
+
+  /* When the loop has non-early-break control flow inside.  */
+  bool non_break_control_flow;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -985,9 +993,11 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
 #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
 #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
 #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
+#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -1038,8 +1048,8 @@ public:
    stack.  */
 typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
 
-inline loop_vec_info
-loop_vec_info_for_loop (class loop *loop)
+static inline loop_vec_info
+loop_vec_info_for_loop (const class loop *loop)
 {
   return (loop_vec_info) loop->aux;
 }
@@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2176,9 +2186,10 @@ class auto_purge_vect_location
    in tree-vect-loop-manip.cc.  */
 extern void vect_set_loop_condition (class loop *, loop_vec_info,
 				     tree, tree, tree, bool);
-extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
+extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
-						     class loop *, edge);
+						    class loop *, edge, bool,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
 	 predicates that need to be shared for optimal predicate usage.
 	 However reassoc will re-order them and prevent CSE from working
 	 as it should.  CSE only the loop body, not the entry.  */
-      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (edge exit : exits)
+	bitmap_set_bit (exit_bbs, exit->dest->index);
 
       edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
       do_rpo_vn (fun, entry, exit_bbs);




-- 

[-- Attachment #2: rb17506.patch --]
[-- Type: text/plain, Size: 52165 bytes --]

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is supposed to keep the LC PHIs computed
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
   gimple_stmt_iterator bsi;
   unsigned i;
 
-  if (exit != single_exit (loop))
+  /* We need to check for alternative exits since exit can be NULL.  */
+  auto exits = get_loop_exit_edges (loop);
+  if (exits.length () != 1)
+    return false;
+
+  if (exit != exits[0])
     return false;
 
   for (i = 0; i < loop->num_nodes; i++)
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
 {
   tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
 
+  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
+	      || orig_def != new_def);
+
   SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
 
   if (MAY_HAVE_DEBUG_BIND_STMTS)
@@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* Record the number of latch iterations.  */
-  if (limit == niters)
+  if (limit == niters
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     /* Case A: the loop iterates NITERS times.  Subtract one to get the
        latch count.  */
     loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
@@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
     loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
 				       limit, step);
 
-  if (final_iv)
+  /* For multiple exits we've already maintained LCSSA form and handled
+     the scalar iteration update in the code that deals with the merge
+     block and its updated guard.  That code could move here instead of
+     living in vect_update_ivs_after_early_break, but it would still have
+     to deal with the updates to the counter `i`, so for now the two are
+     kept together.  */
+  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       gassign *assign;
       edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
@@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
    basic blocks from SCALAR_LOOP instead of LOOP, but to either the
-   entry or exit of LOOP.  */
+   entry or exit of LOOP.  If FLOW_LOOPS is true then connect LOOP to
+   SCALAR_LOOP as a continuation.  This is correct for cases where one loop
+   continues from the other, as in the vectorizer, but not for uses such as
+   loop distribution where the loop is duplicated and then modified.
+
+   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks
+   whose dominators were updated during the peeling.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
-					class loop *scalar_loop, edge e)
+					class loop *scalar_loop, edge e,
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
   for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
     rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
+  /* Rename the exit uses.  */
+  for (edge exit : get_loop_exit_edges (new_loop))
+    for (auto gsi = gsi_start_phis (exit->dest);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
+	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
+	if (MAY_HAVE_DEBUG_BIND_STMTS)
+	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
+      }
+
+  /* This condition occurs when the loop has been versioned, e.g. due to
+     if-conversion versioning the loop.  */
   if (scalar_loop != loop)
     {
       /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
@@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
 						EDGE_SUCC (loop->latch, 0));
     }
 
+  vec<edge> alt_exits = loop->vec_loop_alt_exits;
+  bool multiple_exits_p = !alt_exits.is_empty ();
+  auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
+
   if (at_exit) /* Add the loop copy at exit.  */
     {
-      if (scalar_loop != loop)
+      if (scalar_loop != loop && new_exit->dest != exit_dest)
 	{
-	  gphi_iterator gsi;
 	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
+	  flush_pending_stmts (new_exit);
+	}
 
-	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
-	       gsi_next (&gsi))
+      auto loop_exits = get_loop_exit_edges (loop);
+      for (edge exit : loop_exits)
+	redirect_edge_and_branch (exit, new_preheader);
+
+
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge latch_new = single_succ_edge (new_preheader);
+      edge latch_old = loop_latch_edge (loop);
+      hash_set <tree> lcssa_vars;
+      for (auto gsi_from = gsi_start_phis (latch_old->dest),
+	   gsi_to = gsi_start_phis (latch_new->dest);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
+	  /* In all cases, even in early break situations we're only
+	     interested in the number of fully executed loop iters.  As such
+	     we discard any partially done iteration.  So we simply propagate
+	     the phi nodes from the latch to the merge block.  */
+	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+
+	  lcssa_vars.add (new_arg);
+
+	  /* Main loop exit should use the final iter value.  */
+	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv, UNKNOWN_LOCATION);
+
+	  /* All other exits use the previous iters.  */
+	  for (edge e : alt_exits)
+	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
+			 UNKNOWN_LOCATION);
+
+	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
+	}
+
+      /* Copy over any live SSA vars that may not have been materialized in the
+	 loops themselves but would be in the exit block.  However, when the live
+	 value is not used inside the loop we don't need to do this: if we did,
+	 then when we split the guard block the branch edge could end up
+	 containing the wrong reference, particularly if it shares an edge with
+	 something that has bypassed the loop.  This is not something peeling can
+	 check, so we need to anticipate the usage of the live variable here.  */
+      auto exit_map = redirect_edge_var_map_vector (exit);
+      if (exit_map)
+        for (auto vm : exit_map)
+	{
+	  if (lcssa_vars.contains (vm.def)
+	      || TREE_CODE (vm.def) != SSA_NAME)
+	    continue;
+
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  bool use_in_loop = false;
+
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
 	    {
-	      gphi *phi = gsi.phi ();
-	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
-	      location_t orig_locus
-		= gimple_phi_arg_location_from_edge (phi, e);
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      if (flow_bb_inside_loop_p (loop, bb)
+		  && !gimple_vuse (USE_STMT (use_p)))
+		{
+		  use_in_loop = true;
+		  break;
+		}
+	    }
 
-	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
+	  if (!use_in_loop)
+	    {
+	       /* Do a final check to see if it's perhaps defined in the loop.  This
+		  mirrors the relevancy analysis's used_outside_scope.  */
+	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
+	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+		continue;
 	    }
+
+	  tree new_res = copy_ssa_name (vm.result);
+	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+	  for (edge exit : loop_exits)
+	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
 	}
-      redirect_edge_and_branch_force (e, new_preheader);
-      flush_pending_stmts (e);
+
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom || duplicate_outer_loop)
+
+      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  update_loop = new_loop;
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
+      /* Copy the current loop LC PHI nodes between the original loop exit
+	 block and the new loop header.  This allows us to later split the
+	 preheader block and still find the right LC nodes.  */
+      edge old_latch_loop = loop_latch_edge (loop);
+      edge old_latch_init = loop_preheader_edge (loop);
+      edge new_latch_loop = loop_latch_edge (new_loop);
+      edge new_latch_init = loop_preheader_edge (new_loop);
+      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
+	   gsi_to = gsi_start_phis (old_latch_loop->dest);
+	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	   gsi_next (&gsi_from), gsi_next (&gsi_to))
+	{
+	  gimple *from_phi = gsi_stmt (gsi_from);
+	  gimple *to_phi = gsi_stmt (gsi_to);
+	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop);
+	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
+	}
+
       if (scalar_loop != loop)
 	{
 	  /* Remove the non-necessary forwarder of scalar_loop again.  */
@@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
     }
 
-  if (scalar_loop != loop)
+  if (multiple_exits_p)
     {
-      /* Update new_loop->header PHIs, so that on the preheader
-	 edge they are the ones from loop rather than scalar_loop.  */
-      gphi_iterator gsi_orig, gsi_new;
-      edge orig_e = loop_preheader_edge (loop);
-      edge new_e = loop_preheader_edge (new_loop);
-
-      for (gsi_orig = gsi_start_phis (loop->header),
-	   gsi_new = gsi_start_phis (new_loop->header);
-	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
-	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
+      for (edge e : get_loop_exit_edges (update_loop))
 	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  gphi *new_phi = gsi_new.phi ();
-	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
-	  location_t orig_locus
-	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
-
-	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      while ((ex->flags & EDGE_FALLTHRU)
+		     && single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
 	}
-    }
 
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
+    }
   free (new_bbs);
   free (bbs);
 
@@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
   unsigned int num_bb = loop->inner? 5 : 2;
 
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
+
   /* All loops have an outer scope; the only case loop->outer is NULL is for
      the function itself.  */
   if (!loop_outer (loop)
@@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
 
+  /* For early exits we'll update the IVs in
+     vect_update_ivs_after_early_break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return;
+
   basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   /* Make sure there exists a single-predecessor exit bb:  */
@@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Fix phi expressions in the successor bb.  */
       adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
     }
+  return;
+}
+
+/*   Function vect_update_ivs_after_early_break.
+
+     "Advance" the induction variables of LOOP to the value they should take
+     after the execution of LOOP.  This is currently necessary because the
+     vectorizer does not handle induction variables that are used after the
+     loop.  Such a situation occurs when the last iterations of LOOP are
+     peeled, because of the early exit.  With an early exit we always peel the
+     loop.
+
+     Input:
+     - LOOP_VINFO - a loop info structure for the loop that is going to be
+		    vectorized. The last few iterations of LOOP were peeled.
+     - LOOP - a loop that is going to be vectorized. The last few iterations
+	      of LOOP were peeled.
+     - VF - The loop vectorization factor.
+     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
+		     vectorized). i.e, the number of times the ivs should be
+		     bumped.
+     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
+     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
+		  coming out from LOOP on which there are uses of the LOOP ivs
+		  (this is the path from LOOP->exit to epilog_loop->preheader).
+
+		  The new definitions of the ivs are placed in LOOP->exit.
+		  The phi args associated with the edge UPDATE_E in the bb
+		  UPDATE_E->dest are updated accordingly.
+
+     Output:
+       - If available, the LCSSA phi node for the loop IV temp.
+
+     Assumption 1: Like the rest of the vectorizer, this function assumes
+     a single loop exit that has a single predecessor.
+
+     Assumption 2: The phi nodes in the LOOP header and in update_bb are
+     organized in the same order.
+
+     Assumption 3: The access function of the ivs is simple enough (see
+     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
+
+     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
+     coming out of LOOP on which the ivs of LOOP are used (this is the path
+     that leads to the epilog loop; other paths skip the epilog loop).  This
+     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
+     needs to have its phis updated.
+ */
+
+static tree
+vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop * epilog,
+				   poly_int64 vf, tree niters_orig,
+				   tree niters_vector, edge update_e)
+{
+  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return NULL;
+
+  gphi_iterator gsi, gsi1;
+  tree ni_name, ivtmp = NULL;
+  basic_block update_bb = update_e->dest;
+  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
+  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  basic_block exit_bb = loop_iv->dest;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
+
+  gcc_assert (cond);
+
+  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
+       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
+       gsi_next (&gsi), gsi_next (&gsi1))
+    {
+      tree init_expr, final_expr, step_expr;
+      tree type;
+      tree var, ni, off;
+      gimple_stmt_iterator last_gsi;
+
+      gphi *phi = gsi1.phi ();
+      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
+      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
+      if (!phi1)
+	continue;
+      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vect_update_ivs_after_early_break: phi: %G",
+			 (gimple *)phi);
+
+      /* Skip reduction and virtual phis.  */
+      if (!iv_phi_p (phi_info))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "reduc or virtual phi. skip.\n");
+	  continue;
+	}
+
+      /* For loops with early exits we need to carry on with the previous IV,
+	 as the final loop iteration was not completed because we exited
+	 early.  As such, just grab the original IV.  */
+      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));
+      if (gimple_cond_lhs (cond) != phi_ssa
+	  && gimple_cond_rhs (cond) != phi_ssa)
+	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  step_expr = unshare_expr (step_expr);
+
+	  /* We previously generated the new merged phi in the same BB as the
+	     guard.  So use that to perform the scaling on, rather than the
+	     normal loop phi, which doesn't take the early breaks into account.  */
+	  final_expr = gimple_phi_result (phi1);
+	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop));
+
+	  tree stype = TREE_TYPE (step_expr);
+	  /* For early break the final loop IV is:
+	     init + (final - init) * vf which takes into account peeling
+	     values and non-single steps.  */
+	  off = fold_build2 (MINUS_EXPR, stype,
+			     fold_convert (stype, final_expr),
+			     fold_convert (stype, init_expr));
+	  /* Now adjust for VF to get the final iteration value.  */
+	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
+
+	  /* Adjust the value with the offset.  */
+	  if (POINTER_TYPE_P (type))
+	    ni = fold_build_pointer_plus (init_expr, off);
+	  else
+	    ni = fold_convert (type,
+			       fold_build2 (PLUS_EXPR, stype,
+					    fold_convert (stype, init_expr),
+					    off));
+	  var = create_tmp_var (type, "tmp");
+
+	  last_gsi = gsi_last_bb (exit_bb);
+	  gimple_seq new_stmts = NULL;
+	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+	  /* Exit_bb shouldn't be empty.  */
+	  if (!gsi_end_p (last_gsi))
+	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
+	  else
+	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
+
+	  /* Fix phi expressions in the successor bb.  */
+	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
+	}
+      else
+	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  step_expr = unshare_expr (step_expr);
+
+	  /* We previously generated the new merged phi in the same BB as the
+	     guard.  So use that to perform the scaling on, rather than the
+	     normal loop phi, which doesn't take the early breaks into account.  */
+	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
+	  tree stype = TREE_TYPE (step_expr);
+
+	  if (vf.is_constant ())
+	    {
+	      ni = fold_build2 (MULT_EXPR, stype,
+				fold_convert (stype,
+					      niters_vector),
+				build_int_cst (stype, vf));
+
+	      ni = fold_build2 (MINUS_EXPR, stype,
+				fold_convert (stype,
+					      niters_orig),
+				fold_convert (stype, ni));
+	    }
+	  else
+	    /* If the loop's VF isn't constant then the loop must have been
+	       masked, so at the end of the loop we know we have finished
+	       the entire loop and found nothing.  */
+	    ni = build_zero_cst (stype);
+
+	  ni = fold_convert (type, ni);
+	  /* We don't support variable n in this version yet.  */
+	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
+
+	  var = create_tmp_var (type, "tmp");
+
+	  last_gsi = gsi_last_bb (exit_bb);
+	  gimple_seq new_stmts = NULL;
+	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+	  /* Exit_bb shouldn't be empty.  */
+	  if (!gsi_end_p (last_gsi))
+	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
+	  else
+	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
+
+	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
+
+	  for (edge exit : alt_exits)
+	    adjust_phi_and_debug_stmts (phi1, exit,
+					build_int_cst (TREE_TYPE (step_expr),
+						       vf));
+	  ivtmp = gimple_phi_result (phi1);
+	}
+    }
+
+  return ivtmp;
 }
 
 /* Return a gimple value containing the misalignment (measured in vector
@@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
 
 /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
    this function searches for the corresponding lcssa phi node in exit
-   bb of LOOP.  If it is found, return the phi result; otherwise return
-   NULL.  */
+   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
+   return the phi result; otherwise return NULL.  */
 
 static tree
 find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
-		gphi *lcssa_phi)
+		gphi *lcssa_phi, int lcssa_edge = 0)
 {
   gphi_iterator gsi;
   edge e = loop->vec_loop_iv;
 
-  gcc_assert (single_pred_p (e->dest));
   for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *phi = gsi.phi ();
-      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
-			   PHI_ARG_DEF (lcssa_phi, 0), 0))
-	return PHI_RESULT (phi);
-    }
-  return NULL_TREE;
-}
-
-/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
-   from SECOND/FIRST and puts it at the original loop's preheader/exit
-   edge, the two loops are arranged as below:
-
-       preheader_a:
-     first_loop:
-       header_a:
-	 i_1 = PHI<i_0, i_2>;
-	 ...
-	 i_2 = i_1 + 1;
-	 if (cond_a)
-	   goto latch_a;
-	 else
-	   goto between_bb;
-       latch_a:
-	 goto header_a;
-
-       between_bb:
-	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
-
-     second_loop:
-       header_b:
-	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
-				 or with i_2 if no LCSSA phi is created
-				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
-	 ...
-	 i_4 = i_3 + 1;
-	 if (cond_b)
-	   goto latch_b;
-	 else
-	   goto exit_bb;
-       latch_b:
-	 goto header_b;
-
-       exit_bb:
-
-   This function creates loop closed SSA for the first loop; update the
-   second loop's PHI nodes by replacing argument on incoming edge with the
-   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
-   is false, Loop closed ssa phis will only be created for non-iv phis for
-   the first loop.
-
-   This function assumes exit bb of the first loop is preheader bb of the
-   second loop, i.e, between_bb in the example code.  With PHIs updated,
-   the second loop will execute rest iterations of the first.  */
-
-static void
-slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
-				   class loop *first, class loop *second,
-				   bool create_lcssa_for_iv_phis)
-{
-  gphi_iterator gsi_update, gsi_orig;
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  edge first_latch_e = EDGE_SUCC (first->latch, 0);
-  edge second_preheader_e = loop_preheader_edge (second);
-  basic_block between_bb = single_exit (first)->dest;
-
-  gcc_assert (between_bb == second_preheader_e->src);
-  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
-  /* Either the first loop or the second is the loop to be vectorized.  */
-  gcc_assert (loop == first || loop == second);
-
-  for (gsi_orig = gsi_start_phis (first->header),
-       gsi_update = gsi_start_phis (second->header);
-       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
-       gsi_next (&gsi_orig), gsi_next (&gsi_update))
-    {
-      gphi *orig_phi = gsi_orig.phi ();
-      gphi *update_phi = gsi_update.phi ();
-
-      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
-      /* Generate lcssa PHI node for the first loop.  */
-      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
-      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
-      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
+      /* Nested loops with multiple exits can have a different number of phi
+	 node arguments between the main loop and the epilog, as the epilog
+	 falls through to the second loop.  */
+      if (gimple_phi_num_args (phi) > e->dest_idx)
 	{
-	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
-	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
-	  arg = new_res;
-	}
-
-      /* Update PHI node in the second loop by replacing arg on the loop's
-	 incoming edge.  */
-      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
-    }
-
-  /* For epilogue peeling we have to make sure to copy all LC PHIs
-     for correct vectorization of live stmts.  */
-  if (loop == first)
-    {
-      basic_block orig_exit = single_exit (second)->dest;
-      for (gsi_orig = gsi_start_phis (orig_exit);
-	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
-	{
-	  gphi *orig_phi = gsi_orig.phi ();
-	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
-	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
-	    continue;
-
-	  /* Already created in the above loop.   */
-	  if (find_guard_arg (first, second, orig_phi))
+	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
+	  if (TREE_CODE (var) != SSA_NAME)
 	    continue;
 
-	  tree new_res = copy_ssa_name (orig_arg);
-	  gphi *lcphi = create_phi_node (new_res, between_bb);
-	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
+	  if (operand_equal_p (get_current_def (var),
+			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	    return PHI_RESULT (phi);
 	}
     }
+  return NULL_TREE;
 }
 
 /* Function slpeel_add_loop_guard adds guard skipping from the beginning
@@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
   gcc_assert (single_succ_p (merge_bb));
   edge e = single_succ_edge (merge_bb);
   basic_block exit_bb = e->dest;
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
 
   for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gphi *update_phi = gsi.phi ();
-      tree old_arg = PHI_ARG_DEF (update_phi, 0);
+      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
 
       tree merge_arg = NULL_TREE;
 
@@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
       if (!merge_arg)
 	merge_arg = old_arg;
 
-      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
+      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
       /* If the var is live after loop but not a reduction, we simply
 	 use the old arg.  */
       if (!guard_arg)
@@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
     }
 }
 
-/* EPILOG loop is duplicated from the original loop for vectorizing,
-   the arg of its loop closed ssa PHI needs to be updated.  */
-
-static void
-slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
-{
-  gphi_iterator gsi;
-  basic_block exit_bb = single_exit (epilog)->dest;
-
-  gcc_assert (single_pred_p (exit_bb));
-  edge e = EDGE_PRED (exit_bb, 0);
-  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
-    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
-}
-
 /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
    iterate exactly CONST_NITERS times.  Make a final decision about
    whether the epilogue loop should be used, returning true if so.  */
@@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     bound_epilog += 1;
+  /* For early breaks the scalar loop needs to execute at most VF times
+     to find the element that caused the break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      bound_epilog = vf;
+      /* Force a scalar epilogue as we can't vectorize the index finding.  */
+      vect_epilogues = false;
+    }
   bool epilog_peeling = maybe_ne (bound_epilog, 0U);
   poly_uint64 bound_scalar = bound_epilog;
 
@@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 				  bound_prolog + bound_epilog)
 		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
 			 || vect_epilogues));
+
+  /* We only support early break vectorization for loops with known bounds at
+     this time.  This means that if the vector loop can't be entered then we
+     won't generate it at all.  So for now force skip_vector off because the
+     additional control flow messes with the BB exits and we've already
+     analyzed them.  */
+  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
+
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
 		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
 		      || !vf.is_constant ());
-  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
+     loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     skip_epilog = false;
-
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
   auto_vec<profile_count> original_counts;
   basic_block *original_bbs = NULL;
@@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (prolog_peeling)
     {
       e = loop_preheader_edge (loop);
-      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
-
+      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
       /* Peel prolog and put it on preheader edge of loop.  */
-      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
+      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
+						       true);
       gcc_assert (prolog);
       prolog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
+
       first_loop = prolog;
       reset_original_copy_tables ();
 
@@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 as the transformations mentioned above make less or no sense when not
 	 vectorizing.  */
       epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
-      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
+      auto_vec<basic_block> doms;
+      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
+						       &doms);
       gcc_assert (epilog);
 
       epilog->force_vectorize = false;
-      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
 
       /* Scalar version loop may be preferred.  In this case, add guard
 	 and skip to epilog.  Note this only happens when the number of
@@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
 					update_e);
 
+      /* For early breaks we must create a guard to check how many iterations
+	 of the scalar loop are yet to be performed.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  tree ivtmp =
+	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
+					       *niters_vector, update_e);
+
+	  gcc_assert (ivtmp);
+	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+					 fold_convert (TREE_TYPE (niters),
+						       ivtmp),
+					 build_zero_cst (TREE_TYPE (niters)));
+	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+
+	  /* If we had a fallthrough edge, the guard will be threaded through
+	     and so we may need to find the actual final edge.  */
+	  edge final_edge = epilog->vec_loop_iv;
+	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
+	     between the guard and the exit edge.  It only adds new nodes and
+	     doesn't update existing ones in the current scheme.  */
+	  basic_block guard_to = split_edge (final_edge);
+	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
+						guard_bb, prob_epilog.invert (),
+						irred_flag);
+	  doms.safe_push (guard_bb);
+
+	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+
+	  /* We must update all the edges from the new guard_bb.  */
+	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
+					      final_edge);
+
+	  /* If the loop was versioned we'll have an intermediate BB between
+	     the guard and the exit.  This intermediate block is required
+	     because in the current scheme of things the guard block phi
+	     updating can only maintain LCSSA by creating new blocks.  In this
+	     case we just need to update the uses in this block as well.  */
+	  if (loop != scalar_loop)
+	    {
+	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
+		   !gsi_end_p (gsi); gsi_next (&gsi))
+		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
+	    }
+
+	  flush_pending_stmts (guard_e);
+	}
+
       if (skip_epilog)
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
@@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  scale_loop_profile (epilog, prob_epilog, 0);
 	}
-      else
-	slpeel_update_phi_nodes_for_lcssa (epilog);
 
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
+    non_break_control_flow (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
     th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
 					  (loop_vinfo));
 
+  /* When we have multiple exits and VF is unknown, we must require partial
+     vectors because the loop bound is not a minimum but a maximum.  That is to
+     say we cannot unpredicate the main loop unless we peel or use partial
+     vectors in the epilogue.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+    return true;
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
     {
@@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
-
 /* Function vect_analyze_loop_form.
 
    Verify that certain CFG restrictions hold, including:
    - the loop has a pre-header
-   - the loop has a single entry and exit
+   - the loop has a single entry
+   - nested loops can have only a single exit.
    - the loop exit condition is simple enough
    - the number of iterations can be analyzed, i.e, a countable loop.  The
      niter could be analyzed under some assumptions.  */
@@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
                            |
                         (exit-bb)  */
 
-      if (loop->num_nodes != 2)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       if (empty_block_p (loop->header))
 	return opt_result::failure_at (vect_location,
 				       "not vectorized: empty loop.\n");
@@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "Considering outer-loop vectorization.\n");
       info->inner_loop_cond = inner.loop_cond;
+
+      if (!single_exit (loop))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized: multiple exits.\n");
+
     }
 
-  if (!single_exit (loop))
-    return opt_result::failure_at (vect_location,
-				   "not vectorized: multiple exits.\n");
   if (EDGE_COUNT (loop->header->preds) != 2)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"
@@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  edge e = single_exit (loop);
-  if (e->flags & EDGE_ABNORMAL)
-    return opt_result::failure_at (vect_location,
-				   "not vectorized:"
-				   " abnormal loop exit edge.\n");
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  edge nexit = loop->vec_loop_iv;
+  for (edge e : exits)
+    {
+      if (e->flags & EDGE_ABNORMAL)
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge.\n");
+      /* Early break BB must be after the main exit BB.  In theory we should
+	 be able to vectorize the inverse order, but the current flow in the
+	 vectorizer always assumes you update successor PHI nodes, not
+	 preds.  */
+      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge order.\n");
+    }
+
+  /* We currently only support early exit loops with known bounds.  */
+  if (exits.length () > 1)
+    {
+      class tree_niter_desc niter;
+      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
+	  || chrec_contains_undetermined (niter.niter)
+	  || !evolution_function_is_constant_p (niter.niter))
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " early breaks only supported on loops"
+				       " with known iteration bounds.\n");
+    }
 
   info->conds
     = vect_get_loop_niters (loop, &info->assumptions,
@@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
   LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
   LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
 
+  /* Check to see if we're vectorizing multiple exits.  */
+  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -3070,7 +3106,8 @@ start_over:
 
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
@@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   basic_block exit_bb;
   tree scalar_dest;
   tree scalar_type;
-  gimple *new_phi = NULL, *phi;
+  gimple *new_phi = NULL, *phi = NULL;
   gimple_stmt_iterator exit_gsi;
   tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
@@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
+
+	/* Update the other exits.  */
+	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	  {
+	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
+	    gphi_iterator gsi, gsi1;
+	    for (edge exit : alt_exits)
+	      {
+		/* Find the phi node to propagate into the exit block for each
+		   exit edge.  */
+		for (gsi = gsi_start_phis (exit_bb),
+		     gsi1 = gsi_start_phis (exit->src);
+		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
+		     gsi_next (&gsi), gsi_next (&gsi1))
+		  {
+		    /* There really should be a function to just get the number
+		       of phis inside a bb.  */
+		    if (phi && phi == gsi.phi ())
+		      {
+			gphi *phi1 = gsi1.phi ();
+			SET_PHI_ARG_DEF (phi, exit->dest_idx,
+					 PHI_RESULT (phi1));
+			break;
+		      }
+		  }
+	      }
+	  }
       gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
     }
 
@@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
 	   new_tree = lane_extract <vec_lhs', ...>;
 	   lhs' = new_tree;  */
 
+      /* When vectorizing an early break, any live statements that are used
+	 outside of the loop are dead.  The loop will never get to them.
+	 We could change the liveness value during analysis instead, but since
+	 the code below is invalid anyway just ignore it during codegen.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	return true;
+
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
       basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
       gcc_assert (single_pred_p (exit_bb));
@@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
   edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-  if (! single_pred_p (e->dest))
+  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       split_loop_exit_edge (e, true);
       if (dump_enabled_p ())
@@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
     {
       e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
-      if (! single_pred_p (e->dest))
+      if (e && ! single_pred_p (e->dest))
 	{
 	  split_loop_exit_edge (e, true);
 	  if (dump_enabled_p ())
@@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 
   /* Loops vectorized with a variable factor won't benefit from
      unrolling/peeling.  */
-  if (!vf.is_constant ())
+  if (!vf.is_constant ()
+      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       loop->unroll = 1;
       if (dump_enabled_p ())
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
-    *relevant = vect_used_in_scope;
+  if (is_ctrl_stmt (stmt_info->stmt))
+    {
+      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
+	 it looks like loop_manip doesn't do that.  So we have to do it
+	 the hard way.  */
+      basic_block bb = gimple_bb (stmt_info->stmt);
+      bool exit_bb = false, early_exit = false;
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        if (!flow_bb_inside_loop_p (loop, e->dest))
+	  {
+	    exit_bb = true;
+	    early_exit = loop->vec_loop_iv->src != bb;
+	    break;
+	  }
+
+      /* We should have processed any exit edge, so an edge that is not an
+	 early break must be a loop IV edge.  We need to distinguish between the
+	 two as we don't want to generate code for the main loop IV.  */
+      if (exit_bb)
+	{
+	  if (early_exit)
+	    *relevant = vect_used_in_scope;
+	}
+      else if (bb->loop_father == loop)
+	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
+    }
 
   /* changing memory.  */
   if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
@@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	*relevant = vect_used_in_scope;
       }
 
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  auto_bitmap exit_bbs;
+  for (edge exit : exits)
+    bitmap_set_bit (exit_bbs, exit->dest->index);
+
   /* uses outside the loop.  */
   FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
     {
@@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	      /* We expect all such uses to be in the loop exit phis
 		 (because of loop closed form)   */
 	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
-	      gcc_assert (bb == single_exit (loop)->dest);
+	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
 
               *live_p = true;
 	    }
@@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 	}
     }
 
+  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
+     the conds yet at that point and there's no quick way to retrieve them.  */
+  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
+    return opt_result::failure_at (vect_location,
+				   "not vectorized:"
+				   " unsupported control flow in loop.\n");
+
   /* 2. Process_worklist */
   while (worklist.length () > 0)
     {
@@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 			return res;
 		    }
                  }
+	    }
+	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+	    {
+	      enum tree_code rhs_code = gimple_cond_code (cond);
+	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+	      opt_result res
+		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
+			       loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
+	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+				loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
             }
 	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
 	    {
@@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
 			     node_instance, cost_vec);
       if (!res)
 	return res;
-   }
+    }
+
+  if (is_ctrl_stmt (stmt_info->stmt))
+    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
     {
       case vect_internal_def:
+      case vect_early_exit_def:
         break;
 
       case vect_reduction_def:
@@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
     {
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
       *need_to_vectorize = true;
     }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -63,6 +63,7 @@ enum vect_def_type {
   vect_internal_def,
   vect_induction_def,
   vect_reduction_def,
+  vect_early_exit_def,
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
@@ -876,6 +877,13 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize, we need to peel
+     the loop for the break-finding loop.  */
+  bool early_breaks;
+
+  /* Whether the loop has non-early-break control flow inside.  */
+  bool non_break_control_flow;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -985,9 +993,11 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
 #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
 #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
 #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
+#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -1038,8 +1048,8 @@ public:
    stack.  */
 typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
 
-inline loop_vec_info
-loop_vec_info_for_loop (class loop *loop)
+static inline loop_vec_info
+loop_vec_info_for_loop (const class loop *loop)
 {
   return (loop_vec_info) loop->aux;
 }
@@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2176,9 +2186,10 @@ class auto_purge_vect_location
    in tree-vect-loop-manip.cc.  */
 extern void vect_set_loop_condition (class loop *, loop_vec_info,
 				     tree, tree, tree, bool);
-extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
+extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
-						     class loop *, edge);
+						    class loop *, edge, bool,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
 	 predicates that need to be shared for optimal predicate usage.
 	 However reassoc will re-order them and prevent CSE from working
 	 as it should.  CSE only the loop body, not the entry.  */
-      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (edge exit : exits)
+	bitmap_set_bit (exit_bbs, exit->dest->index);
 
       edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
       do_rpo_vn (fun, entry, exit_bbs);




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (11 preceding siblings ...)
  2023-06-28 13:47 ` [PATCH 12/19]middle-end: implement loop peeling and IV updates " Tamar Christina
@ 2023-06-28 13:47 ` Tamar Christina
  2023-06-28 13:47 ` [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization Tamar Christina
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

Hi All,

I didn't want these to get lost in the noise of updates.

The following three tests now correctly work for targets that have an
implementation of cbranch for vectors, so the XFAILs are conditionally
removed, gated on vect_early_break support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
	supported.
	* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */




-- 

[-- Attachment #2: rb17507.patch --]
[-- Type: text/plain, Size: 1703 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization.
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (12 preceding siblings ...)
  2023-06-28 13:47 ` [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
@ 2023-06-28 13:47 ` Tamar Christina
  2023-06-28 13:48 ` [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 75160 bytes --]

Hi All,

This adds new tests to check all the early break functionality.
It includes a number of codegen and runtime tests checking the values with
the needle at different positions in the array.

They also check the values with different array sizes and peeling positions,
datatypes, VL, ncopies and every other variant I could think of.

Additionally it also contains reduced cases from issues found running over
various codebases.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Also regtested with:
 -march=armv8.3-a+sve
 -march=armv8.3-a+nosve
 -march=armv9-a

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/sourcebuild.texi: Document vect_early_break.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): New.
	* gcc.dg/vect/vect-early-break-run_1.c: New test.
	* gcc.dg/vect/vect-early-break-run_10.c: New test.
	* gcc.dg/vect/vect-early-break-run_2.c: New test.
	* gcc.dg/vect/vect-early-break-run_3.c: New test.
	* gcc.dg/vect/vect-early-break-run_4.c: New test.
	* gcc.dg/vect/vect-early-break-run_5.c: New test.
	* gcc.dg/vect/vect-early-break-run_6.c: New test.
	* gcc.dg/vect/vect-early-break-run_7.c: New test.
	* gcc.dg/vect/vect-early-break-run_8.c: New test.
	* gcc.dg/vect/vect-early-break-run_9.c: New test.
	* gcc.dg/vect/vect-early-break-template_1.c: New test.
	* gcc.dg/vect/vect-early-break-template_2.c: New test.
	* gcc.dg/vect/vect-early-break_1.c: New test.
	* gcc.dg/vect/vect-early-break_10.c: New test.
	* gcc.dg/vect/vect-early-break_11.c: New test.
	* gcc.dg/vect/vect-early-break_12.c: New test.
	* gcc.dg/vect/vect-early-break_13.c: New test.
	* gcc.dg/vect/vect-early-break_14.c: New test.
	* gcc.dg/vect/vect-early-break_15.c: New test.
	* gcc.dg/vect/vect-early-break_16.c: New test.
	* gcc.dg/vect/vect-early-break_17.c: New test.
	* gcc.dg/vect/vect-early-break_18.c: New test.
	* gcc.dg/vect/vect-early-break_19.c: New test.
	* gcc.dg/vect/vect-early-break_2.c: New test.
	* gcc.dg/vect/vect-early-break_20.c: New test.
	* gcc.dg/vect/vect-early-break_21.c: New test.
	* gcc.dg/vect/vect-early-break_22.c: New test.
	* gcc.dg/vect/vect-early-break_23.c: New test.
	* gcc.dg/vect/vect-early-break_24.c: New test.
	* gcc.dg/vect/vect-early-break_25.c: New test.
	* gcc.dg/vect/vect-early-break_26.c: New test.
	* gcc.dg/vect/vect-early-break_27.c: New test.
	* gcc.dg/vect/vect-early-break_28.c: New test.
	* gcc.dg/vect/vect-early-break_29.c: New test.
	* gcc.dg/vect/vect-early-break_3.c: New test.
	* gcc.dg/vect/vect-early-break_30.c: New test.
	* gcc.dg/vect/vect-early-break_31.c: New test.
	* gcc.dg/vect/vect-early-break_32.c: New test.
	* gcc.dg/vect/vect-early-break_33.c: New test.
	* gcc.dg/vect/vect-early-break_34.c: New test.
	* gcc.dg/vect/vect-early-break_35.c: New test.
	* gcc.dg/vect/vect-early-break_36.c: New test.
	* gcc.dg/vect/vect-early-break_37.c: New test.
	* gcc.dg/vect/vect-early-break_38.c: New test.
	* gcc.dg/vect/vect-early-break_39.c: New test.
	* gcc.dg/vect/vect-early-break_4.c: New test.
	* gcc.dg/vect/vect-early-break_40.c: New test.
	* gcc.dg/vect/vect-early-break_41.c: New test.
	* gcc.dg/vect/vect-early-break_42.c: New test.
	* gcc.dg/vect/vect-early-break_43.c: New test.
	* gcc.dg/vect/vect-early-break_44.c: New test.
	* gcc.dg/vect/vect-early-break_45.c: New test.
	* gcc.dg/vect/vect-early-break_46.c: New test.
	* gcc.dg/vect/vect-early-break_47.c: New test.
	* gcc.dg/vect/vect-early-break_48.c: New test.
	* gcc.dg/vect/vect-early-break_49.c: New test.
	* gcc.dg/vect/vect-early-break_5.c: New test.
	* gcc.dg/vect/vect-early-break_50.c: New test.
	* gcc.dg/vect/vect-early-break_51.c: New test.
	* gcc.dg/vect/vect-early-break_52.c: New test.
	* gcc.dg/vect/vect-early-break_53.c: New test.
	* gcc.dg/vect/vect-early-break_54.c: New test.
	* gcc.dg/vect/vect-early-break_55.c: New test.
	* gcc.dg/vect/vect-early-break_56.c: New test.
	* gcc.dg/vect/vect-early-break_57.c: New test.
	* gcc.dg/vect/vect-early-break_58.c: New test.
	* gcc.dg/vect/vect-early-break_59.c: New test.
	* gcc.dg/vect/vect-early-break_6.c: New test.
	* gcc.dg/vect/vect-early-break_60.c: New test.
	* gcc.dg/vect/vect-early-break_61.c: New test.
	* gcc.dg/vect/vect-early-break_62.c: New test.
	* gcc.dg/vect/vect-early-break_63.c: New test.
	* gcc.dg/vect/vect-early-break_64.c: New test.
	* gcc.dg/vect/vect-early-break_65.c: New test.
	* gcc.dg/vect/vect-early-break_66.c: New test.
	* gcc.dg/vect/vect-early-break_7.c: New test.
	* gcc.dg/vect/vect-early-break_8.c: New test.
	* gcc.dg/vect/vect-early-break_9.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 526020c751150cd74f766eb83eaf61de6f4374cf..090ceebd7befb3ace9b0d498b74a4e3474990b91 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
 @option{-funsafe-math-optimizations} is not in effect.
 This implies @code{vect_float}.
 
+@item vect_early_break
+Target supports hardware vectorization of loops with early breaks.
+This requires an implementation of the cbranch optab for vectors.
+
 @item vect_int
 Target supports hardware vectors of @code{int}.
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
@@ -0,0 +1,47 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  test4(x);
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
@@ -0,0 +1,50 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  unsigned res = test4(x);
+
+  if (res != idx)
+    abort ();
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x,int y, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+ }
+
+ ret = x + y * z;
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
new file mode 100644
index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
new file mode 100644
index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i] * x;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
new file mode 100644
index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ int ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
new file mode 100644
index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
new file mode 100644
index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
new file mode 100644
index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned step)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=step)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a > x)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
new file mode 100644
index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64];
+short b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 200
+#define M 4
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 17 / 2)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *a, TYPE *b)		\
+  {						\
+    for (int i = 0; i < N; i += 2)		\
+      {						\
+	a[i + 0] = b[i + 0] + 2;		\
+	a[i + 1] = b[i + 1] + 3;		\
+      }						\
+  }
+
+#define DO_TEST(TYPE)					\
+  for (int j = 1; j < M; ++j)				\
+    {							\
+      TYPE a[N + M];					\
+      for (int i = 0; i < N + M; ++i)			\
+	a[i] = TEST_VALUE (i);				\
+      test_##TYPE (a + j, a);				\
+      for (int i = 0; i < N; i += 2)			\
+	if (a[i + j] != (TYPE) (a[i] + 2)		\
+	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
+	  __builtin_abort ();				\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+void __attribute__((noinline,noclone))
+foo (double *b, double *d, double *f)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      d[2*i] = 2. * d[2*i];
+      d[2*i+1] = 4. * d[2*i+1];
+      b[i] = d[2*i] - 1.;
+      f[i] = d[2*i+1] + 2.;
+    }
+}
+int main()
+{
+  double b[1024], d[2*1024], f[1024];
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < 2*1024; i++)
+    d[i] = 1.;
+  foo (b, d, f);
+  for (i = 0; i < 1024; i+= 2)
+    {
+      if (d[2*i] != 2.)
+	abort ();
+      if (d[2*i+1] != 4.)
+	abort ();
+    }
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != 1.)
+	abort ();
+      if (f[i] != 6.)
+	abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
new file mode 100644
index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "vect-peel-1-src.c"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
new file mode 100644
index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64], b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+      abort ();
+  if (a[0] != 5)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
new file mode 100644
index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int in[100];
+int out[100 * 2];
+
+int main (void)
+{
+  if (out[0] != in[100 - 1])
+  for (int i = 1; i <= 100; ++i)
+    if (out[i] != 2)
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{  
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+   if (vect[i] > x)
+     return 1;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
new file mode 100644
index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x[100];
+int choose1(int);
+int choose2();
+void consume(int);
+void f() {
+    for (int i = 0; i < 100; ++i) {
+        if (x[i] == 11) {
+            if (choose1(i))
+                goto A;
+            else
+                goto B;
+        }
+    }
+    if (choose2())
+        goto B;
+A:
+    for (int i = 0; i < 100; ++i)
+        consume(i);
+B:
+    for (int i = 0; i < 100; ++i)
+        consume(i * i);
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
new file mode 100644
index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1025
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
new file mode 100644
index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1024
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
new file mode 100644
index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     {
+       for (int y = 0; y < z; y++)
+	 vect_a2 [y] *= vect_a1[i];
+       break;
+     }
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+
+unsigned vect_a[N] __attribute__ ((aligned (4)));
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ 
+ for (int i = 1; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
new file mode 100644
index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     break;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
new file mode 100644
index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     return i;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
new file mode 100644
index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 4
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 != x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
new file mode 100644
index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
new file mode 100644
index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+= (N % 4))
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (i > 16 && vect[i] > x)
+     break;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i*=3)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* SCEV can't currently analyze this loop's bounds.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
new file mode 100644
index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+#pragma GCC novector
+#pragma GCC unroll 4
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += vect_a[i] + x;
+ }
+ return ret;
+}
+
+/* novector should have blocked vectorization.  */
+/* { dg-final { scan-tree-dump-not "vectorized \\d loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
new file mode 100644
index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
new file mode 100644
index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
+
+/* At -O2 we can't currently vectorize this because the libcalls are not
+   lowered.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
new file mode 100644
index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+void abort ();
+
+float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
+float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
+float a[16] = {0};
+float e[16] = {0};
+float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+int main1 ()
+{
+  int i;
+  for (i=0; i<16; i++)
+    {
+      if (a[i] != results1[i] || e[i] != results2[i])
+        abort();
+    }
+
+  if (a[i+3] != b[i-1])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
new file mode 100644
index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int main (void)
+{
+  signed char a[50], b[50], c[50];
+  for (int i = 0; i < 50; ++i)
+    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+struct foostr {
+  _Complex short f1;
+  _Complex short f2;
+};
+struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
+struct foostr c[16] __attribute__ ((__aligned__(16)));
+struct foostr res[16] = {};
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    {
+      if (c[i].f1 != res[i].f1)
+ abort ();
+      if (c[i].f2 != res[i].f2)
+ abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
new file mode 100644
index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x_in[32];
+int x_out_a[32], x_out_b[32];
+int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
+int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
+int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
+
+void foo ()
+{
+  int j, i, x;
+  int curr_a, flag, next_a, curr_b, next_b;
+    {
+      for (i = 0; i < 16; i++)
+        {
+          next_b = b[i+1];
+          curr_b = flag ? next_b : curr_b;
+        }
+      x_out_b[j] = curr_b;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
new file mode 100644
index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+int main1 (short X)
+{
+  unsigned char a[128];
+  unsigned short b[128];
+  unsigned int c[128];
+  short myX = X;
+  int i;
+  for (i = 0; i < 128; i++)
+    {
+      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
+        abort ();
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
new file mode 100644
index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[64], b[64];
+int main ()
+{
+  int c = 7;
+  for (int i = 1; i < 64; ++i)
+    if (b[i] != a[i] - a[i-1])
+      abort ();
+  if (b[0] != -7)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
new file mode 100644
index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ unsigned tmp[N];
+ for (int i = 0; i < N; i++)
+ {
+   tmp[i] = x + i;
+   vect_b[i] = tmp[i];
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
new file mode 100644
index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile unsigned tmp = x + i;
+   vect_b[i] = tmp;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
new file mode 100644
index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
@@ -0,0 +1,101 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-add-options bind_pic_locally } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned short sa[N];
+unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[N];
+unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+   which will also align the access to 'ia[i+3]', and the loop could be
+   vectorized on all targets that support unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main1 ()
+{
+  int i;
+  int n = N - 7;
+
+  /* Multiple types with different sizes, used in independent
+     computations. Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      sa[i+7] = sb[i] + sc[i];
+      ia[i+3] = ib[i] + ic[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-3%VF. This will result in a peeling factor
+   1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we
+   need to peel 5,1 iterations for VF=4,2 respectively, so the loop cannot
+   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
+   iterations, so the loop is vectorizable on all targets that support
+   unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main2 ()
+{
+  int i;
+  int n = N-3;
+
+  /* Multiple types with different sizes, used in independent
+     computations. Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      ia[i+3] = ib[i] + ic[i];
+      sa[i+3] = sb[i] + sc[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 ();
+  main2 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c7c3df59ffbaaf23292107f982fd7af31741ada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
new file mode 100644
index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0))
+      abort ();
+}
+
+/* Pattern didn't match inside gcond.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
new file mode 100644
index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (N/2); i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i+1;
+   if (vect_a[i] > x || vect_a[i+1] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   vect_a[i+1] += x * vect_b[i+1]; 
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  char i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+typedef float real_t;
+__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
+real_t s482()
+{
+    for (int nl = 0; nl < 10000; nl++) {
+        for (int i = 0; i < 32000; i++) {
+            a[i] += b[i] * c[i];
+            if (c[i] > b[i]) break;
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a, b;
+int e() {
+  int d, c;
+  d = 0;
+  for (; d < b; d++)
+    a = 0;
+  d = 0;
+  for (; d < b; d++)
+    if (d)
+      c++;
+  for (;;)
+    if (c)
+      break;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
new file mode 100644
index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
@@ -0,0 +1,28 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-do compile } */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_long } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+/* Statement used outside the loop.
+   NOTE: SCEV disabled to ensure the live operation is not removed before
+   vectorization.  */
+__attribute__ ((noinline)) int
+liveloop (int start, int n, int *x, int *y)
+{
+  int i = start;
+  int j;
+  int ret;
+
+  for (j = 0; j < n; ++j)
+    {
+      i += 1;
+      x[j] = i;
+      ret = y[j];
+    }
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-vect-all" } */
+
+int d(unsigned);
+
+void a() {
+  char b[8];
+  unsigned c = 0;
+  while (c < 7 && b[c])
+    ++c;
+  if (d(c))
+    return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
new file mode 100644
index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+enum a { b };
+
+struct {
+  enum a c;
+} d[10], *e;
+
+void f() {
+  int g;
+  for (g = 0, e = d; g < sizeof(1); g++, e++)
+    if (e->c)
+      return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
new file mode 100644
index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a[0];
+int b;
+
+void g();
+
+void f() {
+  int d, e;
+  for (; e; e++) {
+    int c;
+    switch (b)
+    case '9': {
+      for (; d < 1; d++)
+        if (a[d])
+          c = 1;
+      break;
+    case '<':
+      g();
+      c = 0;
+    }
+      while (c)
+        ;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+char vect_a[N];
+char vect_b[N];
+  
+char test4(char x, char * restrict res)
+{
+ char ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   res[i] *= vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be802bba51cd818393
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_a[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d79ad4be10502969209f9b13bd65ab142b92e644..5516188dc0aa86d161d67dea5a7769e3c3d72f85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3775,6 +3775,18 @@ proc check_effective_target_vect_int { } {
 	}}]
 }
 
+# Return 1 if the target supports hardware vectorization of early breaks,
+# 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_early_break { } {
+    return [check_cached_effective_target_indexed vect_early_break {
+      expr {
+	[istarget aarch64*-*-*]
+	}}]
+}
+
 # Return 1 if the target supports hardware vectorization of complex additions of
 # byte, 0 otherwise.
 #

-- 

[-- Attachment #2: rb17508.patch --]
[-- Type: text/plain, Size: 70598 bytes --]

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 526020c751150cd74f766eb83eaf61de6f4374cf..090ceebd7befb3ace9b0d498b74a4e3474990b91 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
 @option{-funsafe-math-optimizations} is not in effect.
 This implies @code{vect_float}.
 
+@item vect_early_break
+Target supports hardware vectorization of loops with early breaks.
+This requires an implementation of the cbranch optab for vectors.
+
 @item vect_int
 Target supports hardware vectors of @code{int}.
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
@@ -0,0 +1,47 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  test4(x);
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
@@ -0,0 +1,50 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  unsigned res = test4(x);
+
+  if (res != idx)
+    abort ();
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x,int y, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+ }
+
+ ret = x + y * z;
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
new file mode 100644
index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
new file mode 100644
index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i] * x;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
new file mode 100644
index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ int ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
new file mode 100644
index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
new file mode 100644
index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
new file mode 100644
index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned step)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=step)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a > x)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
new file mode 100644
index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64];
+short b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 200
+#define M 4
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 17 / 2)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *a, TYPE *b)		\
+  {						\
+    for (int i = 0; i < N; i += 2)		\
+      {						\
+	a[i + 0] = b[i + 0] + 2;		\
+	a[i + 1] = b[i + 1] + 3;		\
+      }						\
+  }
+
+#define DO_TEST(TYPE)					\
+  for (int j = 1; j < M; ++j)				\
+    {							\
+      TYPE a[N + M];					\
+      for (int i = 0; i < N + M; ++i)			\
+	a[i] = TEST_VALUE (i);				\
+      test_##TYPE (a + j, a);				\
+      for (int i = 0; i < N; i += 2)			\
+	if (a[i + j] != (TYPE) (a[i] + 2)		\
+	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
+	  __builtin_abort ();				\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+void __attribute__((noinline,noclone))
+foo (double *b, double *d, double *f)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      d[2*i] = 2. * d[2*i];
+      d[2*i+1] = 4. * d[2*i+1];
+      b[i] = d[2*i] - 1.;
+      f[i] = d[2*i+1] + 2.;
+    }
+}
+int main()
+{
+  double b[1024], d[2*1024], f[1024];
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < 2*1024; i++)
+    d[i] = 1.;
+  foo (b, d, f);
+  for (i = 0; i < 1024; i+= 2)
+    {
+      if (d[2*i] != 2.)
+	abort ();
+      if (d[2*i+1] != 4.)
+	abort ();
+    }
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != 1.)
+	abort ();
+      if (f[i] != 6.)
+	abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
new file mode 100644
index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "vect-peel-1-src.c"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
new file mode 100644
index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64], b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+      abort ();
+  if (a[0] != 5)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
new file mode 100644
index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int in[100];
+int out[100 * 2];
+
+int main (void)
+{
+  if (out[0] != in[100 - 1])
+  for (int i = 1; i <= 100; ++i)
+    if (out[i] != 2)
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{  
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+   if (vect[i] > x)
+     return 1;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
new file mode 100644
index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x[100];
+int choose1(int);
+int choose2();
+void consume(int);
+void f() {
+    for (int i = 0; i < 100; ++i) {
+        if (x[i] == 11) {
+            if (choose1(i))
+                goto A;
+            else
+                goto B;
+        }
+    }
+    if (choose2())
+        goto B;
+A:
+    for (int i = 0; i < 100; ++i)
+        consume(i);
+B:
+    for (int i = 0; i < 100; ++i)
+        consume(i * i);
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
new file mode 100644
index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1025
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
new file mode 100644
index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1024
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
new file mode 100644
index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     {
+       for (int y = 0; y < z; y++)
+	 vect_a2 [y] *= vect_a1[i];
+       break;
+     }
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+
+unsigned vect_a[N] __attribute__ ((aligned (4)));
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ 
+ for (int i = 1; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
new file mode 100644
index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     break;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
new file mode 100644
index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     return i;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
new file mode 100644
index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 4
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 != x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
new file mode 100644
index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
new file mode 100644
index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+= (N % 4))
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (i > 16 && vect[i] > x)
+     break;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i*=3)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* SCEV currently can't analyze the bounds of this loop.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
new file mode 100644
index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+#pragma GCC novector
+#pragma GCC unroll 4
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += vect_a[i] + x;
+ }
+ return ret;
+}
+
+/* novector should have blocked vectorization.  */
+/* { dg-final { scan-tree-dump-not "vectorized \d loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
new file mode 100644
index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
new file mode 100644
index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
+
+/* At -O2 we can't currently vectorize this because the libcalls are not
+   lowered.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
new file mode 100644
index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+void abort ();
+
+float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
+float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
+float a[16] = {0};
+float e[16] = {0};
+float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+int main1 ()
+{
+  int i;
+  for (i=0; i<16; i++)
+    {
+      if (a[i] != results1[i] || e[i] != results2[i])
+        abort();
+    }
+
+  if (a[i+3] != b[i-1])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
new file mode 100644
index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int main (void)
+{
+  signed char a[50], b[50], c[50];
+  for (int i = 0; i < 50; ++i)
+    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+struct foostr {
+  _Complex short f1;
+  _Complex short f2;
+};
+struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
+struct foostr c[16] __attribute__ ((__aligned__(16)));
+struct foostr res[16] = {};
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    {
+      if (c[i].f1 != res[i].f1)
+ abort ();
+      if (c[i].f2 != res[i].f2)
+ abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
new file mode 100644
index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x_in[32];
+int x_out_a[32], x_out_b[32];
+int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
+int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
+int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
+
+void foo ()
+{
+  int j, i, x;
+  int curr_a, flag, next_a, curr_b, next_b;
+    {
+      for (i = 0; i < 16; i++)
+        {
+          next_b = b[i+1];
+          curr_b = flag ? next_b : curr_b;
+        }
+      x_out_b[j] = curr_b;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
new file mode 100644
index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+int main1 (short X)
+{
+  unsigned char a[128];
+  unsigned short b[128];
+  unsigned int c[128];
+  short myX = X;
+  int i;
+  for (i = 0; i < 128; i++)
+    {
+      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
+        abort ();
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
new file mode 100644
index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[64], b[64];
+int main ()
+{
+  int c = 7;
+  for (int i = 1; i < 64; ++i)
+    if (b[i] != a[i] - a[i-1])
+      abort ();
+  if (b[0] != -7)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
new file mode 100644
index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ unsigned tmp[N];
+ for (int i = 0; i < N; i++)
+ {
+   tmp[i] = x + i;
+   vect_b[i] = tmp[i];
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
new file mode 100644
index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile unsigned tmp = x + i;
+   vect_b[i] = tmp;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
new file mode 100644
index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
@@ -0,0 +1,101 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-add-options bind_pic_locally } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned short sa[N];
+unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[N];
+unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+   which will also align the access to 'ia[i+3]', and the loop could be
+   vectorized on all targets that support unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main1 ()
+{
+  int i;
+  int n = N - 7;
+
+  /* Multiple types with different sizes, used in independent
+     computations.  Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      sa[i+7] = sb[i] + sc[i];
+      ia[i+3] = ib[i] + ic[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-3%VF. This will result in a peeling factor
+   1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we 
+   need to peel 5,1 iterations for VF=4,2 respectively, so the loop can not 
+   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
+   iterations, so the loop is vectorizable on all targets that support
+   unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main2 ()
+{
+  int i;
+  int n = N-3;
+
+  /* Multiple types with different sizes, used in independent
+     computations. Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      ia[i+3] = ib[i] + ic[i];
+      sa[i+3] = sb[i] + sc[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 ();
+  main2 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
new file mode 100644
index 0000000000000000000000000000000000000000..9c7c3df59ffbaaf23292107f982fd7af31741ada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
new file mode 100644
index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0))
+      abort ();
+}
+
+/* Pattern didn't match inside gcond.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
new file mode 100644
index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (N/2); i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i+1;
+   if (vect_a[i] > x || vect_a[i+1] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   vect_a[i+1] += x * vect_b[i+1]; 
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  char i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+typedef float real_t;
+__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
+real_t s482()
+{
+    for (int nl = 0; nl < 10000; nl++) {
+        for (int i = 0; i < 32000; i++) {
+            a[i] += b[i] * c[i];
+            if (c[i] > b[i]) break;
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a, b;
+int e() {
+  int d, c;
+  d = 0;
+  for (; d < b; d++)
+    a = 0;
+  d = 0;
+  for (; d < b; d++)
+    if (d)
+      c++;
+  for (;;)
+    if (c)
+      break;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
new file mode 100644
index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
@@ -0,0 +1,28 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-do compile } */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_long } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+/* Statement used outside the loop.
+   NOTE: SCEV disabled to ensure the live operation is not removed before
+   vectorization.  */
+__attribute__ ((noinline)) int
+liveloop (int start, int n, int *x, int *y)
+{
+  int i = start;
+  int j;
+  int ret;
+
+  for (j = 0; j < n; ++j)
+    {
+      i += 1;
+      x[j] = i;
+      ret = y[j];
+    }
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-vect-all" } */
+
+int d(unsigned);
+
+void a() {
+  char b[8];
+  unsigned c = 0;
+  while (c < 7 && b[c])
+    ++c;
+  if (d(c))
+    return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
new file mode 100644
index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+enum a { b };
+
+struct {
+  enum a c;
+} d[10], *e;
+
+void f() {
+  int g;
+  for (g = 0, e = d; g < sizeof(1); g++, e++)
+    if (e->c)
+      return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
new file mode 100644
index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a[0];
+int b;
+
+void g();
+
+void f() {
+  int d, e;
+  for (; e; e++) {
+    int c;
+    switch (b)
+    case '9': {
+      for (; d < 1; d++)
+        if (a[d])
+          c = 1;
+      break;
+    case '<':
+      g();
+      c = 0;
+    }
+      while (c)
+        ;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+char vect_a[N];
+char vect_b[N];
+  
+char test4(char x, char * restrict res)
+{
+ char ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   res[i] *= vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be802bba51cd818393
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_a[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d79ad4be10502969209f9b13bd65ab142b92e644..5516188dc0aa86d161d67dea5a7769e3c3d72f85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3775,6 +3775,17 @@ proc check_effective_target_vect_int { } {
 	}}]
 }
 
+# Return 1 if the target supports hardware vectorization of early breaks,
+# 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_early_break { } {
+    return [check_cached_effective_target_indexed vect_early_break {
+      expr {
+	[istarget aarch64*-*-*]
+	}}]
+}
 # Return 1 if the target supports hardware vectorization of complex additions of
 # byte, 0 otherwise.
 #




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (13 preceding siblings ...)
  2023-06-28 13:47 ` [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization Tamar Christina
@ 2023-06-28 13:48 ` Tamar Christina
  2023-06-28 13:48 ` [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 4899 bytes --]

Hi All,

This adds an implementation of the conditional branch optab for AArch64.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        cmgt    v1.4s, v1.4s, #0
        umaxp   v1.4s, v1.4s, v1.4s
        fmov    x3, d1
        cbnz    x3, .L8

and for 64-bit vectors we can omit the compression:

        cmgt    v1.2s, v1.2s, #0
        fmov    x2, d1
        cbz     x2, .L13

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jumping
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




-- 

[-- Attachment #2: rb17509.patch --]
[-- Type: text/plain, Size: 4128 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jumping
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (14 preceding siblings ...)
  2023-06-28 13:48 ` [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
@ 2023-06-28 13:48 ` Tamar Christina
  2023-06-28 13:48 ` [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and " Tamar Christina
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]

Hi All,

Advanced SIMD lacks a cmpne for vectors, and unlike a compare with zero we
can't rewrite it to a cmtst.

This operation is however fairly common, especially now that we support early
break vectorization.

As such this adds a pattern to recognize the negated any comparison and
transform it to an all, i.e. any(~x) => all(x), and invert the branches.

For e.g.

void f1 (int x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != x)
	break;
    }
}

We currently generate:

	cmeq	v31.4s, v30.4s, v29.4s
	not	v31.16b, v31.16b
	umaxp	v31.4s, v31.4s, v31.4s
	fmov	x5, d31
	cbnz	x5, .L2

and after this patch:

	cmeq	v31.4s, v30.4s, v29.4s
	uminp	v31.4s, v31.4s, v31.4s
	fmov	x5, d31
	cbz	x5, .L2
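
The rewrite is just De Morgan on lane masks: any(~x) equals !all(x), so
instead of inverting the equality mask and doing an unsigned-max reduction
(not/umaxp/cbnz) we can do an unsigned-min reduction on the uninverted mask
and invert the branch (uminp/cbz).  A hypothetical scalar sketch (function
names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* For all-ones/all-zeros lane masks, testing the inverted mask with an
   unsigned-max reduction (the old not + umaxp + cbnz sequence) ...  */
static int any_not (const uint32_t mask[4])
{
  uint32_t m = 0;
  for (int i = 0; i < 4; i++)
    m |= ~mask[i];                      /* umax reduction of NOT (cmeq) */
  return m != 0;                        /* cbnz */
}

/* ... gives the same answer as an unsigned-min reduction of the original
   mask with the branch sense inverted (the new uminp + cbz sequence),
   saving the NOT instruction.  */
static int not_all (const uint32_t mask[4])
{
  uint32_t m = 0xffffffffu;
  for (int i = 0; i < 4; i++)
    m = mask[i] < m ? mask[i] : m;      /* umin reduction of cmeq */
  return m == 0;                        /* cbz */
}
```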

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (*cbranchnev4si): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+    (if_then_else
+      (ne (subreg:DI
+	    (unspec:V4SI
+	      [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+	       (not:V4SI (match_dup 0))]
+		UNSPEC_UMAXV) 0)
+	   (const_int 0))
+	(label_ref (match_operand 1 ""))
+	(pc)))
+    (clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+	(unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+    (if_then_else
+      (eq (subreg:DI (match_dup 2) 0)
+	  (const_int 0))
+	(label_ref (match_dup 1))
+	(pc)))]
+{
+  if (can_create_pseudo_p ())
+    operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+	cmeq	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+	uminp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+	fmov	x[0-9]+, d[0-9]+
+	cbz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != x)
+	break;
+    }
+}




-- 

[-- Attachment #2: rb17510.patch --]
[-- Type: text/plain, Size: 2147 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+    (if_then_else
+      (ne (subreg:DI
+	    (unspec:V4SI
+	      [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+	       (not:V4SI (match_dup 0))]
+		UNSPEC_UMAXV) 0)
+	   (const_int 0))
+	(label_ref (match_operand 1 ""))
+	(pc)))
+    (clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+	(unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+    (if_then_else
+      (eq (subreg:DI (match_dup 2) 0)
+	  (const_int 0))
+	(label_ref (match_dup 1))
+	(pc)))]
+{
+  if (can_create_pseudo_p ())
+    operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+	cmeq	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+	uminp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+	fmov	x[0-9]+, d[0-9]+
+	cbz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != x)
+	break;
+    }
+}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (15 preceding siblings ...)
  2023-06-28 13:48 ` [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
@ 2023-06-28 13:48 ` Tamar Christina
  2023-06-28 13:49 ` [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 14835 bytes --]

Hi All,

Advanced SIMD lacks flag-setting vector comparisons, which SVE adds.  Since machines
with SVE also support Advanced SIMD we can use the SVE comparisons to perform the
operation in cases where SVE codegen is allowed, but the vectorizer has decided
to generate Advanced SIMD because of loop costing.

e.g. for

void f1 (int x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != x)
	break;
    }
}

We currently generate:

        cmeq    v31.4s, v31.4s, v28.4s
        uminp   v31.4s, v31.4s, v31.4s
        fmov    x5, d31
        cbz     x5, .L2

and after this patch:

        ptrue   p7.b, vl16
        ...
        cmpne   p15.s, p7/z, z31.s, z28.s
        b.any   .L2

Because we need to hoist the predicate creation outside of the loop we have to
expand the predicate early; however, in the cbranch expansion we don't yet see
the outer compare which we need to consume.

For this reason the expansion is two fold, when expanding the cbranch we emit an
SVE predicated comparison and later on during combine we match the SVE and NEON
comparison while also consuming the ptest.

Unfortunately *aarch64_pred_cmpne<mode><EQL:code>_neon_ptest is needed because
combine destroys the NOT and transforms it into a plus of -1.

For the straight SVE cases we seem to fail to eliminate the ptest, but that is a
separate optimization.

Tests show that I'm missing a few patterns, but before I write them: are
these OK?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cbranch<mode>4): Update with SVE.
	* config/aarch64/aarch64-sve.md
	(*aarch64_pred_cmp<UCOMPARISONS:cmp_op><mode><EQL:code>_neon_ptest,
	*aarch64_pred_cmpeq<mode><EQL:code>_neon_ptest,
	*aarch64_pred_cmpne<mode><EQL:code>_neon_ptest): New.
	(aarch64_ptest<mode>): Rename to...
	(@aarch64_ptest<mode>): ... This.
	* genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
	* gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78..75cb5d6f7f92b70fed8762fe64e23f0c05a99c99 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3843,31 +3843,59 @@ (define_expand "cbranch<mode>4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
 
-  /* If comparing against a non-zero vector we have to do a comparison first
-     so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (<MODE>mode))
-    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
-					operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+  /* If SVE is available, lets borrow some instructions.  We will optimize
+     these further later in combine.  */
+  if (TARGET_SVE)
     {
-      /* Always reduce using a V4SI.  */
-      rtx reduc = gen_lowpart (V4SImode, tmp);
-      rtx res = gen_reg_rtx (V4SImode);
-      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+      machine_mode full_mode = aarch64_full_sve_mode (<VEL>mode).require ();
+      rtx in1 = lowpart_subreg (full_mode, operands[1], <MODE>mode);
+      rtx in2 = lowpart_subreg (full_mode, operands[2], <MODE>mode);
+
+      machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+      rtx_vector_builder builder (VNx16BImode, 16, 2);
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST1_RTX (BImode));
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST0_RTX (BImode));
+      rtx ptrue = force_reg (VNx16BImode, builder.build ());
+      rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+      rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+      rtx tmp = gen_reg_rtx (pred_mode);
+      aarch64_expand_sve_vec_cmp_int (tmp, code, in1, in2);
+      emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp));
+      operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+      operands[2] = const0_rtx;
     }
+  else
+    {
+      rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+      /* If comparing against a non-zero vector we have to do a comparison first
+	 so we can have a != 0 comparison with the result.  */
+      if (operands[2] != CONST0_RTX (<MODE>mode))
+	emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					    operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+      /* For 64-bit vectors we need no reductions.  */
+      if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+	{
+	  /* Always reduce using a V4SI.  */
+	  rtx reduc = gen_lowpart (V4SImode, tmp);
+	  rtx res = gen_reg_rtx (V4SImode);
+	  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+	  emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+	}
+
+      rtx val = gen_reg_rtx (DImode);
+      emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+      rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+      rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+      DONE;
+    }
 })
 
 ;; Avdanced SIMD lacks a vector != comparison, but this is a quite common
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index da5534c3e32b3a8819c57a26582cfa5e22e63753..0e10e497e073ee7cfa4025d9adb19076c1615e87 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8059,6 +8059,105 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
   "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
 )
 
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmp<UCOMPARISONS:cmp_op><mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (UCOMPARISONS:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmp<UCOMPARISONS:cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmpeq<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpeq\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Same as the above but version for == and !=
+(define_insn "*aarch64_pred_cmpne<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (plus:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))
+		  (match_operand:<V128> 9 "aarch64_simd_imm_minus_one" "i")) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpne\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] While tests
 ;; -------------------------------------------------------------------------
@@ -8537,7 +8636,7 @@ (define_expand "cbranch<mode>4"
 )
 
 ;; See "Description of UNSPEC_PTEST" above for details.
-(define_insn "aarch64_ptest<mode>"
+(define_insn "@aarch64_ptest<mode>"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa")
 			(match_operand 1)
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -906,6 +906,7 @@ from the machine description file `md'.  */\n\n");
   printf ("#include \"tm-constrs.h\"\n");
   printf ("#include \"ggc.h\"\n");
   printf ("#include \"target.h\"\n\n");
+  printf ("#include \"rtx-vector-builder.h\"\n\n");
 
   /* Read the machine description.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmpgt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmpge	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmplt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmple	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
@@ -0,0 +1,114 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (16 preceding siblings ...)
  2023-06-28 13:48 ` [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and " Tamar Christina
@ 2023-06-28 13:49 ` Tamar Christina
  2023-06-28 13:50 ` [PATCH 19/19]Arm: Add MVE " Tamar Christina
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:49 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov

[-- Attachment #1: Type: text/plain, Size: 6094 bytes --]

Hi All,

This adds an implementation of the conditional branch (cbranch) optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        vcgt.s32        q8, q9, #0
        vpmax.u32       d7, d16, d17
        vpmax.u32       d7, d7, d7
        vmov    r3, s14 @ int
        cmp     r3, #0

and for 64-bit vectors we can omit one of the vpmax instructions, though we
still need the other to compress down to 32 bits.

Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add AArch32.
	* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract<mode><V_elem_l>"
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
+;; Patterns comparing two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is quite a common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+;;
+;; However unlike the AArch64 version, we can't optimize this further as the
+;; chain is too long for combine due to these being unspecs so it doesn't fold
+;; the operation to something simpler.
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:VDQI 1 "register_operand")
+	        (match_operand:VDQI 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V2SI.  */
+      mask = gen_reg_rtx (V2SImode);
+      rtx low = gen_reg_rtx (V2SImode);
+      rtx high = gen_reg_rtx (V2SImode);
+      emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+      emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+      emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+    }
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
 ;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/* f1:
+**	...
+**	vcgt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcge.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vmvn	q[0-9]+, q[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vclt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcle.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } {
     return [check_cached_effective_target_indexed vect_early_break {
       expr {
 	[istarget aarch64*-*-*]
+	|| [check_effective_target_arm_neon_ok]
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of
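The vpmax-based reduction used by the cbranch<mode>4 expander above can be modeled in scalar code.  The following is a hedged sketch (the function names and fixed V4SI lane count are illustrative, not part of the patch): each lane of a vector comparison result is all-ones or all-zero, so pairwise unsigned max folds the mask down to a single word that is nonzero iff any lane matched.

```c
#include <stdint.h>

/* Scalar model of the vpmax-based "any lane set" reduction for a
   128-bit mask.  Comparison results are all-ones/all-zero per lane,
   so unsigned max preserves "some lane matched".  */
static uint32_t pmax (uint32_t a, uint32_t b) { return a > b ? a : b; }

static int any_lane_set_v4si (const uint32_t mask[4])
{
  /* vget_low/vget_high split + first vpmax.u32: 4 lanes -> 2.  */
  uint32_t d0 = pmax (mask[0], mask[1]);
  uint32_t d1 = pmax (mask[2], mask[3]);
  /* Second vpmax.u32: 2 lanes -> 1 significant lane.  */
  uint32_t red = pmax (d0, d1);
  /* vmov to a core register + cmp/bne in the real sequence.  */
  return red != 0;
}
```

The same two-step vpmax.u32 sequence is what the f1-f6 function-body checks in the testcase above match against.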




-- 

[-- Attachment #2: rb17512.patch --]
[-- Type: text/plain, Size: 5281 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract<mode><V_elem_l>"
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
+;; Patterns comparing two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is quite a common
+;; operation.  To avoid paying the penalty for inverting == we can map our
+;; "any" comparisons to "all", i.e. any(~x) => all(x).
+;;
+;; However, unlike the AArch64 version, we can't optimize this further: these
+;; instructions are unspecs, so the chain is too long for combine to fold
+;; the operation into something simpler.
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:VDQI 1 "register_operand")
+	        (match_operand:VDQI 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      mask = gen_reg_rtx (V2SImode);
+      rtx low = gen_reg_rtx (V2SImode);
+      rtx high = gen_reg_rtx (V2SImode);
+      emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+      emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+      emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+    }
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
 ;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/* f1:
+**	...
+**	vcgt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcge.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vmvn	q[0-9]+, q[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vclt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcle.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } {
     return [check_cached_effective_target_indexed vect_early_break {
       expr {
 	[istarget aarch64*-*-*]
+	|| [check_effective_target_arm_neon_ok]
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 19/19]Arm: Add MVE cbranch implementation
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (17 preceding siblings ...)
  2023-06-28 13:49 ` [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
@ 2023-06-28 13:50 ` Tamar Christina
       [not found] ` <MW5PR11MB5908414D8B2AB0580A888ECAA924A@MW5PR11MB5908.namprd11.prod.outlook.com>
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:50 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov

[-- Attachment #1: Type: text/plain, Size: 6234 bytes --]

Hi All,

This adds an implementation of the conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0: we are missing the
ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch against 0, since when comparing to
a zero predicate we don't need to do an actual comparison; we only have to
check whether any bit is set within P0.

Because we can only do P0 comparisons against 0, the costing of the comparison
was reduced so that the compiler does not try to force the 0 into a register
on the assumption that it is too expensive.  For the cbranch implementation to
be safe we must see the constant 0 vector.

The lack of logical OR on P0 is something we can't work around.  This means
MVE can't support cases where the sizes of the operands in the comparison
don't match, i.e. when one operand has been unpacked.

For example, for:

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        vcmp.s32        gt, q3, q1
        vmrs    r3, p0  @ movhi
        cbnz    r3, .L2

MVE does not have 64-bit vector comparisons, so those are not supported either.
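The P0-through-SImode trick described above can be modeled in scalar code.  This is a hedged sketch (the function name and the 16-bit predicate width are illustrative): reading P0 into a general register via vmrs reduces the vector branch to an integer compare with zero.

```c
#include <stdint.h>

/* Scalar model of the MVE cbranch lowering: "any predicate bit set in
   P0" is simply "the GPR copy of P0 is nonzero", so no predicate
   comparison or logical OR on P0 is needed.  */
static int any_lane_matched (uint16_t p0)
{
  return p0 != 0;	/* becomes cbnz on the GPR copy of P0 */
}
```

This is why only comparisons against the constant 0 vector can be supported: a nonzero right-hand side would require a real predicate-vs-predicate comparison, which MVE lacks.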

Bootstrapped arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
	compares.
	* config/arm/mve.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add MVE.
	* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	   || TARGET_HAVE_MVE)
 	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
+      else if (TARGET_HAVE_MVE
+	       && outer_code == COMPARE
+	       && VALID_MVE_PRED_MODE (mode))
+	/* MVE allows very limited instructions on VPT.P0; however, comparisons
+	   to 0 do not require us to materialize this constant or to do a
+	   predicate comparison, as we can go through SImode.  For that reason
+	   allow P0 CMP 0 as a cheap operation so that the 0 isn't forced into
+	   a register, as we can't compare two predicates.  */
+	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
       return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
   DONE;
 })
 
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:MVE_7 1 "register_operand")
+	        (match_operand:MVE_7 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret<mode>"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
       expr {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_neon_ok]
+	|| ([check_effective_target_arm_v8_1m_mve_fp_ok]
+	     && [check_effective_target_arm_little_endian])
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of




-- 

[-- Attachment #2: rb17513.patch --]
[-- Type: text/plain, Size: 4604 bytes --]

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	   || TARGET_HAVE_MVE)
 	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
+      else if (TARGET_HAVE_MVE
+	       && outer_code == COMPARE
+	       && VALID_MVE_PRED_MODE (mode))
+	/* MVE allows very limited instructions on VPT.P0; however, comparisons
+	   to 0 do not require us to materialize this constant or to do a
+	   predicate comparison, as we can go through SImode.  For that reason
+	   allow P0 CMP 0 as a cheap operation so that the 0 isn't forced into
+	   a register, as we can't compare two predicates.  */
+	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
       return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
   DONE;
 })
 
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:MVE_7 1 "register_operand")
+	        (match_operand:MVE_7 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret<mode>"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
       expr {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_neon_ok]
+	|| ([check_effective_target_arm_v8_1m_mve_fp_ok]
+	     && [check_effective_target_arm_little_endian])
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of




^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 3/19]middle-end clean up vect testsuite using pragma novector
  2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
@ 2023-06-28 13:54   ` Tamar Christina
  2023-07-04 11:31   ` Richard Biener
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

Resending attached only due to size limit

> -----Original Message-----
> From: Tamar Christina
> Sent: Wednesday, June 28, 2023 2:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 3/19]middle-end clean up vect testsuite using pragma
> novector
> 
> Hi All,
> 
> The support for early break vectorization breaks lots of scan vect and slp
> testcases because they assume that loops with abort () in them cannot be
> vectorized.  Additionally it defeats the purpose of having a scalar loop to
> check the output of the vectorizer if that loop is also vectorized.
> 
> For that reason this adds
> 
> #pragma GCC novector to all tests which have a scalar loop that we would
> have
> vectorized using this patch series.
> 
> FWIW, none of these tests were failing to vectorize or run before the pragma.
> The tests that did point to some issues were copied to the early break test
> suite as well.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/vect/pr84556.cc: Add novector pragma.
> 	* g++.dg/vect/simd-1.cc: Add novector pragma.
> 	* g++.dg/vect/simd-2.cc: Add novector pragma.
> 	* g++.dg/vect/simd-3.cc: Add novector pragma.
> 	* g++.dg/vect/simd-4.cc: Add novector pragma.
> 	* g++.dg/vect/simd-5.cc: Add novector pragma.
> 	* g++.dg/vect/simd-6.cc: Add novector pragma.
> 	* g++.dg/vect/simd-7.cc: Add novector pragma.
> 	* g++.dg/vect/simd-8.cc: Add novector pragma.
> 	* g++.dg/vect/simd-9.cc: Add novector pragma.
> 	* g++.dg/vect/simd-clone-6.cc: Add novector pragma.
> 	* gcc.dg/vect/O3-pr70130.c: Add novector pragma.
> 	* gcc.dg/vect/Os-vect-95.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-16.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-24.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-25.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-26.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-27.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-28.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-29.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-42.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-over-widen-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-over-widen-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pattern-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pattern-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pow-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pr101615-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pr65935.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-subgroups-1.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Add
> novector pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Add novector
> pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: Add novector
> pragma.
> 	* gcc.dg/vect/fast-math-bb-slp-call-1.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-bb-slp-call-2.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-call-1.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-call-2.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-complex-3.c: Add novector pragma.
> 	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10a.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10b.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-11.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-12.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-15.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-16.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-17.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-18.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-19.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-20.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-21.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-22.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-4.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-6-global.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-6.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-7.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-8.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9a.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9b.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-slp-30.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-slp-31.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-vect-iv-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-34.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-36.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-64.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-65.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-66.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-69.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-outer-4h.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-trapping-math-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-111.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c: Add novector
> pragma.
> 	* gcc.dg/vect/no-tree-dom-vect-bug.c: Add novector pragma.
> 	* gcc.dg/vect/no-tree-pre-slp-29.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-pr29145.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-101.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-102.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-102a.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-37.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-43.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-45.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-49.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-51.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-53.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-57.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-61.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-79.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-1.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-dv-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr101445.c: Add novector pragma.
> 	* gcc.dg/vect/pr103581.c: Add novector pragma.
> 	* gcc.dg/vect/pr105219.c: Add novector pragma.
> 	* gcc.dg/vect/pr108608.c: Add novector pragma.
> 	* gcc.dg/vect/pr18400.c: Add novector pragma.
> 	* gcc.dg/vect/pr18536.c: Add novector pragma.
> 	* gcc.dg/vect/pr20122.c: Add novector pragma.
> 	* gcc.dg/vect/pr25413.c: Add novector pragma.
> 	* gcc.dg/vect/pr30784.c: Add novector pragma.
> 	* gcc.dg/vect/pr37539.c: Add novector pragma.
> 	* gcc.dg/vect/pr40074.c: Add novector pragma.
> 	* gcc.dg/vect/pr45752.c: Add novector pragma.
> 	* gcc.dg/vect/pr45902.c: Add novector pragma.
> 	* gcc.dg/vect/pr46009.c: Add novector pragma.
> 	* gcc.dg/vect/pr48172.c: Add novector pragma.
> 	* gcc.dg/vect/pr51074.c: Add novector pragma.
> 	* gcc.dg/vect/pr51581-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr51581-4.c: Add novector pragma.
> 	* gcc.dg/vect/pr53185-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr56826.c: Add novector pragma.
> 	* gcc.dg/vect/pr56918.c: Add novector pragma.
> 	* gcc.dg/vect/pr56920.c: Add novector pragma.
> 	* gcc.dg/vect/pr56933.c: Add novector pragma.
> 	* gcc.dg/vect/pr57705.c: Add novector pragma.
> 	* gcc.dg/vect/pr57741-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr57741-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr59591-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr59591-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr59594.c: Add novector pragma.
> 	* gcc.dg/vect/pr59984.c: Add novector pragma.
> 	* gcc.dg/vect/pr60276.c: Add novector pragma.
> 	* gcc.dg/vect/pr61194.c: Add novector pragma.
> 	* gcc.dg/vect/pr61680.c: Add novector pragma.
> 	* gcc.dg/vect/pr62021.c: Add novector pragma.
> 	* gcc.dg/vect/pr63341-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr64252.c: Add novector pragma.
> 	* gcc.dg/vect/pr64404.c: Add novector pragma.
> 	* gcc.dg/vect/pr64421.c: Add novector pragma.
> 	* gcc.dg/vect/pr64493.c: Add novector pragma.
> 	* gcc.dg/vect/pr64495.c: Add novector pragma.
> 	* gcc.dg/vect/pr66251.c: Add novector pragma.
> 	* gcc.dg/vect/pr66253.c: Add novector pragma.
> 	* gcc.dg/vect/pr68502-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr68502-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr69820.c: Add novector pragma.
> 	* gcc.dg/vect/pr70021.c: Add novector pragma.
> 	* gcc.dg/vect/pr70354-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr70354-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr71259.c: Add novector pragma.
> 	* gcc.dg/vect/pr78005.c: Add novector pragma.
> 	* gcc.dg/vect/pr78558.c: Add novector pragma.
> 	* gcc.dg/vect/pr80815-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr80815-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr80928.c: Add novector pragma.
> 	* gcc.dg/vect/pr81410.c: Add novector pragma.
> 	* gcc.dg/vect/pr81633.c: Add novector pragma.
> 	* gcc.dg/vect/pr81740-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr81740-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr85586.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr88903-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr88903-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr90018.c: Add novector pragma.
> 	* gcc.dg/vect/pr92420.c: Add novector pragma.
> 	* gcc.dg/vect/pr94994.c: Add novector pragma.
> 	* gcc.dg/vect/pr96783-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr96783-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97081-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97558-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97678.c: Add novector pragma.
> 	* gcc.dg/vect/section-anchors-pr27770.c: Add novector pragma.
> 	* gcc.dg/vect/section-anchors-vect-69.c: Add novector pragma.
> 	* gcc.dg/vect/slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-13-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-13.c: Add novector pragma.
> 	* gcc.dg/vect/slp-14.c: Add novector pragma.
> 	* gcc.dg/vect/slp-15.c: Add novector pragma.
> 	* gcc.dg/vect/slp-16.c: Add novector pragma.
> 	* gcc.dg/vect/slp-17.c: Add novector pragma.
> 	* gcc.dg/vect/slp-18.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-20.c: Add novector pragma.
> 	* gcc.dg/vect/slp-21.c: Add novector pragma.
> 	* gcc.dg/vect/slp-22.c: Add novector pragma.
> 	* gcc.dg/vect/slp-23.c: Add novector pragma.
> 	* gcc.dg/vect/slp-24-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-24.c: Add novector pragma.
> 	* gcc.dg/vect/slp-25.c: Add novector pragma.
> 	* gcc.dg/vect/slp-26.c: Add novector pragma.
> 	* gcc.dg/vect/slp-28.c: Add novector pragma.
> 	* gcc.dg/vect/slp-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-33.c: Add novector pragma.
> 	* gcc.dg/vect/slp-34-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-34.c: Add novector pragma.
> 	* gcc.dg/vect/slp-35.c: Add novector pragma.
> 	* gcc.dg/vect/slp-37.c: Add novector pragma.
> 	* gcc.dg/vect/slp-4-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-41.c: Add novector pragma.
> 	* gcc.dg/vect/slp-43.c: Add novector pragma.
> 	* gcc.dg/vect/slp-45.c: Add novector pragma.
> 	* gcc.dg/vect/slp-46.c: Add novector pragma.
> 	* gcc.dg/vect/slp-47.c: Add novector pragma.
> 	* gcc.dg/vect/slp-48.c: Add novector pragma.
> 	* gcc.dg/vect/slp-49.c: Add novector pragma.
> 	* gcc.dg/vect/slp-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-11-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-11.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-12.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-11.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-12.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-half.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-s16.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-100.c: Add novector pragma.
> 	* gcc.dg/vect/vect-103.c: Add novector pragma.
> 	* gcc.dg/vect/vect-104.c: Add novector pragma.
> 	* gcc.dg/vect/vect-105-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-105.c: Add novector pragma.
> 	* gcc.dg/vect/vect-106.c: Add novector pragma.
> 	* gcc.dg/vect/vect-107.c: Add novector pragma.
> 	* gcc.dg/vect/vect-108.c: Add novector pragma.
> 	* gcc.dg/vect/vect-109.c: Add novector pragma.
> 	* gcc.dg/vect/vect-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-110.c: Add novector pragma.
> 	* gcc.dg/vect/vect-113.c: Add novector pragma.
> 	* gcc.dg/vect/vect-114.c: Add novector pragma.
> 	* gcc.dg/vect/vect-115.c: Add novector pragma.
> 	* gcc.dg/vect/vect-116.c: Add novector pragma.
> 	* gcc.dg/vect/vect-117.c: Add novector pragma.
> 	* gcc.dg/vect/vect-11a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-122.c: Add novector pragma.
> 	* gcc.dg/vect/vect-124.c: Add novector pragma.
> 	* gcc.dg/vect/vect-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-15-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-21.c: Add novector pragma.
> 	* gcc.dg/vect/vect-22.c: Add novector pragma.
> 	* gcc.dg/vect/vect-23.c: Add novector pragma.
> 	* gcc.dg/vect/vect-24.c: Add novector pragma.
> 	* gcc.dg/vect/vect-25.c: Add novector pragma.
> 	* gcc.dg/vect/vect-26.c: Add novector pragma.
> 	* gcc.dg/vect/vect-27.c: Add novector pragma.
> 	* gcc.dg/vect/vect-28.c: Add novector pragma.
> 	* gcc.dg/vect/vect-29.c: Add novector pragma.
> 	* gcc.dg/vect/vect-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-30.c: Add novector pragma.
> 	* gcc.dg/vect/vect-31-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/vect-32-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-33-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-33.c: Add novector pragma.
> 	* gcc.dg/vect/vect-34-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-34.c: Add novector pragma.
> 	* gcc.dg/vect/vect-35-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-35.c: Add novector pragma.
> 	* gcc.dg/vect/vect-36-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-36.c: Add novector pragma.
> 	* gcc.dg/vect/vect-38.c: Add novector pragma.
> 	* gcc.dg/vect/vect-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-40.c: Add novector pragma.
> 	* gcc.dg/vect/vect-42.c: Add novector pragma.
> 	* gcc.dg/vect/vect-44.c: Add novector pragma.
> 	* gcc.dg/vect/vect-46.c: Add novector pragma.
> 	* gcc.dg/vect/vect-48.c: Add novector pragma.
> 	* gcc.dg/vect/vect-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-50.c: Add novector pragma.
> 	* gcc.dg/vect/vect-52.c: Add novector pragma.
> 	* gcc.dg/vect/vect-54.c: Add novector pragma.
> 	* gcc.dg/vect/vect-56.c: Add novector pragma.
> 	* gcc.dg/vect/vect-58.c: Add novector pragma.
> 	* gcc.dg/vect/vect-6-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-60.c: Add novector pragma.
> 	* gcc.dg/vect/vect-62.c: Add novector pragma.
> 	* gcc.dg/vect/vect-63.c: Add novector pragma.
> 	* gcc.dg/vect/vect-64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-65.c: Add novector pragma.
> 	* gcc.dg/vect/vect-66.c: Add novector pragma.
> 	* gcc.dg/vect/vect-67.c: Add novector pragma.
> 	* gcc.dg/vect/vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/vect-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-70.c: Add novector pragma.
> 	* gcc.dg/vect/vect-71.c: Add novector pragma.
> 	* gcc.dg/vect/vect-72.c: Add novector pragma.
> 	* gcc.dg/vect/vect-73-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-73.c: Add novector pragma.
> 	* gcc.dg/vect/vect-74-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-74.c: Add novector pragma.
> 	* gcc.dg/vect/vect-75-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-75.c: Add novector pragma.
> 	* gcc.dg/vect/vect-76-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-76.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77-alignchecks.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77-global.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78-alignchecks.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78-global.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78.c: Add novector pragma.
> 	* gcc.dg/vect/vect-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-80-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-80.c: Add novector pragma.
> 	* gcc.dg/vect/vect-82.c: Add novector pragma.
> 	* gcc.dg/vect/vect-82_64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-83.c: Add novector pragma.
> 	* gcc.dg/vect/vect-83_64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-85-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-85.c: Add novector pragma.
> 	* gcc.dg/vect/vect-86.c: Add novector pragma.
> 	* gcc.dg/vect/vect-87.c: Add novector pragma.
> 	* gcc.dg/vect/vect-88.c: Add novector pragma.
> 	* gcc.dg/vect/vect-89-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-89.c: Add novector pragma.
> 	* gcc.dg/vect/vect-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-92.c: Add novector pragma.
> 	* gcc.dg/vect/vect-93.c: Add novector pragma.
> 	* gcc.dg/vect/vect-95.c: Add novector pragma.
> 	* gcc.dg/vect/vect-96.c: Add novector pragma.
> 	* gcc.dg/vect/vect-97-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-97.c: Add novector pragma.
> 	* gcc.dg/vect/vect-98-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-98.c: Add novector pragma.
> 	* gcc.dg/vect/vect-99.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-align-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-align-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-all-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-all.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bool-cmp.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cselim-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cselim-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask.h: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-6-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-float-extend-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-float-truncate-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-floatint-conversion-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-floatint-conversion-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-fma-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-gather-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-gather-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-4a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-4b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mask-load-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mask-loadstore-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mulhrs-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mult-const-pattern-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mult-const-pattern-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-neg-store-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-neg-store-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2c-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2c.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2d.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3c.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4d-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4d.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-lb-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-lb.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-1-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-21.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-22.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-4-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-1-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-2-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-4-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-sdiv-pow2-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-sdivmod-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u32-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-float.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-mult-char-ls.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-same-dr.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-shift-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-a-u8-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-u32-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-i8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c: Add novector
> pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-01.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-02.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-03.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-04.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-slp.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-const-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-const-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-half-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-half.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-s8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-s8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-u8.c: Add novector pragma.
> 	* gcc.dg/vect/wrapv-vect-7.c: Add novector pragma.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/19] middle-end: refactor vectorizable_comparison to make the main body re-usable.
  2023-06-28 13:45 ` [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable Tamar Christina
@ 2023-06-28 13:55   ` Tamar Christina
  2023-07-13 16:23     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 13:55 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Adding proper maintainers.

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Wednesday, June 28, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison
> to make the main body re-usable.
> 
> Hi All,
> 
> Vectorization of a gcond starts off essentially the same as vectorizing a
> comparison, with the only difference being how the operands are extracted.
> 
> This refactors vectorizable_comparison such that we now have a generic
> function that can be used from vectorizable_early_break.  The refactoring
> splits the gassign checks and actual validation/codegen off to a helper
> function.
> 
> No change in functionality expected.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting
> body
> 	to ...
> 	(vectorizable_comparison_1): ...This.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca81acd197693fc3457c31 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
> 
>  /* vectorizable_comparison.
> 
> -   Check if STMT_INFO is comparison expression that can be vectorized.
> +/* Helper of vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression CODE that can be vectorized.
>     If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
>     comparison, put it in VEC_STMT, and insert it at GSI.
> 
>     Return true if STMT_INFO is vectorizable in this way.  */
> 
>  static bool
> -vectorizable_comparison (vec_info *vinfo,
> -			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> -			 gimple **vec_stmt,
> -			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> +			   stmt_vec_info stmt_info, tree_code code,
> +			   gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			   slp_tree slp_node, stmt_vector_for_cost *cost_vec)
>  {
>    tree lhs, rhs1, rhs2;
>    tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> -  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>    tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
>    tree new_temp;
>    loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> @@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
>    int ndts = 2;
>    poly_uint64 nunits;
>    int ncopies;
> -  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
> +  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>    int i;
>    bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
>    vec<tree> vec_oprnds0 = vNULL;
> @@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
>      ncopies = vect_get_num_copies (loop_vinfo, vectype);
> 
>    gcc_assert (ncopies >= 1);
> -  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> -    return false;
> -
> -  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> -  if (!stmt)
> -    return false;
> -
> -  code = gimple_assign_rhs_code (stmt);
> 
>    if (TREE_CODE_CLASS (code) != tcc_comparison)
>      return false;
> @@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
>  	  return false;
>  	}
> 
> -      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
>        vect_model_simple_cost (vinfo, stmt_info,
>  			      ncopies * (1 + (bitop2 != NOP_EXPR)),
>  			      dts, ndts, slp_node, cost_vec);
> @@ -11565,6 +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
> 
> +/* vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression that can be vectorized.
> +   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> +   comparison, put it in VEC_STMT, and insert it at GSI.
> +
> +   Return true if STMT_INFO is vectorizable in this way.  */
> +
> +static bool
> +vectorizable_comparison (vec_info *vinfo,
> +			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +			 gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> +    return false;
> +
> +  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> +  if (!stmt)
> +    return false;
> +
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    return false;
> +
> +  if (!vec_stmt)
> +    STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> 
> 
> 
> 
> --

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
       [not found] ` <MW5PR11MB5908414D8B2AB0580A888ECAA924A@MW5PR11MB5908.namprd11.prod.outlook.com>
@ 2023-06-28 14:49   ` 钟居哲
  2023-06-28 16:00     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: 钟居哲 @ 2023-06-28 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, jlaw, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 13343 bytes --]

Hi, Tamar.

This is an amazing auto-vectorization flow.

I am thinking about whether RVV can also get benefits from this optimization.
IMHO, RVV should also be using this flow.

So, to support RVV (a target which uses len as loop control and mask as flow control),
I am not sure whether we can do this (feel free to correct me if I am wrong):

+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);

Maybe it can be ?

if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
  {
    if (mask_loop_p)
      vect_record_loop_mask
    else
      vect_record_loop_len
  }


+  tree cond = gimple_assign_lhs (new_stmt);
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+			build_zero_cst (truth_type));

From my understanding, you are using final_mask = loop_mask (WHILE_ULT) && control_mask (comparison).
Then you test final_mask using NE_EXPR.  Am I right?

For RVV, I am thinking about whether we can have a good way to do this testing.
Not sure whether we can have something like LEN_TEST_MASK_NE (loop_len, control_mask...)

I am not saying that we should support "early break" auto-vectorization for RVV (loop_len && control_mask).
I am just writing some comments trying to figure out how I can adapt your work for RVV in the future.

Thanks.


juzhe.zhong@rivai.ai
 
From: Li, Pan2
Date: 2023-06-28 22:21
To: juzhe.zhong@rivai.ai
Subject: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
FYI.
 
-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
Sent: Wednesday, June 28, 2023 9:41 PM
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com; rguenther@suse.de; jlaw@ventanamicro.com
Subject: [PATCH v5 0/19] Support early break/return auto-vectorization
 
Hi All,
 
This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.
 
Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked when the
comparison would produce multiple statements.
 
Concretely the kind of loops supported are of the forms:
 
for (int i = 0; i < N; i++)
{
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
}
 
where <action> can be:
- break
- return
- goto
 
Any number of statements can be used before the <action> occurs.
 
Since this is an initial version for GCC 14 it has the following limitations and
features:
 
- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross-page
  iteration checks.  The result is also likely not to be beneficial.  For that
  reason we punt support for variable buffers until we have First-Faulting
  support in GCC.
- Any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- The number of loop iterations must be known; this is just a temporary
  limitation that I intend to address in GCC 14 itself as follow-on patches.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- The early exit must be before the natural loop exit/latch.  The vectorizer is
  designed in a way to propagate phi-nodes downwards.  As such supporting this
  inverted control flow is hard.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Epilogue vectorization would also not be profitable.
- Early breaks are only supported for inner loop vectorization.
 
I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
 
With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.
 
This implementation does not completely handle the early break inside the
vector loop itself but instead adds checks such that if we know that we have
to exit in the current iteration then we branch to scalar code which actually
performs the final VF iterations and handles all the code in <action>.
 
niters analysis and the majority of the vectorizer with hardcoded single_exit
have been updated to use a new function, vec_loop_iv, which returns
the exit the vectorizer wants to use as the main IV exit.
 
For niters, this exit is what determines the overall iteration count, as
that is the O(iters) for the loop.
 
For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.
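At the source level, the shape of the transformed code amounts to something like the following scalar model.  VF, the array and the helper are invented for the sketch; the inner lane loop stands in for the vector compare plus cbranch described above.

```c
#include <assert.h>

#define N 803
#define VF 4	/* assumed vectorization factor, for illustration only */
static unsigned a[N];

/* Scalar model of the transformed loop: the main loop tests a whole
   group of VF lanes for the exit condition (standing in for a vector
   compare and cbranch); as soon as some lane would exit, we branch to
   the scalar loop, which performs at most VF iterations past the last
   fully completed vector iteration, plus any remainder.  */
static int
first_above (unsigned x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      int any_exit = 0;
      for (int l = 0; l < VF; l++)
	any_exit |= (a[i + l] > x);
      if (any_exit)
	break;			/* branch to the scalar code */
    }
  for (; i < N; i++)		/* scalar loop resumes at the group start */
    if (a[i] > x)
      return i;
  return N;
}
```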
 
This new version of the patch does the majority of the work in a newly rewritten
loop peeling.  This new function maintains LCSSA all the way through and no
longer requires the touch-up functions the vectorizer used to incrementally
adjust things later on.  This means that aside from IV updates and guard edge
updates the early exit code is identical to the single exit cases.
 
When the loop is peeled during the copying I have to go to great lengths to
keep the dominators up to date.  All exits from the first loop are rewired to the
loop header of the second loop.  But this can change the immediate dominator.
 
The dominators can change again when we wire in the loop guard, as such peeling
now returns a list of dominators that need to be updated if a new guard edge is
added.
 
For the loop peeling we rewrite the loop from:
 
 
                     Header
                      ---
                      |x|
                       2
                       |
                       v
                -------3<------
     early exit |      |      |
                v      v      | latch
                7      4----->6
                |      |
                |      v
                |      8
                |      |
                |      v
                ------>5
 
into
 
                     Header
                      ---
                      |x|
                       2
                       |
                       v
                -------3<------
     early exit |      |      |
                v      v      | latch
                7      4----->6
                |      |
                |      v
                |      8
                |      |
                |      v
                |  New Header
                |     ---
                ----->|x|
                       9
                       |
                       v
                ------10<-----
     early exit |      |      |
                v      v      | latch
                14     11---->13
                |      |
                |      v
                |      12
                |      |
                |      v
                ------> 5
 
That is to say, the first vector loop executes so long as the early exit isn't
needed.  Once the exit is taken, the scalar code will perform at most VF extra
iterations.  The exact number depends on peeling and iteration start and which
exit was taken (natural or early).   For this scalar loop, all early exits are
treated the same.
 
When we vectorize, we move any statement that is not related to the early break
itself and that would be incorrect to execute before the break (i.e. that has
side effects) to after the break.  If this is not possible we decline to vectorize.
 
This means that we check at the start of iterations whether we are going to exit
or not.  During the analysis phase we check whether we are allowed to do this
moving of statements.  Also note that we only move the scalar statements, and
only do so after peeling, just before we start transforming statements.
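A small C model of this statement motion, with invented arrays and a fixed VF.  The "moved" version tests the whole group before any lane stores, and the scalar replay of the partial group in the original order makes the observable stores match.  This only illustrates the idea, not the vectorizer's actual code.

```c
#include <assert.h>
#include <string.h>

#define N 16
#define VF 4	/* assumed vectorization factor, for illustration only */

static unsigned a1[N], b1[N], a2[N], b2[N];

/* Original source order: the store to b[] precedes the break.  */
static void
orig (unsigned *a, unsigned *b, unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] = x + i;
      if (a[i] > x)
	break;
      a[i] = x;
    }
}

/* Model of the transformed code: statement motion has sunk the stores
   past the exit check, so a whole group can be abandoned before any
   lane stores; the scalar tail then replays the group in the original
   order, so the observable stores come out the same.  */
static void
moved (unsigned *a, unsigned *b, unsigned x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      int any = 0;
      for (int l = 0; l < VF; l++)
	any |= (a[i + l] > x);
      if (any)
	break;			/* no lane of this group has stored yet */
      for (int l = 0; l < VF; l++)
	{
	  b[i + l] = x + i + l;
	  a[i + l] = x;
	}
    }
  for (; i < N; i++)		/* scalar replay, original order */
    {
      b[i] = x + i;
      if (a[i] > x)
	break;
      a[i] = x;
    }
}
```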
 
Codegen:
 
for e.g.
 
#define N 803
unsigned vect_a[N];
unsigned vect_b[N];
 
unsigned test4(unsigned x)
{
unsigned ret = 0;
for (int i = 0; i < N; i++)
{
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 
}
return ret;
}
 
We generate for Adv. SIMD:
 
test4:
        adrp    x2, .LC0
        adrp    x3, .LANCHOR0
        dup     v2.4s, w0
        add     x3, x3, :lo12:.LANCHOR0
        movi    v4.4s, 0x4
        add     x4, x3, 3216
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x1, 0
        mov     w2, 0
        .p2align 3,,7
.L3:
        ldr     q0, [x3, x1]
        add     v3.4s, v1.4s, v2.4s
        add     v1.4s, v1.4s, v4.4s
        cmhi    v0.4s, v0.4s, v2.4s
        umaxp   v0.4s, v0.4s, v0.4s
        fmov    x5, d0
        cbnz    x5, .L6
        add     w2, w2, 1
        str     q3, [x1, x4]
        str     q2, [x3, x1]
        add     x1, x1, 16
        cmp     w2, 200
        bne     .L3
        mov     w7, 3
.L2:
        lsl     w2, w2, 2
        add     x5, x3, 3216
        add     w6, w2, w0
        sxtw    x4, w2
        ldr     w1, [x3, x4, lsl 2]
        str     w6, [x5, x4, lsl 2]
        cmp     w0, w1
        bcc     .L4
        add     w1, w2, 1
        str     w0, [x3, x4, lsl 2]
        add     w6, w1, w0
        sxtw    x1, w1
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        add     w4, w2, 2
        str     w0, [x3, x1, lsl 2]
        sxtw    x1, w4
        add     w6, w1, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
        add     w2, w2, 3
        cmp     w7, 3
        beq     .L4
        sxtw    x1, w2
        add     w2, w2, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w2, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
.L4:
        mov     w0, 0
        ret
        .p2align 2,,3
.L6:
        mov     w7, 4
        b       .L2
 
and for SVE:
 
test4:
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        add     x5, x2, 3216
        mov     x3, 0
        mov     w1, 0
        cntw    x4
        mov     z1.s, w0
        index   z0.s, #0, #1
        ptrue   p1.b, all
        ptrue   p0.s, all
        .p2align 3,,7
.L3:
        ld1w    z2.s, p1/z, [x2, x3, lsl 2]
        add     z3.s, z0.s, z1.s
        cmplo   p2.s, p0/z, z1.s, z2.s
        b.any   .L2
        st1w    z3.s, p1, [x5, x3, lsl 2]
        add     w1, w1, 1
        st1w    z1.s, p1, [x2, x3, lsl 2]
        add     x3, x3, x4
        incw    z0.s
        cmp     w3, 803
        bls     .L3
.L5:
        mov     w0, 0
        ret
        .p2align 2,,3
.L2:
        cntw    x5
        mul     w1, w1, w5
        cbz     w5, .L5
        sxtw    x1, w1
        sub     w5, w5, #1
        add     x5, x5, x1
        add     x6, x2, 3216
        b       .L6
        .p2align 2,,3
.L14:
        str     w0, [x2, x1, lsl 2]
        cmp     x1, x5
        beq     .L5
        mov     x1, x4
.L6:
        ldr     w3, [x2, x1, lsl 2]
        add     w4, w0, w1
        str     w4, [x6, x1, lsl 2]
        add     x4, x1, 1
        cmp     w0, w3
        bcs     .L14
        mov     w0, 0
        ret
 
On the workloads this work is based on we see between 2-3x performance uplift
using this patch.
 
Follow up plan:
- Boolean vectorization has several shortcomings.  I've filed PR110223 with the
   bigger ones that cause vectorization to fail with this patch.
- SLP support.  This is planned for GCC 15 as for the majority of the cases
   building SLP itself fails.  This means I'll need to spend time making this
   more robust first.  Additionally it requires:
     * Adding support for vectorizing CFG (gconds)
     * Support for CFG to differ between vector and scalar loops.
   Both of which would be disruptive to the tree and I suspect I'll be handling
   fallouts from this patch for a while.  So I plan to work on the surrounding
   building blocks first for the remainder of the year.
 
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Also ran across various workloads and no issues.
 
When closer to acceptance I will run on other targets as well and clean up
related testsuite fallouts there.
 
--- inline copy of patch -- 
 
-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
  2023-06-28 14:49   ` FW: [PATCH v5 0/19] Support early break/return auto-vectorization 钟居哲
@ 2023-06-28 16:00     ` Tamar Christina
  0 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-06-28 16:00 UTC (permalink / raw)
  To: 钟居哲, gcc-patches; +Cc: rguenther, jlaw, Richard Sandiford

Hi Juzhe,

> 
> Hi, Tamar.
> 
> This is an amazing auto-vectorization flow.
> 
> I am thinking about whether RVV can also get benefits from this optimization.
> IMHO, RVV should be also using this flow.
> 
> So, to allow RVV  (target uses len as loop_control and mask as flow control), I
> am not sure whether we can do this (Feel free to correct me if I am wrong):
> 
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> NULL);
> 
> Maybe it can be ?
> 
> if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) {
>   if (mask_loop_p)
>      vect_record_loop_mask
>    else
>      vect_record_loop_len
> }
> 

Yeah, that should be the only change required,  I started this patch before the loop_len change
made it in and just rebased recently 😊

> 
> +  tree cond = gimple_assign_lhs (new_stmt);
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> truth_type, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> +			build_zero_cst (truth_type));
> 
> From my understanding, you are using final_mask = loop_mask (WHILE_ULT)
> && control_mask (comparison).
> Then Test final_mask using NE_EXPR. Am I right?

Yeah, that's right.  It's creating the mask for partial iterations.  The only other
constraint is being able to reduce a boolean mask using inclusive OR, but that's
optional and is only needed if one side of the comparison produces more than one
copy (so it's only checked then).
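A scalar model of that mask handling, one bit per lane.  The 8-lane width and the function name are invented; this only illustrates the composition described above (prepare_vec_mask plus the NE_EXPR test), not the GCC internals.

```c
#include <assert.h>
#include <stdint.h>

enum { VF = 8 };	/* assumed lane count for the sketch */

/* The loop (partial-vector) mask, e.g. from WHILE_ULT, is ANDed with
   the comparison mask (what prepare_vec_mask does), and the early-exit
   branch tests the combined mask against zero, which is effectively an
   inclusive-OR reduction across the lanes.  */
static int
take_early_exit (uint8_t loop_mask, uint8_t cond_mask)
{
  uint8_t final_mask = loop_mask & cond_mask;	/* active lanes that exit */
  return final_mask != 0;			/* the NE_EXPR test */
}
```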

> 
> For RVV, I thinking whether we can have a good way to do this testing.
> Not sure whether we can have something like LEN_TEST_MASK_NE (loop_len,
> control_mask...)
> 

Hmm, is just the vect_record_loop_len change not enough?  (I haven't followed the
masking implementation in RVV in detail) but I assume it follows the general
principle that ANDing an operation with a mask creates a masked operation?

That is to say, I thought LOOP_LEN was only for the loop control, which doesn't
change here.

> I am not saying that we should support "early break" auto-vectorization for
> RVV (loop_len && control_mask).
> I am just write some comments trying to figure out how I can adapt your
> working for RVV in the future.
> 

Yes happy to help, the more uses it gets the more bugs I can fix 😊

Cheers,
Tamar


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector
  2023-06-28 13:41 ` [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector Tamar Christina
@ 2023-06-29 22:17   ` Jason Merrill
  2023-06-30 16:18     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Jason Merrill @ 2023-06-29 22:17 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, joseph, rguenther, nathan

On 6/28/23 09:41, Tamar Christina wrote:
> Hi All,
> 
> FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
> not be applied to a particular loop.
> 
> ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> 
> As part of this patch series I need a way to easily turn off vectorization of
> particular loops, particularly for testsuite reasons.
> 
> This patch proposes a #pragma GCC novector that does the same for C and C++
> as gfortran does for FORTRAN and what ICC/ICX does for C and C++.
> 
> I added only some basic tests here, but the next patch in the series uses this
> in the testsuite in about ~800 tests.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> 	* c-pragma.cc (init_pragma): Use it.
> 
> gcc/c/ChangeLog:
> 
> 	* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> 	c_parser_for_statement, c_parser_statement_after_labels,
> 	c_parse_pragma_novector, c_parser_pragma): Wire through novector and
> 	default to false.

I'll let the C maintainers review the C changes.

> gcc/cp/ChangeLog:
> 
> 	* cp-tree.def (RANGE_FOR_STMT): Update comment.
> 	* cp-tree.h (RANGE_FOR_NOVECTOR): New.
> 	(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> 	finish_for_cond): Add novector param.
> 	* init.cc (build_vec_init): Default novector to false.
> 	* method.cc (build_comparison_op): Likewise.
> 	* parser.cc (cp_parser_statement): Likewise.
> 	(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> 	cp_convert_range_for, cp_parser_iteration_statement,
> 	cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> 	(cp_parser_pragma_novector): New.
> 	* pt.cc (tsubst_expr): Likewise.
> 	* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> 	finish_for_cond): Likewise.
> 
> gcc/ChangeLog:
> 
> 	* doc/extend.texi: Document it.
> 	* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
> 	* tree.h (TREE_LANG_FLAG_7): New.

This doesn't seem necessary; I think only flags 1 and 6 are currently 
used in RANGE_FOR_STMT.

> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/vect/vect-novector-pragma.cc: New test.
> 	* gcc.dg/vect/vect-novector-pragma.c: New test.
> 
> --- inline copy of patch --
>...
> @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
>      not included. */
>   
>   static tree
> -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> +	       bool novector)

I wonder about combining the ivdep and novector parameters here and in 
other functions?  Up to you.

> @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
>   	    break;
>   	  }
>   	const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> -	unsigned short unroll;
> +	unsigned short unroll = 0;
> +	bool novector = false;
>   	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> -	if (tok->type == CPP_PRAGMA
> -	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> +
> +	while (tok->type == CPP_PRAGMA)
>   	  {
> -	    tok = cp_lexer_consume_token (parser->lexer);
> -	    unroll = cp_parser_pragma_unroll (parser, tok);
> -	    tok = cp_lexer_peek_token (the_parser->lexer);
> +	    switch (cp_parser_pragma_kind (tok))
> +	      {
> +		case PRAGMA_UNROLL:
> +		  {
> +		    tok = cp_lexer_consume_token (parser->lexer);
> +		    unroll = cp_parser_pragma_unroll (parser, tok);
> +		    tok = cp_lexer_peek_token (the_parser->lexer);
> +		    break;
> +		  }
> +		case PRAGMA_NOVECTOR:
> +		  {
> +		    tok = cp_lexer_consume_token (parser->lexer);
> +		    novector = cp_parser_pragma_novector (parser, tok);
> +		    tok = cp_lexer_peek_token (the_parser->lexer);
> +		    break;
> +		  }
> +		default:
> +		  gcc_unreachable ();
> +	      }
>   	  }

Repeating this pattern three times for the three related pragmas is too 
much; please combine the three cases into one.

Jason


^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector
  2023-06-29 22:17   ` Jason Merrill
@ 2023-06-30 16:18     ` Tamar Christina
  2023-06-30 16:44       ` Jason Merrill
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-06-30 16:18 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches; +Cc: nd, joseph, rguenther, nathan

Hi Jason,

Thanks for the review. I only now realized I should have split them between C and C++.

Will do so on the respins.

> 
> On 6/28/23 09:41, Tamar Christina wrote:
> > Hi All,
> >
> > FORTRAN currently has a pragma NOVECTOR for indicating that
> > vectorization should not be applied to a particular loop.
> >
> > ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> >
> > As part of this patch series I need a way to easily turn off
> > vectorization of particular loops, particularly for testsuite reasons.
> >
> > This patch proposes a #pragma GCC novector that does the same for C
> > and C++ as gfortran does for FORTRAN and what ICC/ICX does for C and C++.
> >
> > I added only some basic tests here, but the next patch in the series
> > uses this in the testsuite in about ~800 tests.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/c-family/ChangeLog:
> >
> > 	* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> > 	* c-pragma.cc (init_pragma): Use it.
> >
> > gcc/c/ChangeLog:
> >
> > 	* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> > 	c_parser_for_statement, c_parser_statement_after_labels,
> > 	c_parse_pragma_novector, c_parser_pragma): Wire through novector
> and
> > 	default to false.
> 
> I'll let the C maintainers review the C changes.
> 
> > gcc/cp/ChangeLog:
> >
> > 	* cp-tree.def (RANGE_FOR_STMT): Update comment.
> > 	* cp-tree.h (RANGE_FOR_NOVECTOR): New.
> > 	(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> > 	finish_for_cond): Add novector param.
> > 	* init.cc (build_vec_init): Default novector to false.
> > 	* method.cc (build_comparison_op): Likewise.
> > 	* parser.cc (cp_parser_statement): Likewise.
> > 	(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> > 	cp_convert_range_for, cp_parser_iteration_statement,
> > 	cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> > 	(cp_parser_pragma_novector): New.
> > 	* pt.cc (tsubst_expr): Likewise.
> > 	* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> > 	finish_for_cond): Likewise.
> >
> > gcc/ChangeLog:
> >
> > 	* doc/extend.texi: Document it.
> > 	* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
> > 	* tree.h (TREE_LANG_FLAG_7): New.
> 
> This doesn't seem necessary; I think only flags 1 and 6 are currently used in
> RANGE_FOR_STMT.

Ah fair, I thought every option needed to occupy a specific bit. I'll try to re-use one.

> 
> > gcc/testsuite/ChangeLog:
> >
> > 	* g++.dg/vect/vect-novector-pragma.cc: New test.
> > 	* gcc.dg/vect/vect-novector-pragma.c: New test.
> >
> > --- inline copy of patch --
> >...
> > @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
> >      not included. */
> >
> >   static tree
> > -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> > +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> > +	       bool novector)
> 
> I wonder about combining the ivdep and novector parameters here and in
> other functions?  Up to you.

As in, combine them in e.g. a struct?

> 
> > @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser,
> enum pragma_context context, bool *if_p)
> >   	    break;
> >   	  }
> >   	const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> > -	unsigned short unroll;
> > +	unsigned short unroll = 0;
> > +	bool novector = false;
> >   	cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> > -	if (tok->type == CPP_PRAGMA
> > -	    && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> > +
> > +	while (tok->type == CPP_PRAGMA)
> >   	  {
> > -	    tok = cp_lexer_consume_token (parser->lexer);
> > -	    unroll = cp_parser_pragma_unroll (parser, tok);
> > -	    tok = cp_lexer_peek_token (the_parser->lexer);
> > +	    switch (cp_parser_pragma_kind (tok))
> > +	      {
> > +		case PRAGMA_UNROLL:
> > +		  {
> > +		    tok = cp_lexer_consume_token (parser->lexer);
> > +		    unroll = cp_parser_pragma_unroll (parser, tok);
> > +		    tok = cp_lexer_peek_token (the_parser->lexer);
> > +		    break;
> > +		  }
> > +		case PRAGMA_NOVECTOR:
> > +		  {
> > +		    tok = cp_lexer_consume_token (parser->lexer);
> > +		    novector = cp_parser_pragma_novector (parser, tok);
> > +		    tok = cp_lexer_peek_token (the_parser->lexer);
> > +		    break;
> > +		  }
> > +		default:
> > +		  gcc_unreachable ();
> > +	      }
> >   	  }
> 
> Repeating this pattern three times for the three related pragmas is too much;
> please combine the three cases into one.

Sure, I had some trouble combining them before because of the initial token being
consumed, but I think I know a way.

Thanks for the review, will send updated split patch Monday.

Cheers,
Tamar
> 
> Jason



* Re: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector
  2023-06-30 16:18     ` Tamar Christina
@ 2023-06-30 16:44       ` Jason Merrill
  0 siblings, 0 replies; 200+ messages in thread
From: Jason Merrill @ 2023-06-30 16:44 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches List, nd, Joseph S. Myers, Richard Biener, Nathan Sidwell

On Fri, Jun 30, 2023, 12:18 PM Tamar Christina <Tamar.Christina@arm.com>
wrote:

> Hi Jason,
>
> Thanks for the review. I only now realized I should have split them
> between C and C++.
>
> Will do so on the respins.
>
> >
> > On 6/28/23 09:41, Tamar Christina wrote:
> > > Hi All,
> > >
> > > FORTRAN currently has a pragma NOVECTOR for indicating that
> > > vectorization should not be applied to a particular loop.
> > >
> > > ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> > >
> > > As part of this patch series I need a way to easily turn off
> > > vectorization of particular loops, particularly for testsuite reasons.
> > >
> > > This patch proposes a #pragma GCC novector that does the same for C
> > > and C++ as gfortran does for FORTRAN and what ICC/ICX does for C and
> C++.
> > >
> > > I added only some basic tests here, but the next patch in the series
> > > uses this in the testsuite in about ~800 tests.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/c-family/ChangeLog:
> > >
> > >     * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> > >     * c-pragma.cc (init_pragma): Use it.
> > >
> > > gcc/c/ChangeLog:
> > >
> > >     * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> > >     c_parser_for_statement, c_parser_statement_after_labels,
> > >     c_parse_pragma_novector, c_parser_pragma): Wire through novector
> > and
> > >     default to false.
> >
> > I'll let the C maintainers review the C changes.
> >
> > > gcc/cp/ChangeLog:
> > >
> > >     * cp-tree.def (RANGE_FOR_STMT): Update comment.
> > >     * cp-tree.h (RANGE_FOR_NOVECTOR): New.
> > >     (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> > >     finish_for_cond): Add novector param.
> > >     * init.cc (build_vec_init): Default novector to false.
> > >     * method.cc (build_comparison_op): Likewise.
> > >     * parser.cc (cp_parser_statement): Likewise.
> > >     (cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> > >     cp_convert_range_for, cp_parser_iteration_statement,
> > >     cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> > >     (cp_parser_pragma_novector): New.
> > >     * pt.cc (tsubst_expr): Likewise.
> > >     * semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> > >     finish_for_cond): Likewise.
> > >
> > > gcc/ChangeLog:
> > >
> > >     * doc/extend.texi: Document it.
> > >     * tree-core.h (struct tree_base): Add lang_flag_7 and reduce
> spare0.
> > >     * tree.h (TREE_LANG_FLAG_7): New.
> >
> > This doesn't seem necessary; I think only flags 1 and 6 are currently
> used in
> > RANGE_FOR_STMT.
>
> Ah fair, I thought every option needed to occupy a specific bit. I'll try
> to re-use one.
>
> >
> > > gcc/testsuite/ChangeLog:
> > >
> > >     * g++.dg/vect/vect-novector-pragma.cc: New test.
> > >     * gcc.dg/vect/vect-novector-pragma.c: New test.
> > >
> > > --- inline copy of patch --
> > >...
> > > @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
> > >      not included. */
> > >
> > >   static tree
> > > -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> > > +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> > > +          bool novector)
> >
> > I wonder about combining the ivdep and novector parameters here and in
> > other functions?  Up to you.
>
> As in, combine them in e.g. a struct?
>

I was thinking in an int or enum.

>
> > > @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser,
> > enum pragma_context context, bool *if_p)
> > >         break;
> > >       }
> > >     const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> > > -   unsigned short unroll;
> > > +   unsigned short unroll = 0;
> > > +   bool novector = false;
> > >     cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> > > -   if (tok->type == CPP_PRAGMA
> > > -       && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> > > +
> > > +   while (tok->type == CPP_PRAGMA)
> > >       {
> > > -       tok = cp_lexer_consume_token (parser->lexer);
> > > -       unroll = cp_parser_pragma_unroll (parser, tok);
> > > -       tok = cp_lexer_peek_token (the_parser->lexer);
> > > +       switch (cp_parser_pragma_kind (tok))
> > > +         {
> > > +           case PRAGMA_UNROLL:
> > > +             {
> > > +               tok = cp_lexer_consume_token (parser->lexer);
> > > +               unroll = cp_parser_pragma_unroll (parser, tok);
> > > +               tok = cp_lexer_peek_token (the_parser->lexer);
> > > +               break;
> > > +             }
> > > +           case PRAGMA_NOVECTOR:
> > > +             {
> > > +               tok = cp_lexer_consume_token (parser->lexer);
> > > +               novector = cp_parser_pragma_novector (parser, tok);
> > > +               tok = cp_lexer_peek_token (the_parser->lexer);
> > > +               break;
> > > +             }
> > > +           default:
> > > +             gcc_unreachable ();
> > > +         }
> > >       }
> >
> > Repeating this pattern three times for the three related pragmas is too
> much;
> > please combine the three cases into one.
>
> Sure, I had some trouble combining them before because of the initial token
> being consumed, but I think I know a way.
>
> Thanks for the review, will send updated split patch Monday.
>
> Cheers,
> Tamar
> >
> > Jason
>
>


* Re: [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops
  2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
@ 2023-07-04 11:29   ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-04 11:29 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi,
> 
> With the patch enabling the vectorization of early-breaks, we'd like to allow
> bitfield lowering in such loops, which requires the relaxation of allowing
> multiple exits when doing so.  In order to avoid a similar issue to PR107275,
> the code that rejects loops with certain types of gimple_stmts was hoisted from
> 'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
> to lower bitfields in loops we are not going to vectorize anyway.
> 
> This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
> shouldn't as it will never come across them.  I made sure to add a comment to
> make clear that there is a direct connection between the two and if we were to
> enable vectorization of any other gimple statement we should make sure both
> handle it.
> 
> NOTE: This patch was accepted before but never committed because it is a no-op
> without the early break patch.  This is a respun version of Andre's patch,
> rebased onto changes in ifcvt and updated to handle multiple exits.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu with no issues.

OK.

> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
> 	(get_loop_body_if_conv_order): ... to here.
> 	(if_convertible_loop_p): Remove single_exit check.
> 	(tree_if_conversion): Move single_exit check to if-conversion part and
> 	support multiple exits.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
> 	* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
> 	* gcc.dg/vect/vect-bitfield-read-8.c: New test.
> 	* gcc.dg/vect/vect-bitfield-read-9.c: New test.
> 
> Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0d91067ebb27b1db2b2352975c43bce8b4171e3f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> @@ -0,0 +1,60 @@
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-require-effective-target vect_long_long } */
> +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +extern void abort(void);
> +
> +struct s {
> +    char a : 4;
> +};
> +
> +#define N 32
> +#define ELT0 {0}
> +#define ELT1 {1}
> +#define ELT2 {2}
> +#define ELT3 {3}
> +#define RES 56
> +struct s A[N]
> +  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
> +
> +int __attribute__ ((noipa))
> +f(struct s *ptr, unsigned n) {
> +    int res = 0;
> +    for (int i = 0; i < n; ++i)
> +      {
> +	switch (ptr[i].a)
> +	  {
> +	  case 0:
> +	    res += ptr[i].a + 1;
> +	    break;
> +	  case 1:
> +	  case 2:
> +	  case 3:
> +	    res += ptr[i].a;
> +	    break;
> +	  default:
> +	    return 0;
> +	  }
> +      }
> +    return res;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  if (f(&A[0], N) != RES)
> +    abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4ac7b3fc0dfd1c9d0b5e94a2ba6a745545577ec1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
> @@ -0,0 +1,49 @@
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-require-effective-target vect_long_long } */
> +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +extern void abort(void);
> +
> +struct s {
> +    char a : 4;
> +};
> +
> +#define N 32
> +#define ELT0 {0}
> +#define ELT1 {1}
> +#define ELT2 {2}
> +#define ELT3 {3}
> +#define RES 48
> +struct s A[N]
> +  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
> +
> +int __attribute__ ((noipa))
> +f(struct s *ptr, unsigned n) {
> +    int res = 0;
> +    for (int i = 0; i < n; ++i)
> +      {
> +	asm volatile ("" ::: "memory");
> +	res += ptr[i].a;
> +      }
> +    return res;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  if (f(&A[0], N) != RES)
> +    abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..52cfd33d937ae90f3fe9556716c90e098b768ac8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
> @@ -0,0 +1,49 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +extern void abort(void);
> +
> +struct s { int i : 31; };
> +
> +#define ELT0 {0}
> +#define ELT1 {1}
> +#define ELT2 {2}
> +#define ELT3 {3}
> +#define ELT4 {4}
> +#define N 32
> +#define RES 25
> +struct s A[N]
> +  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT4, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
> +
> +int __attribute__ ((noipa))
> +f(struct s *ptr, unsigned n) {
> +    int res = 0;
> +    for (int i = 0; i < n; ++i)
> +      {
> +	if (ptr[i].i == 4)
> +	  return res;
> +	res += ptr[i].i;
> +      }
> +
> +    return res;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  if (f(&A[0], N) != RES)
> +    abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ab814698131a5905def181eeed85d8a3c62b924b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c
> @@ -0,0 +1,51 @@
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-require-effective-target vect_long_long } */
> +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +extern void abort(void);
> +
> +struct s {
> +    unsigned i : 31;
> +    char a : 4;
> +};
> +
> +#define N 32
> +#define ELT0 {0x7FFFFFFFUL, 0}
> +#define ELT1 {0x7FFFFFFFUL, 1}
> +#define ELT2 {0x7FFFFFFFUL, 2}
> +#define ELT3 {0x7FFFFFFFUL, 3}
> +#define ELT4 {0x7FFFFFFFUL, 4}
> +#define RES 9
> +struct s A[N]
> +  = { ELT0, ELT4, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
> +      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
> +
> +int __attribute__ ((noipa))
> +f(struct s *ptr, unsigned n) {
> +    int res = 0;
> +    for (int i = 0; i < n; ++i)
> +      {
> +	if (ptr[i].a)
> +	  return 9;
> +	res += ptr[i].a;
> +      }
> +    return res;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  if (f(&A[0], N) != RES)
> +    abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */
> +
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index e342532a343a3c066142adeec5fdfaf736a653e5..cdb0fe4c29dfa531e3277925022d127b13ffcc16 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -586,7 +586,7 @@ add_to_dst_predicate_list (class loop *loop, edge e,
>  /* Return true if one of the successor edges of BB exits LOOP.  */
>  
>  static bool
> -bb_with_exit_edge_p (class loop *loop, basic_block bb)
> +bb_with_exit_edge_p (const class loop *loop, basic_block bb)
>  {
>    edge e;
>    edge_iterator ei;
> @@ -1268,6 +1268,44 @@ get_loop_body_in_if_conv_order (const class loop *loop)
>      }
>    free (blocks_in_bfs_order);
>    BITMAP_FREE (visited);
> +
> +  /* Go through loop and reject if-conversion or lowering of bitfields if we
> +     encounter statements we do not believe the vectorizer will be able to
> +     handle.  If adding a new type of statement here, make sure
> +     'ifcvt_local_dce' is also able to handle it properly.  */
> +  for (index = 0; index < loop->num_nodes; index++)
> +    {
> +      basic_block bb = blocks[index];
> +      gimple_stmt_iterator gsi;
> +
> +      bool may_have_nonlocal_labels
> +	= bb_with_exit_edge_p (loop, bb) || bb == loop->latch;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	switch (gimple_code (gsi_stmt (gsi)))
> +	  {
> +	  case GIMPLE_LABEL:
> +	    if (!may_have_nonlocal_labels)
> +	      {
> +		tree label
> +		  = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi)));
> +		if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
> +		  {
> +		    free (blocks);
> +		    return NULL;
> +		  }
> +	      }
> +	    /* Fallthru.  */
> +	  case GIMPLE_ASSIGN:
> +	  case GIMPLE_CALL:
> +	  case GIMPLE_DEBUG:
> +	  case GIMPLE_COND:
> +	    gimple_set_uid (gsi_stmt (gsi), 0);
> +	    break;
> +	  default:
> +	    free (blocks);
> +	    return NULL;
> +	  }
> +    }
>    return blocks;
>  }
>  
> @@ -1438,36 +1476,6 @@ if_convertible_loop_p_1 (class loop *loop, vec<data_reference_p> *refs)
>  	exit_bb = bb;
>      }
>  
> -  for (i = 0; i < loop->num_nodes; i++)
> -    {
> -      basic_block bb = ifc_bbs[i];
> -      gimple_stmt_iterator gsi;
> -
> -      bool may_have_nonlocal_labels
> -	= bb_with_exit_edge_p (loop, bb) || bb == loop->latch;
> -      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -	switch (gimple_code (gsi_stmt (gsi)))
> -	  {
> -	  case GIMPLE_LABEL:
> -	    if (!may_have_nonlocal_labels)
> -	      {
> -		tree label
> -		  = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi)));
> -		if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
> -		  return false;
> -	      }
> -	    /* Fallthru.  */
> -	  case GIMPLE_ASSIGN:
> -	  case GIMPLE_CALL:
> -	  case GIMPLE_DEBUG:
> -	  case GIMPLE_COND:
> -	    gimple_set_uid (gsi_stmt (gsi), 0);
> -	    break;
> -	  default:
> -	    return false;
> -	  }
> -    }
> -
>    data_reference_p dr;
>  
>    innermost_DR_map
> @@ -1579,14 +1587,6 @@ if_convertible_loop_p (class loop *loop, vec<data_reference_p> *refs)
>        return false;
>      }
>  
> -  /* More than one loop exit is too much to handle.  */
> -  if (!single_exit (loop))
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -	fprintf (dump_file, "multiple exits\n");
> -      return false;
> -    }
> -
>    /* If one of the loop header's edge is an exit edge then do not
>       apply if-conversion.  */
>    FOR_EACH_EDGE (e, ei, loop->header->succs)
> @@ -3566,9 +3566,6 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>  	aggressive_if_conv = true;
>      }
>  
> -  if (!single_exit (loop))
> -    goto cleanup;
> -
>    /* If there are more than two BBs in the loop then there is at least one if
>       to convert.  */
>    if (loop->num_nodes > 2
> @@ -3588,15 +3585,25 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>  
>    if (loop->num_nodes > 2)
>      {
> -      need_to_ifcvt = true;
> +      /* More than one loop exit is too much to handle.  */
> +      if (!single_exit (loop))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, "Can not ifcvt due to multiple exits\n");
> +	}
> +      else
> +	{
> +	  need_to_ifcvt = true;
>  
> -      if (!if_convertible_loop_p (loop, &refs) || !dbg_cnt (if_conversion_tree))
> -	goto cleanup;
> +	  if (!if_convertible_loop_p (loop, &refs)
> +	      || !dbg_cnt (if_conversion_tree))
> +	    goto cleanup;
>  
> -      if ((need_to_predicate || any_complicated_phi)
> -	  && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
> -	      || loop->dont_vectorize))
> -	goto cleanup;
> +	  if ((need_to_predicate || any_complicated_phi)
> +	      && ((!flag_tree_loop_vectorize && !loop->force_vectorize)
> +		  || loop->dont_vectorize))
> +	    goto cleanup;
> +	}
>      }
>  
>    if ((flag_tree_loop_vectorize || loop->force_vectorize)
> @@ -3687,7 +3694,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> *preds)
>       PHIs, those are to be kept in sync with the non-if-converted copy.
>       ???  We'll still keep dead stores though.  */
>    exit_bbs = BITMAP_ALLOC (NULL);
> -  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +  for (edge exit : get_loop_exit_edges (loop))
> +    bitmap_set_bit (exit_bbs, exit->dest->index);
>    bitmap_set_bit (exit_bbs, loop->latch->index);
>  
>    std::pair <tree, tree> *name_pair;
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH 3/19]middle-end clean up vect testsuite using pragma novector
  2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
  2023-06-28 13:54   ` Tamar Christina
@ 2023-07-04 11:31   ` Richard Biener
  1 sibling, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-04 11:31 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> The support for early break vectorization breaks lots of scan vect and slp
> testcases because they assume that loops with abort () in them cannot be
> vectorized.  Additionally it defeats the point of having a scalar loop to check
> the output of the vectorizer if that loop is also vectorized.
> 
> For that reason this adds
> 
> #pragma GCC novector to all tests which have a scalar loop that we would have
> vectorized using this patch series.
> 
> FWIW, none of these tests were failing to vectorize or run before the pragma.
> The tests that did point to some issues were copied to the early break test
> suite as well.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?

OK if the frontend parts are approved.  I think that's good independent of
the rest of the series as well, so feel free to split it out.

Richard.

> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/vect/pr84556.cc: Add novector pragma.
> 	* g++.dg/vect/simd-1.cc: Add novector pragma.
> 	* g++.dg/vect/simd-2.cc: Add novector pragma.
> 	* g++.dg/vect/simd-3.cc: Add novector pragma.
> 	* g++.dg/vect/simd-4.cc: Add novector pragma.
> 	* g++.dg/vect/simd-5.cc: Add novector pragma.
> 	* g++.dg/vect/simd-6.cc: Add novector pragma.
> 	* g++.dg/vect/simd-7.cc: Add novector pragma.
> 	* g++.dg/vect/simd-8.cc: Add novector pragma.
> 	* g++.dg/vect/simd-9.cc: Add novector pragma.
> 	* g++.dg/vect/simd-clone-6.cc: Add novector pragma.
> 	* gcc.dg/vect/O3-pr70130.c: Add novector pragma.
> 	* gcc.dg/vect/Os-vect-95.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-16.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-24.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-25.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-26.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-27.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-28.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-29.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-42.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-over-widen-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-over-widen-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pattern-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pattern-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pow-1.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pr101615-2.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-pr65935.c: Add novector pragma.
> 	* gcc.dg/vect/bb-slp-subgroups-1.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Add novector pragma.
> 	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-bb-slp-call-1.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-bb-slp-call-2.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-call-1.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-call-2.c: Add novector pragma.
> 	* gcc.dg/vect/fast-math-vect-complex-3.c: Add novector pragma.
> 	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10a.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-10b.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-11.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-12.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-15.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-16.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-17.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-18.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-19.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-20.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-21.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-22.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-4.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-6-global.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-6.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-7.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-8.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9a.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-outer-9b.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-slp-30.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-slp-31.c: Add novector pragma.
> 	* gcc.dg/vect/no-scevccp-vect-iv-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-34.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-36.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-64.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-65.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-66.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-69.c: Add novector pragma.
> 	* gcc.dg/vect/no-section-anchors-vect-outer-4h.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-111.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c: Add novector pragma.
> 	* gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c: Add novector pragma.
> 	* gcc.dg/vect/no-tree-dom-vect-bug.c: Add novector pragma.
> 	* gcc.dg/vect/no-tree-pre-slp-29.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-pr29145.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-101.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-102.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-102a.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-37.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-43.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-45.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-49.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-51.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-53.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-57.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-61.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-79.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-1.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-2.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-depend-3.c: Add novector pragma.
> 	* gcc.dg/vect/no-vfa-vect-dv-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr101445.c: Add novector pragma.
> 	* gcc.dg/vect/pr103581.c: Add novector pragma.
> 	* gcc.dg/vect/pr105219.c: Add novector pragma.
> 	* gcc.dg/vect/pr108608.c: Add novector pragma.
> 	* gcc.dg/vect/pr18400.c: Add novector pragma.
> 	* gcc.dg/vect/pr18536.c: Add novector pragma.
> 	* gcc.dg/vect/pr20122.c: Add novector pragma.
> 	* gcc.dg/vect/pr25413.c: Add novector pragma.
> 	* gcc.dg/vect/pr30784.c: Add novector pragma.
> 	* gcc.dg/vect/pr37539.c: Add novector pragma.
> 	* gcc.dg/vect/pr40074.c: Add novector pragma.
> 	* gcc.dg/vect/pr45752.c: Add novector pragma.
> 	* gcc.dg/vect/pr45902.c: Add novector pragma.
> 	* gcc.dg/vect/pr46009.c: Add novector pragma.
> 	* gcc.dg/vect/pr48172.c: Add novector pragma.
> 	* gcc.dg/vect/pr51074.c: Add novector pragma.
> 	* gcc.dg/vect/pr51581-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr51581-4.c: Add novector pragma.
> 	* gcc.dg/vect/pr53185-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr56826.c: Add novector pragma.
> 	* gcc.dg/vect/pr56918.c: Add novector pragma.
> 	* gcc.dg/vect/pr56920.c: Add novector pragma.
> 	* gcc.dg/vect/pr56933.c: Add novector pragma.
> 	* gcc.dg/vect/pr57705.c: Add novector pragma.
> 	* gcc.dg/vect/pr57741-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr57741-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr59591-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr59591-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr59594.c: Add novector pragma.
> 	* gcc.dg/vect/pr59984.c: Add novector pragma.
> 	* gcc.dg/vect/pr60276.c: Add novector pragma.
> 	* gcc.dg/vect/pr61194.c: Add novector pragma.
> 	* gcc.dg/vect/pr61680.c: Add novector pragma.
> 	* gcc.dg/vect/pr62021.c: Add novector pragma.
> 	* gcc.dg/vect/pr63341-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr64252.c: Add novector pragma.
> 	* gcc.dg/vect/pr64404.c: Add novector pragma.
> 	* gcc.dg/vect/pr64421.c: Add novector pragma.
> 	* gcc.dg/vect/pr64493.c: Add novector pragma.
> 	* gcc.dg/vect/pr64495.c: Add novector pragma.
> 	* gcc.dg/vect/pr66251.c: Add novector pragma.
> 	* gcc.dg/vect/pr66253.c: Add novector pragma.
> 	* gcc.dg/vect/pr68502-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr68502-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr69820.c: Add novector pragma.
> 	* gcc.dg/vect/pr70021.c: Add novector pragma.
> 	* gcc.dg/vect/pr70354-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr70354-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr71259.c: Add novector pragma.
> 	* gcc.dg/vect/pr78005.c: Add novector pragma.
> 	* gcc.dg/vect/pr78558.c: Add novector pragma.
> 	* gcc.dg/vect/pr80815-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr80815-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr80928.c: Add novector pragma.
> 	* gcc.dg/vect/pr81410.c: Add novector pragma.
> 	* gcc.dg/vect/pr81633.c: Add novector pragma.
> 	* gcc.dg/vect/pr81740-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr81740-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr85586.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr87288-3.c: Add novector pragma.
> 	* gcc.dg/vect/pr88903-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr88903-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr90018.c: Add novector pragma.
> 	* gcc.dg/vect/pr92420.c: Add novector pragma.
> 	* gcc.dg/vect/pr94994.c: Add novector pragma.
> 	* gcc.dg/vect/pr96783-1.c: Add novector pragma.
> 	* gcc.dg/vect/pr96783-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97081-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97558-2.c: Add novector pragma.
> 	* gcc.dg/vect/pr97678.c: Add novector pragma.
> 	* gcc.dg/vect/section-anchors-pr27770.c: Add novector pragma.
> 	* gcc.dg/vect/section-anchors-vect-69.c: Add novector pragma.
> 	* gcc.dg/vect/slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-11c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-12c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-13-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-13.c: Add novector pragma.
> 	* gcc.dg/vect/slp-14.c: Add novector pragma.
> 	* gcc.dg/vect/slp-15.c: Add novector pragma.
> 	* gcc.dg/vect/slp-16.c: Add novector pragma.
> 	* gcc.dg/vect/slp-17.c: Add novector pragma.
> 	* gcc.dg/vect/slp-18.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19a.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19b.c: Add novector pragma.
> 	* gcc.dg/vect/slp-19c.c: Add novector pragma.
> 	* gcc.dg/vect/slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-20.c: Add novector pragma.
> 	* gcc.dg/vect/slp-21.c: Add novector pragma.
> 	* gcc.dg/vect/slp-22.c: Add novector pragma.
> 	* gcc.dg/vect/slp-23.c: Add novector pragma.
> 	* gcc.dg/vect/slp-24-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-24.c: Add novector pragma.
> 	* gcc.dg/vect/slp-25.c: Add novector pragma.
> 	* gcc.dg/vect/slp-26.c: Add novector pragma.
> 	* gcc.dg/vect/slp-28.c: Add novector pragma.
> 	* gcc.dg/vect/slp-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-33.c: Add novector pragma.
> 	* gcc.dg/vect/slp-34-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-34.c: Add novector pragma.
> 	* gcc.dg/vect/slp-35.c: Add novector pragma.
> 	* gcc.dg/vect/slp-37.c: Add novector pragma.
> 	* gcc.dg/vect/slp-4-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-41.c: Add novector pragma.
> 	* gcc.dg/vect/slp-43.c: Add novector pragma.
> 	* gcc.dg/vect/slp-45.c: Add novector pragma.
> 	* gcc.dg/vect/slp-46.c: Add novector pragma.
> 	* gcc.dg/vect/slp-47.c: Add novector pragma.
> 	* gcc.dg/vect/slp-48.c: Add novector pragma.
> 	* gcc.dg/vect/slp-49.c: Add novector pragma.
> 	* gcc.dg/vect/slp-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-cond-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-11-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-11.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-12.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-multitypes-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-1.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-10.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-11.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-12.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-2.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-3.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-4.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-5.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-6.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-7.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-8.c: Add novector pragma.
> 	* gcc.dg/vect/slp-perm-9.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-half.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-s16.c: Add novector pragma.
> 	* gcc.dg/vect/slp-widen-mult-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-100.c: Add novector pragma.
> 	* gcc.dg/vect/vect-103.c: Add novector pragma.
> 	* gcc.dg/vect/vect-104.c: Add novector pragma.
> 	* gcc.dg/vect/vect-105-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-105.c: Add novector pragma.
> 	* gcc.dg/vect/vect-106.c: Add novector pragma.
> 	* gcc.dg/vect/vect-107.c: Add novector pragma.
> 	* gcc.dg/vect/vect-108.c: Add novector pragma.
> 	* gcc.dg/vect/vect-109.c: Add novector pragma.
> 	* gcc.dg/vect/vect-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-110.c: Add novector pragma.
> 	* gcc.dg/vect/vect-113.c: Add novector pragma.
> 	* gcc.dg/vect/vect-114.c: Add novector pragma.
> 	* gcc.dg/vect/vect-115.c: Add novector pragma.
> 	* gcc.dg/vect/vect-116.c: Add novector pragma.
> 	* gcc.dg/vect/vect-117.c: Add novector pragma.
> 	* gcc.dg/vect/vect-11a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-122.c: Add novector pragma.
> 	* gcc.dg/vect/vect-124.c: Add novector pragma.
> 	* gcc.dg/vect/vect-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-15-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-21.c: Add novector pragma.
> 	* gcc.dg/vect/vect-22.c: Add novector pragma.
> 	* gcc.dg/vect/vect-23.c: Add novector pragma.
> 	* gcc.dg/vect/vect-24.c: Add novector pragma.
> 	* gcc.dg/vect/vect-25.c: Add novector pragma.
> 	* gcc.dg/vect/vect-26.c: Add novector pragma.
> 	* gcc.dg/vect/vect-27.c: Add novector pragma.
> 	* gcc.dg/vect/vect-28.c: Add novector pragma.
> 	* gcc.dg/vect/vect-29.c: Add novector pragma.
> 	* gcc.dg/vect/vect-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-30.c: Add novector pragma.
> 	* gcc.dg/vect/vect-31-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-31.c: Add novector pragma.
> 	* gcc.dg/vect/vect-32-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-33-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-33.c: Add novector pragma.
> 	* gcc.dg/vect/vect-34-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-34.c: Add novector pragma.
> 	* gcc.dg/vect/vect-35-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-35.c: Add novector pragma.
> 	* gcc.dg/vect/vect-36-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-36.c: Add novector pragma.
> 	* gcc.dg/vect/vect-38.c: Add novector pragma.
> 	* gcc.dg/vect/vect-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-40.c: Add novector pragma.
> 	* gcc.dg/vect/vect-42.c: Add novector pragma.
> 	* gcc.dg/vect/vect-44.c: Add novector pragma.
> 	* gcc.dg/vect/vect-46.c: Add novector pragma.
> 	* gcc.dg/vect/vect-48.c: Add novector pragma.
> 	* gcc.dg/vect/vect-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-50.c: Add novector pragma.
> 	* gcc.dg/vect/vect-52.c: Add novector pragma.
> 	* gcc.dg/vect/vect-54.c: Add novector pragma.
> 	* gcc.dg/vect/vect-56.c: Add novector pragma.
> 	* gcc.dg/vect/vect-58.c: Add novector pragma.
> 	* gcc.dg/vect/vect-6-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-60.c: Add novector pragma.
> 	* gcc.dg/vect/vect-62.c: Add novector pragma.
> 	* gcc.dg/vect/vect-63.c: Add novector pragma.
> 	* gcc.dg/vect/vect-64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-65.c: Add novector pragma.
> 	* gcc.dg/vect/vect-66.c: Add novector pragma.
> 	* gcc.dg/vect/vect-67.c: Add novector pragma.
> 	* gcc.dg/vect/vect-68.c: Add novector pragma.
> 	* gcc.dg/vect/vect-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-70.c: Add novector pragma.
> 	* gcc.dg/vect/vect-71.c: Add novector pragma.
> 	* gcc.dg/vect/vect-72.c: Add novector pragma.
> 	* gcc.dg/vect/vect-73-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-73.c: Add novector pragma.
> 	* gcc.dg/vect/vect-74-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-74.c: Add novector pragma.
> 	* gcc.dg/vect/vect-75-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-75.c: Add novector pragma.
> 	* gcc.dg/vect/vect-76-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-76.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77-alignchecks.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77-global.c: Add novector pragma.
> 	* gcc.dg/vect/vect-77.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78-alignchecks.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78-global.c: Add novector pragma.
> 	* gcc.dg/vect/vect-78.c: Add novector pragma.
> 	* gcc.dg/vect/vect-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-80-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-80.c: Add novector pragma.
> 	* gcc.dg/vect/vect-82.c: Add novector pragma.
> 	* gcc.dg/vect/vect-82_64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-83.c: Add novector pragma.
> 	* gcc.dg/vect/vect-83_64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-85-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-85.c: Add novector pragma.
> 	* gcc.dg/vect/vect-86.c: Add novector pragma.
> 	* gcc.dg/vect/vect-87.c: Add novector pragma.
> 	* gcc.dg/vect/vect-88.c: Add novector pragma.
> 	* gcc.dg/vect/vect-89-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-89.c: Add novector pragma.
> 	* gcc.dg/vect/vect-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-92.c: Add novector pragma.
> 	* gcc.dg/vect/vect-93.c: Add novector pragma.
> 	* gcc.dg/vect/vect-95.c: Add novector pragma.
> 	* gcc.dg/vect/vect-96.c: Add novector pragma.
> 	* gcc.dg/vect/vect-97-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-97.c: Add novector pragma.
> 	* gcc.dg/vect/vect-98-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-98.c: Add novector pragma.
> 	* gcc.dg/vect/vect-99.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-alias-check-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-align-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-align-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-all-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-all.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-avg-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bitfield-write-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bool-cmp.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-bswap64.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-complex-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cond-arith-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cselim-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-cselim-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-div-bitmask.h: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-6-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-double-reduc-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-float-extend-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-float-truncate-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-floatint-conversion-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-floatint-conversion-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-fma-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-gather-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-gather-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-ifcvt-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-4a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-intfloat-conversion-4b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-iv-8a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-live-slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mask-load-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mask-loadstore-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mulhrs-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mult-const-pattern-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-mult-const-pattern-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-multitypes-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nb-iter-ub-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-neg-store-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-neg-store-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-nest-cycle-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2c-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2c.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-2d.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3a-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3a.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3b.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-3c.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4d-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-4d.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-lb-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir-lb.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-fir.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-simd-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-slp-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-outer-slp-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-1-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-21.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-22.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-3-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-4-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-over-widen-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-1-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-2-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-peel-4-src.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-recurr-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-sdiv-pow2-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-sdivmod-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-shift-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-12.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-13.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-14.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-17.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-18.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-19.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-20.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-10.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-11.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-15.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-5.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-6.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-simd-clone-9.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u16-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u32-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-float.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-mult-char-ls.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-same-dr.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-shift-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-a-u8-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store-u32-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-store.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i3.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u16-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-i4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-i8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u32-mult.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8-gap7.c: Add novector pragma.
> 	* gcc.dg/vect/vect-strided-u8-i8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-01.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-02.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-03.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-04.c: Add novector pragma.
> 	* gcc.dg/vect/vect-vfa-slp.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-1.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-const-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-const-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-half-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-half.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-s8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8-s16-s32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8-u32.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-mult-u8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-s16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-s8.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-u16.c: Add novector pragma.
> 	* gcc.dg/vect/vect-widen-shift-u8.c: Add novector pragma.
> 	* gcc.dg/vect/wrapv-vect-7.c: Add novector pragma.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/g++.dg/vect/pr84556.cc b/gcc/testsuite/g++.dg/vect/pr84556.cc
> index e0655536f7a0a1c32a918f4b112604a7e6b5e389..e2c97e917bed3e7c5e709f61384d75588f522308 100644
> --- a/gcc/testsuite/g++.dg/vect/pr84556.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr84556.cc
> @@ -15,6 +15,7 @@ main ()
>    };
>    x ();
>    x ();
> +#pragma GCC novector
>    for (int i = 0; i < 8; ++i)
>      if (y[i] != i + 3)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/g++.dg/vect/simd-1.cc b/gcc/testsuite/g++.dg/vect/simd-1.cc
> index 76ce45d939dca8ddbc4953885ac71cf9f6ad298b..991db1d5dfee2a8d89de4aeae659b797629406c1 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-1.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-1.cc
> @@ -88,12 +88,14 @@ main ()
>    s.foo (x, y);
>    if (x != 1024 || s.s != 2051 || s.t != 2054)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1025; ++i)
>      if (a[i] != 2 * i)
>        abort ();
>    s.bar (x, y);
>    if (x != 2049 || s.s != 4101 || s.t != 4104)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1025; ++i)
>      if (a[i] != 4 * i)
>        abort ();
> @@ -102,12 +104,14 @@ main ()
>    s.baz (x, y);
>    if (x != 1024 || s.s != 2051 || s.t != 2054)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1025; ++i)
>      if (a[i] != 2 * i)
>        abort ();
>    s.qux (x, y);
>    if (x != 2049 || s.s != 4101 || s.t != 4104)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1025; ++i)
>      if (a[i] != 4 * i)
>        abort ();
> diff --git a/gcc/testsuite/g++.dg/vect/simd-2.cc b/gcc/testsuite/g++.dg/vect/simd-2.cc
> index 6f5737b7e40b5c2889f26cb4e4c3445e1c3822dd..0ff57e3178d1d79393120529ceea282498015d09 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-2.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-2.cc
> @@ -110,6 +110,7 @@ main ()
>    foo (a, b);
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += i;
> @@ -121,6 +122,7 @@ main ()
>    if (bar ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += 2 * i;
> @@ -132,6 +134,7 @@ main ()
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += i;
> @@ -143,6 +146,7 @@ main ()
>    if (qux ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += 2 * i;
> diff --git a/gcc/testsuite/g++.dg/vect/simd-3.cc b/gcc/testsuite/g++.dg/vect/simd-3.cc
> index d9981719f58ced487c4ffbbecb7c8a5564165bc7..47148f050ed056a2b3340f1e60604606f6cc1311 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-3.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-3.cc
> @@ -75,6 +75,7 @@ main ()
>    foo (a, b, r);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -86,6 +87,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> @@ -99,6 +101,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -110,6 +113,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> diff --git a/gcc/testsuite/g++.dg/vect/simd-4.cc b/gcc/testsuite/g++.dg/vect/simd-4.cc
> index 8f3198943a7427ae3d4800bfbc5575c5849627ff..15b1bc1c99d5d42ecca330e063fed19a50fb3276 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-4.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-4.cc
> @@ -77,6 +77,7 @@ main ()
>    foo (a, b, r);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -88,6 +89,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> @@ -101,6 +103,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -112,6 +115,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> diff --git a/gcc/testsuite/g++.dg/vect/simd-5.cc b/gcc/testsuite/g++.dg/vect/simd-5.cc
> index dd817b8888b1b17d822f576d6d6b123f338e984f..31c2ce8e7129983e02237cdd32e41ef0a8f25f90 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-5.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-5.cc
> @@ -110,6 +110,7 @@ main ()
>    foo (a, b, r);
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += i;
> @@ -121,6 +122,7 @@ main ()
>    if (bar ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += 2 * i;
> @@ -132,6 +134,7 @@ main ()
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += i;
> @@ -143,6 +146,7 @@ main ()
>    if (qux ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s.s += 2 * i;
> diff --git a/gcc/testsuite/g++.dg/vect/simd-6.cc b/gcc/testsuite/g++.dg/vect/simd-6.cc
> index 883b769a9b854bd8c1915648d15ea8996d461f05..7de41a90cae3d80c0ccafad8a9b041bee89764d3 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-6.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-6.cc
> @@ -118,6 +118,7 @@ main ()
>    foo (a, b);
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -129,6 +130,7 @@ main ()
>    if (bar<int> ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -140,6 +142,7 @@ main ()
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -151,6 +154,7 @@ main ()
>    if (qux ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> diff --git a/gcc/testsuite/g++.dg/vect/simd-7.cc b/gcc/testsuite/g++.dg/vect/simd-7.cc
> index 1467849e0c6baa791016b039ca21cfa2cc63ce7f..b543efb191cfbf9c561b243996cdd3a4b66b7533 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-7.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-7.cc
> @@ -79,6 +79,7 @@ main ()
>    foo<int *, int &> (a, b, r);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -90,6 +91,7 @@ main ()
>    if (bar<int> () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -103,6 +105,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -114,6 +117,7 @@ main ()
>    if (qux<int &> () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> diff --git a/gcc/testsuite/g++.dg/vect/simd-8.cc b/gcc/testsuite/g++.dg/vect/simd-8.cc
> index 8e297e246bd41a2f63469260f4fdcfcb5a68a62e..4d76a97a97233cecd4d35797a4cc52f70a4c5e3b 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-8.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-8.cc
> @@ -77,6 +77,7 @@ main ()
>    foo (a, b, r);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -88,6 +89,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -101,6 +103,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -112,6 +115,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> diff --git a/gcc/testsuite/g++.dg/vect/simd-9.cc b/gcc/testsuite/g++.dg/vect/simd-9.cc
> index 4c5b0508fbd79f0e6aa311072062725536d8e2a3..5d1a174e0fc5425f33769fd017b4fd6a51a2fb14 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-9.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-9.cc
> @@ -110,6 +110,7 @@ main ()
>    foo (a, b, r);
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -121,6 +122,7 @@ main ()
>    if (bar ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -132,6 +134,7 @@ main ()
>    if (r.s != 1024 * 1023 / 2)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> @@ -143,6 +146,7 @@ main ()
>    if (qux ().s != 1024 * 1023)
>      abort ();
>    s.s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i].s != s.s)
> diff --git a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
> index fb00e8816a5fc157b780edd1d7064804a67d6373..2d9bb62555ff6c9473db2d1b754aed0123f2cb62 100644
> --- a/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
> +++ b/gcc/testsuite/g++.dg/vect/simd-clone-6.cc
> @@ -30,6 +30,7 @@ do_main ()
>    #pragma omp simd
>    for (i = 0; i < N; i++)
>      e[i] = foo (c[i], d[i], f[i]);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (e[i] != 6 * i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
> index f8b84405140e87a2244ae9f5db6136af2fe9cf57..17ce6c392546f7e46a6db9f30f76dcaedb96d08c 100644
> --- a/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
> +++ b/gcc/testsuite/gcc.dg/vect/O3-pr70130.c
> @@ -90,6 +90,7 @@ main (void)
>    for (i = 0; i < 8; i++)
>      Loop_err (images + i, s, -1);
>  
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (__builtin_memcmp (&expected, images + i, sizeof (expected)))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
> index 97e516ed68e6166eb5f0631004d89f8eedde1cc4..8039be89febdb150226b513ffe267f6065613ccb 100644
> --- a/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
> +++ b/gcc/testsuite/gcc.dg/vect/Os-vect-95.c
> @@ -10,6 +10,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
> index 793c41f6b724d2b6f5ecca6511ea8504e1731a8c..3dc5e746cd0d5c99dcb0c88a05b94c73b44b0e65 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-1.c
> @@ -29,6 +29,7 @@ main1 (int dummy)
>      }
>  
>    /* check results: */ 
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> index 82fae06e3244a9bbb4a471faecdc5f1174970229..76430e0915e2d6ad342dae602fd22337f4559b63 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
> @@ -37,6 +37,7 @@ main1 (int dummy)
>  
>    a = 0;
>    /* check results: */ 
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8] + a
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
> index fcf1cd327e0b20582e3512faacfebfe6b7db7278..cb1b38dda14785c6755d311683fbe9703355b39a 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-2.c
> @@ -28,6 +28,7 @@ main1 (int dummy)
>      }
>  
>    /* check results:  */ 
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
> index ca049c81ba05482813dbab50ab3f4c6df94570e4..6de8dd8affce8e6f6ad40a36d6a163fc25b3fcf9 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-24.c
> @@ -44,6 +44,7 @@ int main (void)
>  
>    foo (dst, src, N, 8);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (dst[i] != A * i)
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
> index 7a9cf955e3e540e08b42cd80872bb99b53cabcb2..d44d585ff25aed7394945cff64f20923b5600061 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-25.c
> @@ -45,6 +45,7 @@ int main (void)
>  
>    foo (dst, src, N, 8);
>  
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (dst[i] != A * i + i + 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
> index df529673f6c817620a8423ab14724fe4e72bca49..fde062e86c7a01ca29d6e7eb8367414bd734500b 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-26.c
> @@ -45,6 +45,7 @@ int main (void)
>  
>    foo (dst, src, N, 8);
>  
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (dst[i] != A * src[i] + src[i+8])
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
> index bc27f2fca04de8f837ce51090657c8f2cc250c24..3647dd97c69df8a36fc66ca8e9988e215dad71eb 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-27.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo (A);
>  
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      {
>        if (dst[i] != A * i)
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
> index 8749a1f22a6cc1e62a15bd988c50f6f63f26a0a2..c92b687aa44705118f21421a817ac3067e2023c6 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-28.c
> @@ -56,6 +56,7 @@ int main (void)
>  
>    foo (A);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (dst[i] != A * i
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
> index b531350ff3073b7f54b9c03609d6c8279e0374db..9272f02b2aa14f52b04e3d6bb08f15be17ce6a2f 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-29.c
> @@ -45,6 +45,7 @@ int main (void)
>  
>    foo (dst, src, N, 8);
>  
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (dst[i] != A * src[i] + B * src[i+1])
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
> index 1dfa301184aad4c8edf00af80fb861562c941049..69fd0968491544f98d1406ff8a166b723714dd23 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-42.c
> @@ -36,6 +36,7 @@ main ()
>    foo (a, b);
>  
>    for (int i = 0; i < 4; ++i)
> +#pragma GCC novector
>      for (int j = 0; j < ARR_SIZE; ++j)
>        if (a[i][j] != (i + 1) * ARR_SIZE - j + 20 * i)
>  	__builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> index ccb4ef659e47e3524d0dd602fa9d1291847dee3c..c8024429e9c44d924f5bb2af2fcc6b5eaa1b7db7 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
> @@ -35,6 +35,7 @@ int main ()
>  
>    foo (a, 4);
>  
> +#pragma GCC novector
>    for (i = 1; i < N; i++)
>      if (a[i] != i%4 + 1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
> index 5a9fe423691e549ea877c42e46e9ba70d6ab5b00..b556a1d627865f5425e644df11f98661e6a85c29 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
> @@ -45,6 +45,7 @@ DEF_LOOP (unsigned)
>  	asm volatile ("" ::: "memory");			\
>        }							\
>      f_##SIGNEDNESS (a, b, c);				\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        if (a[i] != (BASE_B + BASE_C + i * 29) >> 1)	\
>  	__builtin_abort ();				\
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
> index 15a94e680be4568232e31956732d7416549a18ff..d1aa161c3adcfad1d916de486a04c075f0aaf958 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
> @@ -44,6 +44,7 @@ DEF_LOOP (unsigned)
>  	asm volatile ("" ::: "memory");			\
>        }							\
>      f_##SIGNEDNESS (a, b, C);				\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        if (a[i] != (BASE_B + C + i * 15) >> 1)		\
>  	__builtin_abort ();				\
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
> index 47b1a43665130e11f902f5aea11b01faf307101b..a3ff0f5b3da2f25ce62a5e9fabe5b38e9b952fa9 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-1.c
> @@ -37,6 +37,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
> index c50560b53696c340b0c071296f002f65bcb91631..05fde3a7feba81caf54acff82870079b87b7cf53 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pattern-2.c
> @@ -39,6 +39,7 @@ int main ()
>  
>    foo (a, b, 8);
>  
> +#pragma GCC novector
>    for (i = 1; i < N; i++)
>      if (a[i] != i%8 + 1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
> index fc76700ced3d4f439b0f12eaf9dbc2b1fec72c20..c186c7b66c65e5f62edee25a924fdcfb25b252ab 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pow-1.c
> @@ -16,6 +16,7 @@ int
>  main (void)
>  {
>    f (a);
> +#pragma GCC novector
>    for (int i = 0; i < 4; ++i)
>      {
>        if (a[i] != (i + 1) * (i + 1))
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
> index ac89883de22c9f647041fb373618dae5b7c036f3..dda74ebe03c35811ee991a181379e688430d8412 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101615-2.c
> @@ -16,6 +16,8 @@ int main()
>  	for (int e = 0; e <= 4; e++)
>  	  a[e + 1] |= 3;
>      }
> +
> +#pragma GCC novector
>    for (int d = 0; d < 6; d++)
>      if (a[d] != res[d])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> index ee12136491071c6bfd7678c164df7a1c0a71818f..77d3ae7d424e208409c5baf18c6f39f294f7e351 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
> @@ -51,6 +51,7 @@ int main()
>    rephase ();
>    for (i = 0; i < 32; ++i)
>      for (j = 0; j < 3; ++j)
> +#pragma GCC novector
>        for (k = 0; k < 3; ++k)
>  	if (lattice->link[i].e[j][k].real != i
>  	    || lattice->link[i].e[j][k].imag != i)
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> index 40a02ed1309e2b6b4dc44cf56018a4bb71cc519f..bea3b92ba775a4e8b547d4edccf3ae4a4aa50b40 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
> @@ -31,9 +31,11 @@ main (int argc, char **argv)
>    __asm__ volatile ("" : : : "memory");
>    test (a, b);
>    __asm__ volatile ("" : : : "memory");
> +#pragma GCC novector
>    for (int i = 0; i < 4; i++)
>      if (a[i] != i+4)
>        abort ();
> +#pragma GCC novector
>    for (int i = 4; i < 8; i++)
>      if (a[i] != 0)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> @@ -32,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> @@ -45,6 +46,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> @@ -58,6 +60,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> @@ -71,6 +74,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.e.k[i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> index b82b8916be125b194a02aa74cef74f821796de7f..f07893458b658fc728703ffc8897a7f7aeafdbb3 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> @@ -23,6 +23,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
> index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 5)
> @@ -42,6 +43,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i < N-1; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 6)
> @@ -55,6 +57,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 7)
> @@ -68,6 +71,7 @@ int main1 ()
>      }
>   
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i <N-3; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
> index c00a5bec6d5f9c325beb7e79a4520b76843f0a43..9e57cae9751d7231a2156acbb4c63c49dc0e8b95 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
> @@ -48,6 +48,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> @@ -73,6 +74,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  (in[i*4] + 2) * 3
> @@ -92,6 +94,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
> index e27152eb91ef2feb6e547e5a00b0fc8fe40e2cee..4afbeea9927676b7dbdf78480671056e8777b183 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
> @@ -23,6 +23,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/4; i++)
>      {
>        if (tmp.b[2*i] != 5
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> index c092848dfc9d093e6fc78ce171bb4c1f59a0cf85..9cfae91534f38248a06fb60ebbe05c84a4baccd2 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> @@ -58,6 +58,7 @@ main (void)
>    foo ();
>  
>    /* Check resiults. */ 
> +#pragma GCC novector
>    for (i = 0; i < 16; i++)
>      {
>        if (cf[i].f1 != res1[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> index c57f065cccdd6cba4f96efe777318310415863c9..454a714a309163a39128bf20ef7e8426bd26da15 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> @@ -30,6 +30,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
> index 9bb81e371725ea0714f91eee1f5683c7c014e64c..f69e5c2ee5383abb0a242938426ef09621e54043 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c
> @@ -31,6 +31,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
> index d062d659ffb0138859333f3d7e375bd83fc1c99a..cab6842f72d150b83d525abf7a23971817b9082e 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c
> @@ -30,6 +30,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
> index dc170a0530c564c884bc739e6d82874ccddad12c..05c28fe75e6dc67acba59e73d2b8d3363cd47c9b 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
> @@ -22,6 +22,7 @@ __attribute__((noipa)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
> index ce27e4f082151b630376bd9cfbbabb78e80e4387..648e19f1071f844cc9f968414952897c12897688 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
> index dae5a78808f1d6a5754adb8e7ff4b22608ea33b4..badf5dff70225104207b65a6fe4a2a79223ff1ff 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i < N-1; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 6)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
> index 8221f9e49f8875f453dbc12ca0da4a226e7cf62d..d71a202d8d2b6edaee8b71a485fa68ff56e983ba 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 7)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
> index 2fc751ce70da35b055c64d9e8bec222a4b4feb8b..f18da3fc1f0c0df27c5bd9dd7995deae19352620 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c
> @@ -26,6 +26,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != pib[i - OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index 5da4343198c10e3d35c9f446bc96f1b97d123f84..cbbfbb24658f8a11d4695fe5e16de4e4cfbdbc7e 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -28,6 +28,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (pib[i - OFF] != ic[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
> index 1fc14666f286c1f1170d66120d734647db7686cf..2a672122bcc549029c95563745b56d74f41d9a82 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c
> @@ -26,6 +26,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != ic[i - OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
> index 1a1a86878405bd3bf240e1417ad68970a585c562..9c659f83928046df2b40c2dcc20cdc12fad6c4fe 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
> @@ -59,6 +59,7 @@ int main (void)
>    foo ();
>    fir ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (out[i] != fir_out[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> index cc50a5bde01315be13058ac3409db07f4ce6de5f..085cb986b99c00cb1449db61bb68ccec4e7aa0ba 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> @@ -32,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> @@ -45,6 +46,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> @@ -58,6 +60,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> @@ -71,6 +74,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.e.k[i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> index 5e4affae7db61a0a07568603f1c80aefaf819adb..2f48955caa19f61c12e4c178f60f564c2e277bee 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> @@ -23,6 +23,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
> index 51847599fa62a88ecc090673ab670f7c0a8ac711..cfe7b8536892caa5455e9440505187f21fa09e63 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 5)
> @@ -42,6 +43,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i < N-1; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 6)
> @@ -55,6 +57,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 7)
> @@ -68,6 +71,7 @@ int main1 ()
>      }
>   
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i <N-3; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
> index cfea8723ba2731c334c1fffd749dc157d8f68e36..d9f19d90431ab1e458de738411d7d903445cd04d 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-1.c
> @@ -32,6 +32,7 @@ main1 ()
>        d[i] = i * i;
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i + i - a[i]) >= 0.0001f)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
> index 6d67d12f9961f5cbc53d6f7df5240ac2178a08ac..76bb044914f462cf6d76b559b751f1338a3fc0f8 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-2.c
> @@ -44,12 +44,14 @@ main1 ()
>        b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + i)
>        abort ();
>      else
>        a[i] = 131.25;
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
> index 495c0319c9dabd65436b5f6180114dfa8967f071..ad22f6e82b3c3312c9f10522377c4749e87ce3aa 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
> @@ -65,24 +65,28 @@ main1 ()
>        d[i] = i * i;
>      }
>    f1 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 3) + i - a[i]) >= 0.0001f)
>        abort ();
>      else
>        a[i] = 131.25;
>    f2 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i & 1) + i - a[i]) >= 0.0001f)
>        abort ();
>      else
>        a[i] = 131.25;
>    f3 ();
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + i - a[i]) >= 0.0001f)
>        abort ();
>      else
>        a[i] = 131.25;
>    f4 (10);
> +#pragma GCC novector
>    for (i = 0; i < 60; i++)
>      if (fabsf (((i & 2) ? -4 * i : 4 * i) + 1 + (i % 3) + i - a[i]) >= 0.0001f)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
> index 274ff0f9942c5aff6c6aaca5243ef21bd8708856..d51e17ff656b7cc7ef3d87d207f78aae8eec9373 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-2.c
> @@ -82,36 +82,42 @@ main1 ()
>        b[i] = ((i & 1) ? -4 * i : 4 * i) + 0.25;
>      }
>    f1 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 3))
>        abort ();
>      else
>        a[i] = 131.25;
>    f2 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1 + (i & 1))
>        abort ();
>      else
>        a[i] = 131.25;
>    f3 ();
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i) + 1)
>        abort ();
>      else
>        a[i] = 131.25;
>    f4 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i))
>        abort ();
>      else
>        a[i] = 131.25;
>    f5 (16);
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i))
>        abort ();
>      else
>        a[i] = 131.25;
>    f6 ();
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != ((i & 1) ? -4 * i : 4 * i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
> index 893e521ed8b83768699bc9b70f7d33b91dd89c9b..07992cf72dcfa4da5211a7a160fb146cf0b7ba5c 100644
> --- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
> @@ -47,6 +47,7 @@ main (void)
>    foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>    {
>      if (c[i] != res[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> index 71f2db3e0f281c4cdb1bf89315cc959382459e83..fc710637ac8142778b18810cefadf00dda3f39a6 100644
> --- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> @@ -56,6 +56,7 @@ main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
> index 82b37d4ca71344d15e00e0453dae6470c8d5ba9b..aeaf8146b1a817379a09dc3bf09f542524522f99 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
> @@ -32,6 +32,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
> index cafb96f52849c2a9b51591753898207beac9bdd0..635df4573c7cc0d4005421ce12d87b0c6511a228 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
> @@ -31,6 +31,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<200*N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
> index b376fb1d13ec533eebdbcb8092f03b4790de379a..494ff0b6f8f14f3d3b6aba1ada60d6442ce10811 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
> @@ -31,6 +31,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
> index 64c8dfc470ab02f3ea323f13b6477d6370210937..ba766a3f157db3f1a3d174ca6062fe7ddc60812c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
> @@ -38,6 +38,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
> index 277b73b4934a7bd689f8b2856b7813567dd762bc..d2eee349a42cd1061917c828895e45af5f730eb1 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10.c
> @@ -38,6 +38,7 @@ int main (void)
>    foo (N-1);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
> index 325e201e2dac7aff88f4cb7aff53a7ee25b18631..cf7d605f23ba94b7a0a71526db02b59b517cbacc 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
> @@ -42,6 +42,7 @@ int main (void)
>    foo (N-1);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
> index d9cf28b22d9712f4e7f16ed18b89b0875d94daee..cfb837dced894ad8a885dcb392f489be381a3065 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
> @@ -41,6 +41,7 @@ int main (void)
>    foo (N-1);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
> index f5aeac981870d0e58679d5574dd12e2c4b40d23a..d650a9d1cdc7af778a2dac8e3e251527b825487d 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-11.c
> @@ -34,6 +34,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> index b5f8c3c88e4d0a6562ba867ae83c1ab120077111..e9ec4ca0da316be7d4d02138b0313a9ab087a601 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-12.c
> @@ -33,6 +33,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
> index 9d642415a133eefb420ced6642ac6d32a0e7e33f..13aac4a939846f05826f2b8628258c0fbd2e413a 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-15.c
> @@ -32,6 +32,7 @@ int main (void)
>    foo (3);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> index f00132ede83134eb10639b77f5617487356e2ef1..c7c2fa8a5041fbc67747b4d4b98571f71f9599b6 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-16.c
> @@ -41,6 +41,7 @@ int main (void)
>    res = foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum += i;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> index 2dfdd59e14ef9685d22b2b8c34d55052ee747e7e..ba904a6c03e5a94f4a2b225f180bfe6a384f21d1 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-17.c
> @@ -47,6 +47,7 @@ int main (void)
>    res = foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum += (b[i] - c[i]);
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
> index 49dd5207d862fd1de81b59013a07ea74ee9b5beb..464fcb1fc310a7366ef6a55c5ed491a7410720f8 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
> @@ -35,6 +35,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> index 934eadd3e326b950bf33eac03136868089fa1371..5cd4049d08c84ab9f3503a3f1577d170df8ce6c3 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-19.c
> @@ -36,6 +36,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
> index 42e218924fdbeb7c21830537a55364ad5ca822ac..a9ef1c04c70510797006d8782dcc6abf2908e4f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
> @@ -38,6 +38,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> index 75b7f066abae1496caa98080cdf4355ca1383091..72e53c2bfb0338a48def620159e384d423399d0b 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-21.c
> @@ -41,6 +41,7 @@ int main (void)
>    res = foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum += i;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
> index ec04bc28f6279c0cd6a6c174698aedc4312c7ab5..b41b2c322b91ab0a9a06ab93acd335b53f654a6d 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-22.c
> @@ -38,6 +38,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
> index ee39891efea2231362dc776efc4193898f06a02c..91e57a5963ac81964fb0c98a28f7586bf98df059 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-3.c
> @@ -35,6 +35,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
> index f8ef02d25190a29315e6909b9d89642f699b6c6a..a6c29956f3b84ee0def117bdc886219bf07ec2d0 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-4.c
> @@ -39,6 +39,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
> index 2ef43cd146bdbbe6e7a8b8f0a66a11a1b8b7ec08..f01fcfb5c34906dbb96d050068b528192aa0f79a 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-5.c
> @@ -37,6 +37,7 @@ int main (void)
>    foo ();
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
> index 7ac4e1ed949caecd6d2aaa7bf6d33d459ff74f8c..cf529efa31d6a10d3aaad69570f3f3ae102d327c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c
> @@ -39,6 +39,7 @@ int main (void)
>      a[i] = foo (b,i);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = b[i];
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
> index ad2f472f1ed100912386d51ef999353baf50dd93..9c1e251f6a79fd34a820d64393696722c508e671 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c
> @@ -38,6 +38,7 @@ int main (void)
>      a[i] = foo (b,i);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = b[i];
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
> index f56bd2e50af42f20f57791b2e3f0227dac13ee82..543ee98b5a44c91c2c249df0ece304dd3282cc1a 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
> @@ -63,6 +63,7 @@ int main (void)
>    res = foo (2);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        if (a[i] != bar ())
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
> index 7c9113bc0f0425139c6723105c78cc8306d82f8c..0ed589b47e6bc722386a9db83e6397377f0e2069 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-8.c
> @@ -34,6 +34,7 @@ int main (void)
>    foo (a);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
> index cea495c44808373543242d8998cdbfb9691499ca..62fa559e6ce064065b3191f673962a63e874055f 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9.c
> @@ -34,6 +34,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
> index 9e1f7890dd1ebc14b4a9a88488625347dcabd38a..96ffb4ce7b4a8a06cb6966acc15924512ad00f31 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
> @@ -38,6 +38,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
> index ee65ceb6f92a185ca476afcc0b82295ab0034ba5..d76752c0dba3bbedb2913f87ed4b95f7d48ed2cf 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
> @@ -37,6 +37,7 @@ int main (void)
>    foo (N);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> index fe9e7e7ab4038acfe02d3e6ea9c4fc37ba207043..00d0eca56eeca6aee6f11567629dc955c0924c74 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> @@ -24,6 +24,7 @@ main1 ()
>     }
>  
>    /* check results:  */
> +#pragma GCC novector
>     for (j = 0; j < N; j++)
>     {
>      for (i = 0; i < N; i++)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> index dc5f16fcfd2c269a719f7dcc5d2d0d4f9dbbf556..48b6a9b0681cf1fe410755c3e639b825b27895b0 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> @@ -24,6 +24,7 @@ main1 ()
>     }
>  
>    /* check results:  */
> +#pragma GCC novector
>   for (i = 0; i < N; i++)
>     {
>      for (j = 0; j < N; j++) 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
> index 131d2d9e03f44ed680cb49c71673908511c9236f..57ebd5c92a4297940bbdfc051c8a08d99a3b184e 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c
> @@ -22,6 +22,7 @@ int main1 ()
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (arr1[i] != 2+2*i)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
> index d2ae7976781e20c6e4257e0ad4141ceb21ed711b..a1311504d2f8e67c275e8738b3c201187cd02bc0 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c
> @@ -39,6 +39,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> @@ -52,6 +53,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> @@ -65,6 +67,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> @@ -78,6 +81,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.e.k[i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
> index 1edad1ca30eeca0a224a61b5035546615a360fef..604d4b1bc6772f7bf9466b204ebf43e639642a02 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
> index 7663ca7281aacc0ba3e685887e3c20be97322148..3eada6057dd91995709f313d706b6d94b8fb99eb 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c
> @@ -32,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != s.cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
> index 243e01e6dadf48d976fdd72bedd9547746cf73b5..19fbe331b57fde1412bfdaf7024e8c108f913da5 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c
> @@ -54,6 +54,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j] != ib[i])
> @@ -64,6 +65,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[i][1][1][j] != ib[i])
> @@ -74,6 +76,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (id[i][1][j+1] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
> index 581554064b572b7eb26d5f9852d4d13622317c7e..d51ef31aeac0d910a69d0959cc0da46d92bd7af9 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c
> @@ -44,6 +44,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < M; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j] != ib[2][i][j])
> @@ -64,6 +65,7 @@ int main1 ()
>    /* check results: */
>    for (i = 0; i < M; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[j] != ib[2][i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
> index e339590bacb494569558bfe9536c43f0d6339b8e..23cd3d5c11157f6735ed219c16075007f26034e5 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c
> @@ -29,6 +29,7 @@ int main1 ()
>      {
>        for (j = 0; j < N; j++)
>          {
> +#pragma GCC novector
>             if (ia[2][6][j] != 5)
>                  abort();
>          }
> @@ -45,6 +46,7 @@ int main1 ()
>      {
>        for (j = 2; j < N+2; j++)
>          {
> +#pragma GCC novector
>             if (ia[3][6][j] != 5)
>                  abort();
>          }
> @@ -62,6 +64,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < 16; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[2][1][6][j+1] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
> index c403a8302d842a8eda96d2ee0fb25a94e8323254..36b79c2907cc1b41664cdca5074d458e36bdee98 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c
> @@ -35,6 +35,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 5)
> @@ -48,6 +49,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i < N-1; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 6)
> @@ -61,6 +63,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 7)
> @@ -74,6 +77,7 @@ int main1 ()
>      }
>   
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i <N-3; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> index 34317ccb624a4ca75c612c70a5b5105bb85e272b..a0e53d5fef91868dfdbd542dd0a98dff92bd265b 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> @@ -52,6 +52,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1[2].a.n[1][2][i] != 5)
> @@ -65,6 +66,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = NINTS; i < N - 1; i++)
>      {
>        if (tmp1[2].a.n[1][2][i] != 6)
> @@ -81,6 +83,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        for (j = 0; j < N; j++)
> @@ -100,6 +103,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N - NINTS; i++)
>      {
>        for (j = 0; j < N - NINTS; j++)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
> index 2199d11e2faee58663484a4d4e6ed06be508188b..f79b74d15700ccd86fc268e039efc8d7b8d245c2 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
> @@ -31,7 +31,9 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < M; j++) {
>        if (a[j][i] != 4)
>          abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
> index d0e4ec2373b66b76235f53522c50ac1067ece4d2..8358b6e54328336f1bd0f6c618c58e96b19401d5 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-2.c
> @@ -21,6 +21,7 @@ main1 (void)
>      a[i] = (b[i] > 0 ? b[i] : 0);
>    }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>    {
>      if (a[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
> index d718b5923b11aaee4d259c62cab1a82c714cc934..ae5d23fab86a4dd363e3df7310571ac93fc93f81 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-111.c
> @@ -20,6 +20,7 @@ main1 (void)
>      a[i] = (b[i] > 0 ? b[i] : 0);
>    }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>    {
>      if (a[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
> index 7316985829f589dbbbe782b037096b2c5bd2be3c..4aaff3430a4cb110d586da83e2db410ae88bc977 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] >= MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
> index e87bcb8b43d3b82d30f8d3c2340b4968c8dd8da4..c644523a0047a6dfaa0ec8f3d74db79f71b82ec7 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c
> @@ -21,6 +21,7 @@ int main ()
>      A[i] = ( A[i] > MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
> index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] <= MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
> index 9bd583fde6e71096b9cfd07d2668a9f32b50bf17..5902f61f954c5f65929616b0f924b8941cac847c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] <= MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
> index dcb09b7e7c7a3c983763fb3e57ea036e26d2d1ba..7f436a69e99bff6cebbc19a35c2dbbe5dce94c5a 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] < MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
> index ebde13167c863d91376d7c17d65191c047a7c9e7..d31157713bf3d0f0fadf305053dfae0612712b8d 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-tree-dom-vect-bug.c
> @@ -21,6 +21,7 @@ int main ()
>    check_vect ();
>    main1 (32);
>  
> +#pragma GCC novector
>    for (si = 0; si < 32; ++si)
>      if (stack_vars_sorted[si] != si)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
> index e965910d66d06434a367f08553fde8a733a53e41..8491d5f0070233af5c0baf64f9123d270fe1d51c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-tree-pre-slp-29.c
> @@ -22,6 +22,7 @@ main1 (unsigned short *in)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -48,6 +49,7 @@ main2 (unsigned short * __restrict__ in, unsigned short * __restrict__ out)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
> index a92ec9c1656275e1b0e31cfe1dcde3be78dfac7e..45cca1d1991c126fdef29bb129c443aae249a295 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-pr29145.c
> @@ -41,6 +41,7 @@ int main(void)
>    with_restrict(a + 1);
>    without_restrict(b + 1);
>  
> +#pragma GCC novector
>    for (i = 0; i < 1002; ++i) {
>      if (a[i] != b[i])
>        abort();
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
> index ce934279ddfe073a96ef8cd7e0d383ca979bda7a..73b92177dabf5193d9d158a92e0383d389b67c82 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
> @@ -30,6 +30,7 @@ int main1 (int x, int y) {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (p->a[i] != a[i] || p->b[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
> index d9e0529e73f0a566220020ad671f432f3e72299f..9a3fdab128a3bf2609018f92a38a7a6de8b7270b 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
> @@ -35,6 +35,7 @@ int main1 (int x, int y) {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (p->a[i] != 1) 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
> index 581438823fd2d1fa83ae4cb770995ff30c18abf8..439347c3bb10711911485a9c1f3bc6abf1c7798c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102a.c
> @@ -34,6 +34,7 @@ int main1 (int x, int y) {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (p->a[i] != 1)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
> index 6f4c84b4cd2b928c5df21a44e910620c1937e863..f59eb69d99fbe2794f3f6c6822cc87b209e8295f 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-37.c
> @@ -24,6 +24,7 @@ int main1 (char *y)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != cb[i])
> @@ -38,6 +39,7 @@ int main1 (char *y)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != s.q[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
> index 18d4d1bbe6d0fdd357a95ab997437ab6b9a46ded..6b4542f5948bc32ca736ad92328a0fd37e44334c 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-43.c
> @@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> @@ -66,6 +67,7 @@ main2 (float *pa, float *pb, float *pc)
>      }   
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (b[i] * c[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
> index cad507a708f3079f36e2c85c594513514a1e172b..5db05288c81bf5c4c158efbc50f6d4862bf3f335 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-45.c
> @@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
> index a364c7b0d6f1f19292b937eedf0854163c1f549a..a33375f94fec55183493f96c84099224b7f4af6f 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-49.c
> @@ -11,6 +11,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
> index 69e921b95031b9275e6f4edeb120f247e93646a3..5ebb8fea0b7cb101f73fa2b079f4a37092eb6f2d 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-51.c
> @@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
> index b1c1d86587e5bd9b1dcd364ad495ee7a52ccfb2b..b6d251ec48950dacdecc4d141ebceb4cedaa0755 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-53.c
> @@ -11,6 +11,7 @@ void bar (const float *pa, const float *pb, const float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
> index 83dc628f0b0803eab9489101c6f3c26f87cf429c..6291dd9d53c33160a0aacf05aeb6febb79fdadf0 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-57.c
> @@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
> index 9524454d367db2a45ab744d55a9d32a32e773140..d0334e3ba90f511fd6c0bc5faa72d78c07510cd9 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-61.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
> index 6e9ddcfa5ce61f7a53829e81cab277165ecd1d91..37e474f8a06f1f7df7e9a83290e865d1baa12fce 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-79.c
> @@ -23,6 +23,7 @@ main1 (float *pa, float *pb, float *pc)
>        pa[i] = q[i] * pc[i];
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != q[i] * pc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
> index da3506a4cecdce11bf929a98c533026d31fc5f96..e808c87158076d3430eac124df9fdd55192821a8 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-1.c
> @@ -21,6 +21,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N - 1; i++)
>      {
>        if (ia[i] != 0)
> @@ -34,6 +35,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N - 1; i++)
>      {
>        if (ib[i] != res[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
> index 89958378fca009fba6b59509c2ea7f96fa53805b..25a3409ae5e2ebdb6f7ebabc7974cd49ac7b7d47 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
> @@ -21,6 +21,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != 0)
> @@ -34,6 +35,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ib[i] != res[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
> index e5914d970e3596a082e015725ba99369670db4e7..d1d70dda2eb9b3d7b462ebe0c30536a1f2744af4 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
> @@ -130,6 +130,7 @@ main1 (void)
>  	case 7: f8 (); break;
>  	}
>  
> +#pragma GCC novector
>        for (i = 0; i <= N; i++)
>  	{
>  	  int ea = i + 3;
> diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
> index 8cc69ab22c5ab7cc193eeba1aa50365db640b254..407b683961ff0f5caaa1f168913fb7011b7fd2a3 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c
> @@ -37,6 +37,7 @@ int main ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N-20; i++)
>      {
>        if (A[i] != D[i+20])
> @@ -50,6 +51,7 @@ int main ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < 16; i++)
>      {
>        if (B[i] != C[i] + 5)
> @@ -63,6 +65,7 @@ int main ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < 4; i++)
>      {
>        if (C[i] != E[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/pr101445.c b/gcc/testsuite/gcc.dg/vect/pr101445.c
> index f8a6e9ce6f7fa514cacd8b58d9263636d1d28eff..143156f2464e84e392c04231e4717ef9ec7d8a6e 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr101445.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr101445.c
> @@ -21,6 +21,7 @@ int main()
>  {
>    check_vect ();
>    foo ();
> +#pragma GCC novector
>    for (int d = 0; d < 25; d++)
>      if (a[d] != 0)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr103581.c b/gcc/testsuite/gcc.dg/vect/pr103581.c
> index d072748de31d2c6beb5d6dd86bf762ee1f4d0182..92695c83d99bf048b52c8978634027bcfd71c13d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr103581.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr103581.c
> @@ -39,6 +39,7 @@ main()
>    unsigned int *resusiusi = maskgatherusiusi (16, idx4, data4);
>    unsigned long long *resudiudi = maskgatherudiudi (16, idx8, data8);
>    unsigned int *resusiudi = maskgatherusiudi (16, idx8, data4);
> +#pragma GCC novector
>    for (int i = 0; i < 16; ++i)
>      {
>        unsigned int d = idx4[i];
> diff --git a/gcc/testsuite/gcc.dg/vect/pr105219.c b/gcc/testsuite/gcc.dg/vect/pr105219.c
> index 4bca5bbba30a9740a54e6205bc0d0c8011070977..2289f5e1a633b56218d089d81528599d4f1f282b 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr105219.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr105219.c
> @@ -22,6 +22,7 @@ int main()
>        {
>          __builtin_memset (data, 0, sizeof (data));
>          foo (&data[start], n);
> +#pragma GCC novector
>          for (int j = 0; j < n; ++j)
>            if (data[start + j] != j)
>              __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr108608.c b/gcc/testsuite/gcc.dg/vect/pr108608.c
> index e968141ba03639ab86ccf77e5e9ad5dd56a66e0d..fff5c1a89365665edc3478263ee909b2b260e178 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr108608.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr108608.c
> @@ -13,6 +13,7 @@ main (void)
>  {
>    check_vect ();
>    float ptr[256];
> +#pragma GCC novector
>    for (int j = 0; j < 16; ++j)
>      {
>        for (int i = 0; i < 256; ++i)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr18400.c b/gcc/testsuite/gcc.dg/vect/pr18400.c
> index 012086138f7199fdf2b4b40666795f7df03a89d2..dd96d87be99287da19df4634578e2e073ab42455 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr18400.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr18400.c
> @@ -19,6 +19,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/pr18536.c b/gcc/testsuite/gcc.dg/vect/pr18536.c
> index 6d02675913b68c811f4e3bc1f71df830d7f4e2aa..33ee3a5ddcfa296672924678b40474bea947b9ea 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr18536.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr18536.c
> @@ -22,6 +22,7 @@ int main (void)
>    main1 (0, x);
>  
>    /* check results:  */
> +#pragma GCC novector
>    while (++i < 4)
>      {
>        if (x[i-1] != 2)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr20122.c b/gcc/testsuite/gcc.dg/vect/pr20122.c
> index 4f1b7bd6c1e723405b6625f7c7c890a46d3272bc..3a0387e7728fedc9872cb385dd7817f7f5cf07ac 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr20122.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr20122.c
> @@ -27,6 +27,7 @@ static void VecBug2(short Kernel[8][24])
>              Kernshort2[i] = Kernel[k][i];
>  
>      for (k = 0; k<8; k++)
> +#pragma GCC novector
>          for (i = 0; i<24; i++)
>              if (Kernshort2[i] != Kernel[k][i])
>                  abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr25413.c b/gcc/testsuite/gcc.dg/vect/pr25413.c
> index e80d6970933e675b6056e5d119c6eb0e817a40f9..266ef3109f20df7615e85079a5d2330f26cf540d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr25413.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr25413.c
> @@ -26,6 +26,7 @@ int main (void)
>    check_vect ();
>    
>    main1 ();
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      if (a.d[i] != 1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr30784.c b/gcc/testsuite/gcc.dg/vect/pr30784.c
> index 840dbc5f1f139aafe012904a774c1e5b9739b653..ad1fa05d8edae5e28a3308f39ff304de3b1d60c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr30784.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr30784.c
> @@ -21,6 +21,7 @@ int main ()
>    check_vect ();
>    main1 (32);
>  
> +#pragma GCC novector
>    for (si = 0; si < 32; ++si)
>      if (stack_vars_sorted[si] != si)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr37539.c b/gcc/testsuite/gcc.dg/vect/pr37539.c
> index dfbfc20c5cbca0cfa7158423ee4a42e5976b56fe..c7934eb384739778a841271841fd8b7777ee19be 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr37539.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr37539.c
> @@ -17,6 +17,7 @@ ayuv2yuyv_ref (int *d, int *src, int n)
>    }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for(i=0;i<n/2;i++){
>     if (dest[i*4 + 0] != (src[i*2 + 0])>>16
>         || dest[i*4 + 1] != (src[i*2 + 1])>>8
> diff --git a/gcc/testsuite/gcc.dg/vect/pr40074.c b/gcc/testsuite/gcc.dg/vect/pr40074.c
> index 143ee05b1fda4b0f858e31cad2ecd4211530e7b6..b75061a8116c34f609eb9ed59256b6eea87976a4 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr40074.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr40074.c
> @@ -30,6 +30,7 @@ main1 ()
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N-1; i++)
>      {
>        if (res[i] != arr[i].b + arr[i].d + arr[i+1].b)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr45752.c b/gcc/testsuite/gcc.dg/vect/pr45752.c
> index 4ddac7ad5097c72f08b948f64caa54421d4f55d0..e8b364f29eb0c4b20bb2b2be5d49db3aab5ac39b 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr45752.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr45752.c
> @@ -146,6 +146,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output, input2, output2);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (output[i] != check_results[i]
>          || output2[i] != check_results2[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/pr45902.c b/gcc/testsuite/gcc.dg/vect/pr45902.c
> index ac8e1ca6d38159d3c26497a414b638f49846381e..74510bf94b82850b6492c6d1ed0abacb73f65a16 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr45902.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr45902.c
> @@ -34,6 +34,7 @@ main ()
>  
>    main1 ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (res[i] != a[i] >> 8)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr46009.c b/gcc/testsuite/gcc.dg/vect/pr46009.c
> index 9649e2fb4bbfd74e134a9ef3d068d50b9bcb86c0..fe73dbf5db08732cc74115281dcf6a020f893cb6 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr46009.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr46009.c
> @@ -49,6 +49,7 @@ main (void)
>        e[i] = -1;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      {
>        int g;
> @@ -59,6 +60,7 @@ main (void)
>        e[i] = -1;
>      }
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      {
>        int g;
> diff --git a/gcc/testsuite/gcc.dg/vect/pr48172.c b/gcc/testsuite/gcc.dg/vect/pr48172.c
> index a7fc05cae9119076efad4ca13a0f6fd0aff004b7..850e9b92bc15ac5f51fee8ac7fd2c9122def66b6 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr48172.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr48172.c
> @@ -25,6 +25,7 @@ int main() {
>      array[HALF+i] = array[2*i] + array[2*i + 1];
>  
>    /* see if we have any failures */
> +#pragma GCC novector
>    for (i = 0; i < HALF - 1; i++)
>      if (array[HALF+i] != array[2*i] + array[2*i + 1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr51074.c b/gcc/testsuite/gcc.dg/vect/pr51074.c
> index 4144572126e9de36f5b2e85bb56ff9fdff372bce..d6c8cea1f842e08436a3d04af513307d3e980d27 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr51074.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr51074.c
> @@ -15,6 +15,7 @@ main ()
>        s[i].a = i;
>      }
>    asm volatile ("" : : : "memory");
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (s[i].b != 0 || s[i].a != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-3.c b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
> index 76c156adf9d0dc083b7eb5fb2e6f056398e2b845..25acceef0e5ca6f8c180a41131cd190b9c84b533 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr51581-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr51581-3.c
> @@ -97,17 +97,20 @@ main ()
>      }
>    f1 ();
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i++)
>      if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
>        abort ();
>    f3 ();
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i+= 2)
>      if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
>  	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
>        abort ();
>    f5 ();
>    f6 ();
> +#pragma GCC novector
>    for (i = 0; i < 8; i+= 2)
>      if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
>  	|| c[i] != d[i] / (i == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr51581-4.c b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
> index 632c96e7481339a6dfac92913a519ad5501d34c4..f6234f3e7c09194dba54af08832171798c7d9c09 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr51581-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr51581-4.c
> @@ -145,17 +145,20 @@ main ()
>      }
>    f1 ();
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < 16; i++)
>      if (a[i] != b[i] / 8 || c[i] != d[i] / 3)
>        abort ();
>    f3 ();
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < 16; i+= 2)
>      if (a[i] != b[i] / 8 || a[i + 1] != b[i + 1] / 4
>  	|| c[i] != d[i] / 3 || c[i + 1] != d[i + 1] / 5)
>        abort ();
>    f5 ();
>    f6 ();
> +#pragma GCC novector
>    for (i = 0; i < 16; i+= 2)
>      if (a[i] != b[i] / 14 || a[i + 1] != b[i + 1] / 15
>  	|| c[i] != d[i] / ((i & 7) == 6 ? 13 : 6) || c[i + 1] != d[i + 1] / 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr53185-2.c b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
> index 6057c69a24a81be20ecc5582685fb4516f47803d..51614e70d8feac0004644b2e6bb7deb52eeeefea 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr53185-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr53185-2.c
> @@ -20,6 +20,7 @@ int main ()
>    for (off = 0; off < 8; ++off)
>      {
>        fn1 (&a[off], &b[off], 32 - off, 3);
> +#pragma GCC novector
>        for (i = 0; i < 32 - off; ++i)
>  	if (a[off+i] != b[off+i*3])
>  	  abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr56826.c b/gcc/testsuite/gcc.dg/vect/pr56826.c
> index e8223808184e6b7b37a6d458bdb440566314e959..2f2da458b89ac04634cb809873d7a60e55484499 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr56826.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr56826.c
> @@ -35,6 +35,7 @@ int main()
>        __asm__ volatile ("");
>      }
>    bar (&A[0], &B[0], 100);
> +#pragma GCC novector
>    for (i=0; i<300; i++)
>      if (A[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr56918.c b/gcc/testsuite/gcc.dg/vect/pr56918.c
> index 1c88d324b902e9389afe4c5c729f20b2ad790dbf..4941453bbe9940b4e775239c4c2c9606435ea20a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr56918.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr56918.c
> @@ -22,6 +22,7 @@ main ()
>    foo ();
>    if (data[0] != 3 || data[7] != 1)
>      abort ();
> +#pragma GCC novector
>    for (i = 1; i < 4; ++i)
>      if (data[i] != i || data[i + 3] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr56920.c b/gcc/testsuite/gcc.dg/vect/pr56920.c
> index 865cfda760d1978eb1f3f063c75e2bac558254bd..ef73471468392b573e999a59e282b4d796556b8d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr56920.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr56920.c
> @@ -12,6 +12,7 @@ main ()
>    check_vect ();
>    for (i = 0; i < 15; ++i)
>      a[i] = (i * 2) % 15;
> +#pragma GCC novector
>    for (i = 0; i < 15; ++i)
>      if (a[i] != (i * 2) % 15)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr56933.c b/gcc/testsuite/gcc.dg/vect/pr56933.c
> index 7206682d7935a0436aaf502537bb56642d5e4648..2f2afe6df134163d2e7761be4906d778dbd6b670 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr56933.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr56933.c
> @@ -25,6 +25,7 @@ int main()
>    for (i = 0; i < 2*1024; i++)
>      d[i] = 1.;
>    foo (b, d, f);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i+= 2)
>      {
>        if (d[2*i] != 2.)
> @@ -32,6 +33,7 @@ int main()
>        if (d[2*i+1] != 4.)
>  	abort ();
>      }
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      {
>        if (b[i] != 1.)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr57705.c b/gcc/testsuite/gcc.dg/vect/pr57705.c
> index e17ae09beb68051637c3ece69ac2f29e1433008d..39c32946d74ef01efce6fbc2f23c72dd0b33091d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr57705.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr57705.c
> @@ -47,14 +47,17 @@ main ()
>    int i;
>    check_vect ();
>    foo (5, 3);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (a[i] != 5 + 4 * i)
>        abort ();
>    bar (5, 3);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (a[i] != 9 + 4 * i)
>        abort ();
>    baz (5, 3);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (a[i] != 5 + 4 * i || b[i] != (unsigned char) i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-2.c b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
> index df63a49927d38badb2503787bcd828b796116199..6addd76b422614a2e28272f4d696e3cba4bb0376 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr57741-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr57741-2.c
> @@ -34,6 +34,7 @@ main ()
>    int i;
>    check_vect ();
>    foo (p, q, 1.5f);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr57741-3.c b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
> index 2e4954ac7f14b21463b0ef0ca97e05c4eb96e8fd..916fa131513b88321d36cdbe46f101361b4f8244 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr57741-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr57741-3.c
> @@ -33,6 +33,7 @@ main ()
>    check_vect ();
>    r[0] = 0;
>    foo (1.5f);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (p[i] != 1.0f + i * 1.5f || q[i] != 2.0f + i * 0.5f || r[i] != 1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-1.c b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
> index 892fce58e36b37e5412cc6c100f82b6077ace77e..e768fb3e1de48cf43b389cf83b4f7f1f030c4f91 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr59591-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr59591-1.c
> @@ -31,6 +31,7 @@ bar (void)
>        t[i] = i * 13;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 256; i++)
>      if ((i >> 2) & (1 << (i & 3)))
>        {
> diff --git a/gcc/testsuite/gcc.dg/vect/pr59591-2.c b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
> index bd82d765794a32af6509ffd60d1f552ce10570a3..3bdf4252cffe63830b5b47cd17fa29a3c65afc73 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr59591-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr59591-2.c
> @@ -32,6 +32,7 @@ bar (void)
>        t[i] = i * 13;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 256; i++)
>      if ((i >> 2) & (1 << (i & 3)))
>        {
> diff --git a/gcc/testsuite/gcc.dg/vect/pr59594.c b/gcc/testsuite/gcc.dg/vect/pr59594.c
> index 947fa4c0c301d98cbdfeb5da541482858b69180f..e3ece8abf7131aa4ed0a2d5af79d4bdea90bd8c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr59594.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr59594.c
> @@ -22,6 +22,7 @@ main ()
>      }
>    if (b[0] != 1)
>      __builtin_abort ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (b[i + 1] != i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr59984.c b/gcc/testsuite/gcc.dg/vect/pr59984.c
> index d6977f0020878c043376b7e7bfdc6a0e85ac2663..c00c2267158667784fb084b0ade19e2ab763c6a3 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr59984.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr59984.c
> @@ -37,6 +37,7 @@ test (void)
>        foo (a[i], &v1, &v2);
>        a[i] = v1 * v2;
>      }
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * i * i * i - 1)
>        __builtin_abort ();
> @@ -49,6 +50,7 @@ test (void)
>        bar (a[i], &v1, &v2);
>        a[i] = v1 * v2;
>      }
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * i * i * i - 1)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr60276.c b/gcc/testsuite/gcc.dg/vect/pr60276.c
> index 9fc18ac7428cf71903b6ebb04b90eb21b2e8b3c7..824e2a336b6d9fad2e7a72c445ec2edf80be8138 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr60276.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr60276.c
> @@ -44,6 +44,7 @@ int main(void)
>    foo (out + 2, lp + 1, 48);
>    foo_novec (out2 + 2, lp + 1, 48);
>  
> +#pragma GCC novector
>    for (s = 0; s < 49; s++)
>      if (out[s] != out2[s])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr61194.c b/gcc/testsuite/gcc.dg/vect/pr61194.c
> index 8421367577278cdf5762327d83cdc4a0e65c9411..8cd38b3d5da616d65ba131d048280b1d5644339d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr61194.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr61194.c
> @@ -32,6 +32,7 @@ int main()
>  
>    barX();
>  
> +#pragma GCC novector
>    for (i = 0; i < 1024; ++i)
>      if (z[i] != ((x[i]>0 && w[i]<0) ? 0. : 1.))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr61680.c b/gcc/testsuite/gcc.dg/vect/pr61680.c
> index e25bf78090ce49d68cb3694233253b403709331a..bb24014bdf045f22a0c9c5234481f07153c25d41 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr61680.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr61680.c
> @@ -8,6 +8,7 @@ bar (double p[][4])
>  {
>    int i;
>    double d = 172.0;
> +#pragma GCC novector
>    for (i = 0; i < 4096; i++)
>      {
>        if (p[i][0] != 6.0 || p[i][1] != 6.0 || p[i][2] != 10.0)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr62021.c b/gcc/testsuite/gcc.dg/vect/pr62021.c
> index 40c64429d6382821af4a31b3569c696ea0e5fa2a..460fadb3f6cd73c7cac2bbba65cc09d4211396e8 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr62021.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr62021.c
> @@ -24,6 +24,7 @@ main ()
>    #pragma omp simd
>    for (i = 0; i < 1024; i++)
>      b[i] = foo (b[i], i);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (b[i] != &a[1023])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr63341-2.c b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
> index 2004a79b80ef4081136ade20df9b6acd5b6428c1..aa338263a7584b06f10e4cb4a6baf19dea20f40a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr63341-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr63341-2.c
> @@ -16,6 +16,7 @@ foo ()
>    int i;
>    for (i = 0; i < 32; i++)
>      d[i] = t.s[i].s + 4;
> +#pragma GCC novector
>    for (i = 0; i < 32; i++)
>      if (d[i] != t.s[i].s + 4)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr64252.c b/gcc/testsuite/gcc.dg/vect/pr64252.c
> index b82ad017c16fda6e031b503a9b11fe39a3691a6c..89070c27ff0f9763bd8eaff4a81b5b0197ae12dc 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr64252.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr64252.c
> @@ -57,6 +57,7 @@ int main()
>    int i;
>    check_vect ();
>    bar(2, q);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (q[0].a[i].f != 0 || q[0].a[i].c != i || q[0].a[i].p != -1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr64404.c b/gcc/testsuite/gcc.dg/vect/pr64404.c
> index 26fceb6cd8936f7300fb0067c0f18c3d35ac4595..6fecf9ecae18e49808a58fe17a6b912786bdbad3 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr64404.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr64404.c
> @@ -42,6 +42,7 @@ main (void)
>  
>    Compute ();
>  
> +#pragma GCC novector
>    for (d = 0; d < 1024; d++)
>      {
>        if (Y[d].l != X[d].l + X[d].h
> diff --git a/gcc/testsuite/gcc.dg/vect/pr64421.c b/gcc/testsuite/gcc.dg/vect/pr64421.c
> index 3b5ab2d980c207c1d5e7fff73cd403ac38790080..47afd22d93e5ed8fbfff034cd2a03d8d70f7e422 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr64421.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr64421.c
> @@ -27,6 +27,7 @@ main ()
>      a[i] = foo (a[i], i);
>    if (a[0] != 1 || a[1] != 3)
>      abort ();
> +#pragma GCC novector
>    for (i = 2; i < 1024; i++)
>      if (a[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr64493.c b/gcc/testsuite/gcc.dg/vect/pr64493.c
> index 6fb13eb6d96fe67471fdfafd2eed2a897ae8b670..d3faf84bcc16d31fc11dd2d0cd7242972fdbafdc 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr64493.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr64493.c
> @@ -9,6 +9,7 @@ main ()
>  
>    for (; a; a--)
>      for (d = 1; d <= 0; d++)
> +#pragma GCC novector
>        for (; d;)
>  	if (h)
>  	  {
> diff --git a/gcc/testsuite/gcc.dg/vect/pr64495.c b/gcc/testsuite/gcc.dg/vect/pr64495.c
> index 5cbaeff8389dafd3444f90240a910e7d5e4f2431..c48f9389aa325a8b8ceb5697684f563b8c13a72d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr64495.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr64495.c
> @@ -11,6 +11,7 @@ main ()
>  
>    for (; a;)
>      for (; g; g++)
> +#pragma GCC novector
>        for (; f; f++)
>  	if (j)
>  	  {
> diff --git a/gcc/testsuite/gcc.dg/vect/pr66251.c b/gcc/testsuite/gcc.dg/vect/pr66251.c
> index 26afbc96a5d57a49fbbac95753f4df006cb36018..355590e69a98687084fee2c5486d14c2a20f3fcb 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr66251.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr66251.c
> @@ -51,6 +51,7 @@ int main ()
>  
>        test1(da, ia, stride, 256/stride);
>  
> +#pragma GCC novector
>        for (i = 0; i < 256/stride; i++)
>  	{
>  	  if (da[i*stride] != ia[i*stride])
> @@ -66,6 +67,7 @@ int main ()
>  
>        test2(ia, da, stride, 256/stride);
>  
> +#pragma GCC novector
>        for (i = 0; i < 256/stride; i++)
>  	{
>  	  if (da[i*stride] != ia[i*stride])
> diff --git a/gcc/testsuite/gcc.dg/vect/pr66253.c b/gcc/testsuite/gcc.dg/vect/pr66253.c
> index bdf3ff9ca51f7f656fad687fd8c77c6ee053794f..6b99b4f3b872cbeab14e035f2e2d40aab6e438e4 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr66253.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr66253.c
> @@ -39,6 +39,7 @@ int main ()
>  
>        test1(da, ia, ca, stride, 256/stride);
>  
> +#pragma GCC novector
>        for (i = 0; i < 256/stride; i++)
>  	{
>  	  if (da[i*stride] != 0.5 * ia[i*stride] * ca[i*stride])
> diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-1.c b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
> index 4f7d0bfca38693877ff080842d6ef7abf3d3e17b..cc6e6cd9a2be0e921382bda3c653f6a6b730b905 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr68502-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr68502-1.c
> @@ -41,6 +41,7 @@ int main ()
>    for (i = 0; i < numf1s; i++)
>      f1_layer[i].I = (double *)-1;
>    reset_nodes ();
> +#pragma GCC novector
>    for (i = 0; i < numf1s; i++)
>      if (f1_layer[i].I != (double *)-1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr68502-2.c b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
> index a3eddafc7ca76cbe4c21f6ed873249cb2c94b7a6..11f87125b75df9db29669aa55cdc3c202b0fedda 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr68502-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr68502-2.c
> @@ -41,6 +41,7 @@ int main ()
>    for (i = 0; i < numf1s; i++)
>      f1_layer[i].I = -1;
>    reset_nodes ();
> +#pragma GCC novector
>    for (i = 0; i < numf1s; i++)
>      if (f1_layer[i].I != -1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr69820.c b/gcc/testsuite/gcc.dg/vect/pr69820.c
> index be24e4fa9a1343e4308bfd967f1ccfdd3549db5c..72d10b65c16b54764aac0cf271138ffa187f4052 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr69820.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr69820.c
> @@ -28,6 +28,7 @@ main ()
>        c[i] = 38364;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 100; ++i)
>      if (b[i] != 0xed446af8U)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr70021.c b/gcc/testsuite/gcc.dg/vect/pr70021.c
> index 988fc53216d12908bbbc564c9efc4d63a5c057d7..d4d5db12bc0e646413ba393b57edc60ba1189059 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr70021.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr70021.c
> @@ -32,6 +32,7 @@ main ()
>        e[i] = 14234165565810642243ULL;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < N; ++i)
>      if (e[i] != ((i & 3) ? 14234165565810642243ULL : 1ULL))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-1.c b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
> index 9d601dc9d4a92922e4114b8b4d1b7ef2f49c0c44..2687758b022b01af3eb7b444fee25be8bc1f8b3c 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr70354-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr70354-1.c
> @@ -41,6 +41,7 @@ main ()
>        h[i] = 8193845517487445944ULL;
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (d[i] != 8193845517487445944ULL || e[i] != 1
>  	|| g[i] != 4402992416302558097ULL)
> diff --git a/gcc/testsuite/gcc.dg/vect/pr70354-2.c b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
> index 160e1e083e03e0652d06bf29df060192cbe75fd5..cb4cdaae30ba5760fc32e255b651072ca397a499 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr70354-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr70354-2.c
> @@ -29,6 +29,7 @@ main ()
>        b[i] = 0x1200000000ULL + (i % 54);
>      }
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      if (a[i] != (0x1234ULL << (i % 54)))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr71259.c b/gcc/testsuite/gcc.dg/vect/pr71259.c
> index 587a8e3c8f378f3c57f8a9a2e9fa5aee3a968860..6cb22f622ee2ce2d6de51c440472e36fe7294362 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr71259.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr71259.c
> @@ -20,6 +20,7 @@ main ()
>    asm volatile ("" : : : "memory");
>    for (i = 0; i < 44; i++) 
>      for (j = 0; j < 17; j++)
> +#pragma GCC novector
>        for (k = 0; k < 2; k++)
>  	if (c[i][j][k] != -5105075050047261684)
>  	  __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr78005.c b/gcc/testsuite/gcc.dg/vect/pr78005.c
> index 7cefe73fe1b3d0050befeb5e25aec169867fd96a..6da7acf50c2a1237b817abf8e6b9191b3c3e1378 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr78005.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr78005.c
> @@ -22,6 +22,7 @@ foo (int n, int d)
>  
>  #define check_u(x)		\
>    foo (x, 2);			\
> +  _Pragma("GCC novector")	\
>    for (i = 0; i < N; i++)	\
>      {				\
>        if (u[i] != res##x[i])	\
> diff --git a/gcc/testsuite/gcc.dg/vect/pr78558.c b/gcc/testsuite/gcc.dg/vect/pr78558.c
> index 2606d4ec10d3fa18a4c0e4b8e9dd02131cb57ba7..2c28426eb85fc6663625c542e84860fa7bcfd3c2 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr78558.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr78558.c
> @@ -37,6 +37,7 @@ main ()
>    asm volatile ("" : : "g" (s), "g" (d) : "memory");
>    foo ();
>    asm volatile ("" : : "g" (s), "g" (d) : "memory");
> +#pragma GCC novector
>    for (i = 0; i < 50; ++i)
>      if (d[i].q != i || d[i].r != 50 * i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-2.c b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
> index 83557daa6963632ccf2cf0a641a4106b4dc833f5..3ffff0be3be96df4c3e6a3d5caa68b7d4b6bad9a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr80815-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr80815-2.c
> @@ -38,6 +38,7 @@ int main (void)
>  
>    foo (a, b);
>  
> +#pragma GCC novector
>    for (i = 973; i < 1020; i++)
>      if (arr[i] != res[i - 973])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr80815-3.c b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
> index 50392ab1a417de2af81af6473bf0a85bd9eb7279..5e2be5262ebb639d4bd771e326f9a07ed2ee0680 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr80815-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr80815-3.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo (a, b, 50);
>  
> +#pragma GCC novector
>    for (i = 975; i < 1025; i++)
>      if (arr[i] != res[i - 975])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr80928.c b/gcc/testsuite/gcc.dg/vect/pr80928.c
> index e6c1f1ab5a7f4ca7eac98cf91fccffbff2dcfc7a..34566c4535247d2fa39c5d856d1e0c32687e9a2a 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr80928.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr80928.c
> @@ -25,6 +25,7 @@ int main ()
>    foo ();
>  
>    /* check results */
> +#pragma GCC novector
>    for (int i = 0; i < 1020; ++i)
>      if (a[i] != ((i + 4) / 5) * 5)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr81410.c b/gcc/testsuite/gcc.dg/vect/pr81410.c
> index 9c91c08d33c729d8ff26cae72f4651081850b550..6b7586992fe46918aab537a06f166ce2e25f90d8 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr81410.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr81410.c
> @@ -26,6 +26,7 @@ int main()
>        __asm__ volatile ("" : : : "memory");
>      }
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 8; ++i)
>      if (y[2*i] != 3*i || y[2*i+1] != 3*i + 1)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr81633.c b/gcc/testsuite/gcc.dg/vect/pr81633.c
> index 9689ab3959cd9df8234b89ec307b7cd5d6f9d795..2ad144a60444eb82b8e8575efd8fcec94fcd6f01 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr81633.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr81633.c
> @@ -24,6 +24,7 @@ int main(void)
>    double A[4][4] = {{0.0}};
>    kernel(A);
>    for ( int i = 0; i < 4; i++ )
> +#pragma GCC novector
>      for ( int j = 0; j < 4; j++ )
>        if (A[i][j] != expected[i][j])
>  	__builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-1.c b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
> index f6fd43c7c87e0aad951ba092796f0aae39b80d54..b01e1994834934bbd50f3fc1cbcf494ecc62c315 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr81740-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
> @@ -14,6 +14,7 @@ main ()
>      for (c = 0; c <= 6; c++)
>        a[c + 1][b + 2] = a[c][b + 1];
>    for (i = 0; i < 8; i++)
> +#pragma GCC novector
>      for (d = 0; d < 10; d++)
>        if (a[i][d] != (i == 3 && d == 6) * 4)
>  	__builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-2.c b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
> index 1e0d6645a03f77c9c042313fd5377b71ba75c4d6..7b2bfe139f20fb66c90cfd643b65df3edb9b536e 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr81740-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
> @@ -15,6 +15,7 @@ main ()
>      for (c = 6; c >= 0; c--)
>        a[c + 1][b + 2] = a[c][b + 1];
>    for (i = 0; i < 8; i++)
> +#pragma GCC novector
>      for (d = 0; d < 10; d++)
>        if (a[i][d] != (i == 3 && d == 6) * 4)
>  	__builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr85586.c b/gcc/testsuite/gcc.dg/vect/pr85586.c
> index 3d075bfcec83bab119f77bad7b642eb3d634fb4c..a4a170a1fcd130d84da3be9f897889ff4cfc717c 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr85586.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr85586.c
> @@ -24,6 +24,7 @@ main (void)
>      }
>  
>    foo (out, in, 1);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (out[i] != in[i])
>        __builtin_abort ();
> @@ -33,6 +34,7 @@ main (void)
>    foo (out + N - 1, in, -1);
>    if (out[0] != in[N - 1])
>      __builtin_abort ();
> +#pragma GCC novector
>    for (int i = 1; i <= N; ++i)
>      if (out[i] != 2)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-1.c b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
> index 0d0a70dff6f21b2f07fecd937d4fe26c0df61513..ec968dfcd0153cdb001e8e282146dbdb67d23c65 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr87288-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr87288-1.c
> @@ -16,6 +16,7 @@ run (int *restrict a, int *restrict b, int count)
>  void __attribute__ ((noipa))
>  check (int *restrict a, int count)
>  {
> +#pragma GCC novector
>    for (int i = 0; i < count * N; ++i)
>      if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-2.c b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
> index e9ff9a0be7c08a9755972717a63025f2825e95cf..03c7f88a6a48507bbbfbf2e177425d28605a3aa6 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr87288-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr87288-2.c
> @@ -22,6 +22,7 @@ RUN_COUNT (4)
>  void __attribute__ ((noipa))
>  check (int *restrict a, int count)
>  {
> +#pragma GCC novector
>    for (int i = 0; i < count * N; ++i)
>      if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr87288-3.c b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
> index 23f574ccb53268b59b933ec59a5eadaa890007ff..0475990992e58451de8649b735fa16f0e32ed657 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr87288-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr87288-3.c
> @@ -22,6 +22,7 @@ RUN_COUNT (4)
>  void __attribute__ ((noipa))
>  check (int *restrict a, int count)
>  {
> +#pragma GCC novector
>    for (int i = 0; i < count * N + 1; ++i)
>      if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-1.c b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
> index 77dbfd47c91be8cce0edde8b09b7b90d40268306..0f78ccc995d5dcd35d5d7ba0f35afdc8bb5a1b2b 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88903-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88903-1.c
> @@ -19,6 +19,7 @@ main()
>    for (int i = 0; i < 1024; ++i)
>      x[i] = i;
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != i << ((i/2+1) & 31))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr88903-2.c b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
> index cd88a99c6045c6a3eb848f053386d22b9cbe46ce..8a1cf9c523632f392d95aa2d6ec8332fa50fec5b 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr88903-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr88903-2.c
> @@ -21,6 +21,7 @@ int main()
>    for (int i = 0; i < 1024; ++i)
>      x[i] = i, y[i] = i % 8;
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != i << ((i & ~1) % 8))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr90018.c b/gcc/testsuite/gcc.dg/vect/pr90018.c
> index 52640f5aa6f02d6deed3b2790482a2d2d01ddd5b..08ca326f7ebfab1a42813bc121f1e5a46394e983 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr90018.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr90018.c
> @@ -41,6 +41,7 @@ int main(int argc, char **argv)
>        a42[i*4+n*4+1] = tem4 + a42[i*4+n*4+1];
>        __asm__ volatile ("": : : "memory");
>      }
> +#pragma GCC novector
>    for (int i = 0; i < 4 * n * 3; ++i)
>      if (a4[i] != a42[i])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr92420.c b/gcc/testsuite/gcc.dg/vect/pr92420.c
> index e43539fbbd7202b3ae2e9f71bfd82a3fcdf8bde3..e56eb0e12fbec55b16785e244f3a24b889af784d 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr92420.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr92420.c
> @@ -41,6 +41,7 @@ main ()
>      }
>    foo (a, b + N, d, N);
>    bar (a, c, e, N);
> +#pragma GCC novector
>    for (i = 0; i < N; ++i)
>      if (d[i].r != e[i].r || d[i].i != e[i].i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr94994.c b/gcc/testsuite/gcc.dg/vect/pr94994.c
> index e98aeb090d8cbcfc9628052b553b7a7d226069d1..2f598eacd541eafaef02f9aee34fc769dac2a4c6 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr94994.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr94994.c
> @@ -41,6 +41,7 @@ main (void)
>        for (unsigned int j = 0; j < INPUT_SIZE + MAX_STEP; ++j)
>  	x[j] = j + 10;
>        copy (x + i, x, INPUT_SIZE);
> +#pragma GCC novector
>        for (int j = 0; j < INPUT_SIZE + i; ++j)
>  	{
>  	  int expected;
> diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-1.c b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
> index 55d1364f056febd86c49272ede488bd37867dbe8..2de222d2ae6491054b6c7a6cf5891580abf5c6f7 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr96783-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
> @@ -31,6 +31,7 @@ int main ()
>      a[i] = i;
>    foo (a + 3 * 5, 6-1, 5);
>    const long b[3 * 8] = { 0, 1, 2, 21, 22, 23, 18, 19, 20, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 };
> +#pragma GCC novector
>    for (int i = 0; i < 3 * 8; ++i)
>      if (a[i] != b[i])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-2.c b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
> index 33c37109e3a8de646edd8339b0c98300bed25b51..bcdcfac072cf564d965edd4be7fbd9b23302e759 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr96783-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
> @@ -20,6 +20,7 @@ int main()
>    for (int i = 0; i < 1024; ++i)
>      b[i] = i;
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 256; ++i)
>      if (a[3*i] != 1023 - 3*i - 2
>  	|| a[3*i+1] != 1023 - 3*i - 1
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97081-2.c b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
> index 98ad3c3fe17e4556985cb6a0392de72a19911a97..436e897cd2e6a8bb41228cec14480bac88e98952 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97081-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97081-2.c
> @@ -24,6 +24,7 @@ main ()
>        c[i] = i;
>      }
>    foo (3);
> +#pragma GCC novector
>    for (int i = 0; i < 1024; i++)
>      if (s[i] != (unsigned short) ((i << 3) | (i >> (__SIZEOF_SHORT__ * __CHAR_BIT__ - 3)))
>          || c[i] != (unsigned char) ((((unsigned char) i) << 3) | (((unsigned char) i) >> (__CHAR_BIT__ - 3))))
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97558-2.c b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
> index 8f0808686fbad0b5b5ec11471fd38f53ebd81bde..5dff065f2e220b1ff31027c271c07c9670b98f9c 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97558-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97558-2.c
> @@ -41,6 +41,7 @@ int main (void)
>    foo (N-1);
>  
>      /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N/2; i++)
>      {
>        sum = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97678.c b/gcc/testsuite/gcc.dg/vect/pr97678.c
> index 7fb6c93515e41257f173f664d9304755a8dc0de2..1fa56326422e832e82bb6f1739f14ea1a1cb4955 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr97678.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr97678.c
> @@ -19,6 +19,7 @@ main ()
>        b[i * 2 + 1] = i * 8;
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < 158; ++i)
>      if (b[i*2] != (unsigned short)(i*7)
>          || b[i*2+1] != (unsigned short)(i*8))
> diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
> index 4373dce917f9d7916e128a639e81179fe1250ada..1154b40d4855b5a42187134e9d5f08a98a160744 100644
> --- a/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
> +++ b/gcc/testsuite/gcc.dg/vect/section-anchors-pr27770.c
> @@ -22,6 +22,7 @@ int main (void)
>    int i;
>    check_vect ();
>    foo ();
> +#pragma GCC novector
>    for (i = 0; i < 100; i++)
>      if (f[i]!=1) 
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
> index e3466d0da1de6207b8583f42aad412b2c2000dcc..dbf65605e91c4219b6f5c6de220384ed09e999a7 100644
> --- a/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
> +++ b/gcc/testsuite/gcc.dg/vect/section-anchors-vect-69.c
> @@ -50,6 +50,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1[2].a.n[1][2][i] != 5)
> @@ -63,6 +64,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = NINTS - 1; i < N - 1; i++)
>      {
>        if (tmp1[2].a.n[1][2][i] != 6)
> @@ -81,6 +83,7 @@ int main1 ()
>    /* check results:  */
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>  	{
>            if (tmp1[2].e.n[1][i][j] != 8)
> @@ -100,6 +103,7 @@ int main1 ()
>    /* check results:  */
>    for (i = 0; i < N - NINTS; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N - NINTS; j++)
>  	{
>            if (tmp2[2].e.n[1][i][j] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 26b71d654252bcd2e4591f11a78a4c0a3dad5d85..82e4f6469fb9484f84c5c832d0461576b63ba8fe 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -20,6 +20,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != 8 
> @@ -42,6 +43,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] != 8
> @@ -66,6 +68,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*5] != 8
> @@ -91,6 +94,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*9] != 8
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-10.c b/gcc/testsuite/gcc.dg/vect/slp-10.c
> index da44f26601a9ba8ea52417ec5a160dc4bedfc315..2759b66f7772cb1af508622a3099bdfb524cba56 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-10.c
> @@ -46,6 +46,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> @@ -68,6 +69,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  (in[i*4] + 2) * 3
> @@ -84,6 +86,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out2[i*2] !=  (float) (in[i*2] * 2 + 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c b/gcc/testsuite/gcc.dg/vect/slp-11a.c
> index e6632fa77be8092524a202d6a322354b45e1794d..fcb7cf6c7a2c5d42ec7ce8bc081db7394ba2bd96 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
> @@ -44,6 +44,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11b.c b/gcc/testsuite/gcc.dg/vect/slp-11b.c
> index d0b972f720be1c965207ded917f979957c76ee67..df64c8db350dbb12295c61e84d32d5a5c20a1ebe 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-11b.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-11b.c
> @@ -22,6 +22,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  (in[i*4] + 2) * 3
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
> index bdcf434ce31ebc1df5f7cfecb5051ebc71af3aed..0f680cd4e60c41624992e4fb68d2c3664ff1722e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
> @@ -21,6 +21,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out[i*2] !=  ((float) in[i*2] * 2 + 6)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
> index 08a8f55bab0b3d09e7eae14354c515203146b3d8..f0dda55acaea38e463044c7495af1f57ac121ce0 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
> @@ -47,6 +47,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-12b.c b/gcc/testsuite/gcc.dg/vect/slp-12b.c
> index 48e78651a6dca24de91a1f36d0cd757e18f7c1b8..e2ea24d6c535c60ba903ce2411290e603414009a 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-12b.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-12b.c
> @@ -23,6 +23,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out2[i*2] !=  (float) (in[i*2] * 2 + 11)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-12c.c b/gcc/testsuite/gcc.dg/vect/slp-12c.c
> index 6650b8bd94ece71dd9ccb9adcc3d17be2f2bc07a..9c48dff3bf486a8cd1843876975dfba40a055a23 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-12c.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-12c.c
> @@ -24,6 +24,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  (in[i*4] + 2) * 3
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
> index a16656ace00a6a31d0c056056ec2e3e1f050c09f..ca70856c1dd54f106c9f1c3cde6b0ff5f7994e74 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
> @@ -34,6 +34,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8] + i
> @@ -65,6 +66,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>          if (out2[i*12] != in2[i*12] + i
> @@ -100,6 +102,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>          if (out2[i*12] != in2[i*12] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c
> index 8769d62cfd4d975a063ad953344855091a1cd129..b7f947e6dbe1fb7d9a8aa8b5f6ac1edfc89d33a2 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
> @@ -28,6 +28,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8] + i
> @@ -59,6 +60,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>          if (out2[i*12] != in2[i*12] + i
> @@ -94,6 +96,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>          if (out2[i*12] != in2[i*12] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-14.c b/gcc/testsuite/gcc.dg/vect/slp-14.c
> index 6af70815dd43c13fc9abfcebd70c562268dea86f..ccf23c1e44b78ac62dc78eef0ff6c6bc26e99fc1 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-14.c
> @@ -64,6 +64,7 @@ main1 (int n)
>  }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-15.c b/gcc/testsuite/gcc.dg/vect/slp-15.c
> index dbced88c98d1fc8d289e6ac32a84dc9f4072e49f..13a0f3e3014d84a16a68a807e6a2730cbe8e6840 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-15.c
> @@ -64,6 +64,7 @@ main1 (int n)
>  }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-16.c b/gcc/testsuite/gcc.dg/vect/slp-16.c
> index a7da9932c54c28669875d46e3e3945962d5e2dee..d053a64276db5c306749969cca7f336ba6a19b0b 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-16.c
> @@ -38,6 +38,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*2] !=  (in[i*2] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-17.c b/gcc/testsuite/gcc.dg/vect/slp-17.c
> index 6fa11e4c53ad73735af9ee74f56ddff0b777b99b..c759a5f0145ac239eb2a12efa89c4865fdbf703e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-17.c
> @@ -27,6 +27,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*2] != in[i*2] + 5
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-18.c b/gcc/testsuite/gcc.dg/vect/slp-18.c
> index ed426a344985d1e205f7a94f72f86954a77b3d92..f31088cb76b4cdd80460c0d6a24568430e595ea0 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-18.c
> @@ -57,6 +57,7 @@ main1 ()
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-19a.c b/gcc/testsuite/gcc.dg/vect/slp-19a.c
> index 0f92de92cd396227cc668396cd567ca965e9784b..ca7a0a8e456b1b787ad82e910ea5e3c5e5048c80 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-19a.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-19a.c
> @@ -28,6 +28,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-19b.c b/gcc/testsuite/gcc.dg/vect/slp-19b.c
> index 237b36dd227186c8f0cb78b703351fdae6fef27c..4d53ac698dbd164d20271c4fe9ccc2c20f3c4eaa 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-19b.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-19b.c
> @@ -29,6 +29,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-19c.c b/gcc/testsuite/gcc.dg/vect/slp-19c.c
> index 32566cb5e1320de2ce9c83867c05902a24036de4..188ab37a0b61ba33ff4c19115e5c54e0f7bac500 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-19c.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-19c.c
> @@ -47,6 +47,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*12] !=  in[i*12]
> @@ -79,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*6] !=  in[i*6]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-2.c b/gcc/testsuite/gcc.dg/vect/slp-2.c
> index 8d374d724539a47930fc951888471a7b367cd845..d0de3577eb6a1b8219e8a79a1a684f6b1b7baf52 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-2.c
> @@ -25,6 +25,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != a8 
> @@ -55,6 +56,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*16] != a8
> @@ -85,6 +87,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*3] != a8
> @@ -110,6 +113,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*11] != a8
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-20.c b/gcc/testsuite/gcc.dg/vect/slp-20.c
> index dc5eab669ea9eaf7db83606b4c426921a6a5da15..ea19095f9fa06db508cfedda68ca2c65769b35b0 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-20.c
> @@ -34,6 +34,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != b0 
> @@ -77,6 +78,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != b0 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
> index 4b83adb9807fc29fb9f2d618d15e8eb15290dd67..712a73b69d730fd27cb75d3ebb3624809317f841 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
> @@ -45,6 +45,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        a0 = in[i*4];
> @@ -101,6 +102,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        a0 = in[i*4];
> @@ -158,6 +160,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        a0 = in[i*4];
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-22.c b/gcc/testsuite/gcc.dg/vect/slp-22.c
> index e2a0002ffaf363fc12b76deaaee3067c9a0a186b..2c083dc4ea3b1d7d3c6b56508cc7465b76060aa1 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-22.c
> @@ -39,6 +39,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != b0 
> @@ -92,6 +93,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] != b0 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-23.c b/gcc/testsuite/gcc.dg/vect/slp-23.c
> index d7c67fe2c6e9c6ecf94a2ddc8c1d7a4c234933c8..d32ee5ba73becb9e0b53bfc2af27a64571c56899 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-23.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-23.c
> @@ -39,6 +39,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].c + arr[i].c
> @@ -67,6 +68,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].c != arr[i].c + arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
> index abd3a878f1ac36a7c8cde58743496f79b71f4476..5eaea9600acb2b8ffe674730bcf9514b51ae105f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
> @@ -42,6 +42,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
>      pIn++;
>    }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (ua1[2*i] != ub[2*i]
>          || ua1[2*i+1] != ub[2*i+1]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c b/gcc/testsuite/gcc.dg/vect/slp-24.c
> index a45ce7de71fa6a8595b611dd47507df4e91e3b36..59178f2c0f28bdbf657ad68658d373e75d076f79 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-24.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
> @@ -41,6 +41,7 @@ main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *a
>      pIn++;
>    }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (ua1[2*i] != ub[2*i]
>          || ua1[2*i+1] != ub[2*i+1]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-25.c b/gcc/testsuite/gcc.dg/vect/slp-25.c
> index 1c33927c4342e01f80765d0ea723e01cec5fe2e6..9e3b5bbc9469fd0dc8631332643c1eb496652218 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-25.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-25.c
> @@ -24,6 +24,7 @@ int main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N/2; i++)
>      {
>        if (ia[2*i] != 25
> @@ -38,6 +39,7 @@ int main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= n/2; i++)
>      {
>        if (sa[2*i] != 25
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
> index f8b49ff603c16127694e599137b1f48ea665c4db..d398a5acb0cdb337b442f071c96f3ce62fe84cff 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
> @@ -24,6 +24,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-28.c b/gcc/testsuite/gcc.dg/vect/slp-28.c
> index 0bb5f0eb0e40307558dc3ab826d583ea004891cd..67b7be29b22bb646b4bea2e0448e919319b11c98 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-28.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-28.c
> @@ -34,6 +34,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (in[i] != i+5)
> @@ -51,6 +52,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (in2[i] != (i % 4) + (i / 4) * 5)
> @@ -69,6 +71,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (in3[i] != (i % 12) + (i / 12) * 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
> index 4cf0e7a0ece17204221c483bcac8fe9bdab3c85c..615a79f4a30f8002a989047c99eea13dd9f9e1a6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-3-big-array.c
> @@ -32,6 +32,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> @@ -54,6 +55,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -84,6 +86,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*16] !=  in[i*16]
> @@ -120,6 +123,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        if (out[i*9] !=  in[i*9]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c b/gcc/testsuite/gcc.dg/vect/slp-3.c
> index 760b3fa35a2a2018a103b344c329464ca8cb52fe..183c7e65c57ae7dfe3994757385d9968b1de45e5 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> @@ -48,6 +49,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -78,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*16] !=  in[i*16]
> @@ -114,6 +117,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        if (out[i*9] !=  in[i*9]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-33.c b/gcc/testsuite/gcc.dg/vect/slp-33.c
> index 2404a5f19b407ef47d4ed6e597da9381629530ff..c382093c2329b09d3ef9e78abadd1f7ffe22dfda 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-33.c
> @@ -43,6 +43,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*7] !=  (in[i*7] + 5) * 3 - 2
> @@ -64,6 +65,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*3] !=  (in[i*3] + 2) * 3
> @@ -81,6 +83,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out2[i*3] !=  (float) (in[i*3] * 2 + 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> index 9e9c8207f7bbb0235e5864b529869b6db3768087..0baaff7dc6e6b8eeb958655f964f234512cc4500 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-34-big-array.c
> @@ -36,6 +36,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*3] != in[i*3] + 5
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-34.c b/gcc/testsuite/gcc.dg/vect/slp-34.c
> index 1fd09069247f546a9614c47fca529da4bc465497..41832d7f5191bfe7f82159cde69c1787cfdc6d8c 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-34.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-34.c
> @@ -30,6 +30,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*3] != in[i*3] + 5
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-35.c b/gcc/testsuite/gcc.dg/vect/slp-35.c
> index 76dd7456d89859108440eb0be2374215a16cfa57..5e9f6739e1f25d109319da1db349a4063f5aaa1b 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-35.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-35.c
> @@ -32,6 +32,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].c + arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-37.c b/gcc/testsuite/gcc.dg/vect/slp-37.c
> index a765cd70a09c2eb69df6d85b2056f0d90fc4120f..caee2bb508f1824fa549568dd09911c8624222f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-37.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-37.c
> @@ -28,6 +28,7 @@ foo1 (s1 *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>         if (arr[i].a != 6 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
> index 98ac3f1f2839c717d66c04ba4e0179d4497be33e..fcda45ff368511b350b25857f21b2eaeb721561a 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-4-big-array.c
> @@ -34,6 +34,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> @@ -59,6 +60,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -92,6 +94,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*16] !=  in[i*16]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-4.c b/gcc/testsuite/gcc.dg/vect/slp-4.c
> index e4f65bc37f8c5e45c1673d2218bf75a2a98b3daf..29e741df02ba0ef6874cde2a4410b79d1d7608ee 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-4.c
> @@ -28,6 +28,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> @@ -53,6 +54,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -86,6 +88,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*16] !=  in[i*16]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-41.c b/gcc/testsuite/gcc.dg/vect/slp-41.c
> index 2ad9fd2077231a0124c7fe2aaf37570a3a10f849..b96de4fbcb7f9a3c60b884a47bbfc52ebbe1dd44 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-41.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-41.c
> @@ -48,6 +48,7 @@ int main()
>         __asm__ volatile ("");
>      }
>    testi (ia, sa, 8, 32);
> +#pragma GCC novector
>    for (i = 0; i < 128; ++i)
>      if (sa[i] != ia[(i / 4) * 8 + i % 4])
>        abort ();
> @@ -58,6 +59,7 @@ int main()
>         __asm__ volatile ("");
>      }
>    testi2 (ia, sa, 8, 32);
> +#pragma GCC novector
>    for (i = 0; i < 128; ++i)
>      if (ia[i] != sa[(i / 4) * 8 + i % 4])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-43.c b/gcc/testsuite/gcc.dg/vect/slp-43.c
> index 3cee613bdbed4b7ca7a796d45776b833cff2d1a2..3d8ffb113276c3b244436b98048fe78112340e0c 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-43.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-43.c
> @@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
>  }
>  
>  #define TEST(T,N) \
> + _Pragma("GCC novector") \
>   do { \
>    memset (out, 0, 4096); \
>    foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
>    if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
>      __builtin_abort (); \
> +  _Pragma("GCC novector") \
>    for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
>      if (out[i] != 0) \
>        __builtin_abort (); \
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-45.c b/gcc/testsuite/gcc.dg/vect/slp-45.c
> index fadc4e5924308d46aaac81a0d5b42564285d58ff..f34033004520f106240fd4a7f6a6538cb22622ff 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-45.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-45.c
> @@ -23,11 +23,13 @@ foo_ ## T ## _ ## N (T * __restrict__ in_, T * __restrict__ out_, int s) \
>  }
>  
>  #define TEST(T,N) \
> + _Pragma("GCC novector") \
>   do { \
>    memset (out, 0, 4096); \
>    foo_ ## T ## _ ## N ((T *)in, (T *)out, 1); \
>    if (memcmp (in, out, sizeof (T) * MAX_VEC_ELEMENTS * N) != 0) \
>      __builtin_abort (); \
> +  _Pragma("GCC novector") \
>    for (int i = sizeof (T) * MAX_VEC_ELEMENTS * N; i < 4096; ++i) \
>      if (out[i] != 0) \
>        __builtin_abort (); \
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-46.c b/gcc/testsuite/gcc.dg/vect/slp-46.c
> index 18476a43d3f61c07aede8d90ca69817b0e0b5342..2d5534430b39f10c15ab4d0bdab47bf68af86376 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-46.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-46.c
> @@ -54,6 +54,7 @@ main ()
>      }
>  
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[i/2])
>        abort ();
> @@ -65,6 +66,7 @@ main ()
>      }
>  
>    bar ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[2*(i/2)])
>        abort ();
> @@ -76,6 +78,7 @@ main ()
>      }
>  
>    baz ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[511 - i/2])
>        abort ();
> @@ -87,6 +90,7 @@ main ()
>      }
>  
>    boo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[2*(511 - i/2)])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-47.c b/gcc/testsuite/gcc.dg/vect/slp-47.c
> index 7b2ddf664dfefa97ac80f9f9eb7993e18980c411..7772bb71c8d013b8699bee644a3bb471ff41678f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-47.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-47.c
> @@ -35,6 +35,7 @@ main ()
>      }
>  
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[1023 - i])
>        abort ();
> @@ -46,6 +47,7 @@ main ()
>      }
>  
>    bar ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[1023 - i^1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-48.c b/gcc/testsuite/gcc.dg/vect/slp-48.c
> index 0b327aede8e6bb53d01315553ed9f2c3c3dc3290..38f533233d657189851a8942e8fa8133a9d2eb91 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-48.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-48.c
> @@ -35,6 +35,7 @@ main ()
>      }
>  
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[1023 - i^1])
>        abort ();
> @@ -46,6 +47,7 @@ main ()
>      }
>  
>    bar ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      if (x[i] != y[1023 - i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-49.c b/gcc/testsuite/gcc.dg/vect/slp-49.c
> index 4141a09ed97a9ceadf89d394d18c0b0226eb55d7..b2433c920793c34fb316cba925d7659db356af28 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-49.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-49.c
> @@ -24,6 +24,7 @@ main()
>  
>    foo (17);
>  
> +#pragma GCC novector
>    for (int i = 0; i < 512; ++i)
>      {
>        if (a[2*i] != 5 + i
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-5.c b/gcc/testsuite/gcc.dg/vect/slp-5.c
> index 989e05ac8be6bdd1fb36c4bdc079866ce101e017..6d51f6a73234ac41eb2cc4d2fcedc8928d9932b2 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-5.c
> @@ -30,6 +30,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8]
> @@ -55,6 +56,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4]
> @@ -86,6 +88,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*16] !=  in[i*16]
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-6.c b/gcc/testsuite/gcc.dg/vect/slp-6.c
> index ec85eb77236e4b8bf5e0c6a8d07abf44a28e2a5c..ea9f7889734dca9bfa3b28747c382e94bb2c1c84 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-6.c
> @@ -28,6 +28,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8] + 5
> @@ -50,6 +51,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4] + 2
> @@ -80,6 +82,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out2[i*16] !=  in2[i*16] * 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-7.c b/gcc/testsuite/gcc.dg/vect/slp-7.c
> index e836a1ae9b5b60685e8ec2d15ca5005ff35a895e..2845a99dedf5c99032b099a136acd96f37fc5295 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-7.c
> @@ -30,6 +30,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  in[i*8] + 5
> @@ -55,6 +56,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*2; i++)
>      {
>        if (out[i*4] !=  in[i*4] + 1
> @@ -86,6 +88,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out2[i*16] !=  in2[i*16] * 2
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-8.c b/gcc/testsuite/gcc.dg/vect/slp-8.c
> index e9ea0ef0d6b32d23977d728c943bac05dc982b2d..8647249f546267185bb5c232f088a4c0984f2039 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-8.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        if (fa[4*i] != (float) ib[4*i]      
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-9.c b/gcc/testsuite/gcc.dg/vect/slp-9.c
> index d5212dca3ddcbffabdc9fbed8f2380ffceee626d..4fb6953cced876c2a1e5761b0f94968c5774da9e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-9.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> index 482fc080a0fc132409509b084fcd67ef95f2aa17..450c7141c96b07b9f798c62950d3de30eeab9a28 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
> @@ -79,11 +79,13 @@ main ()
>        e[i] = 2 * i;
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 17 : 0))
>        abort ();
>  
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        switch (i % 9)
> @@ -115,6 +117,7 @@ main ()
>    f3 ();
>  
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? e[i] : d[i]))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
> index 57cc67ee121108bcc5ccaaee0dca5085264c8818..cb7eb94b3a3ba207d513e3e701cd1c9908000a01 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
> @@ -82,11 +82,13 @@ main ()
>      }
>  
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 17 : 0))
>        abort ();
>  
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        switch (i % 9)
> @@ -118,6 +120,7 @@ main ()
>    f3 ();
>  
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
> index 7350695ece0f53e36de861c4e7724ebf36ff6b76..1dcee46cd9540690521df07c9cacb608e37b62b7 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
> @@ -82,11 +82,13 @@ main ()
>      }
>  
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 17 : 0))
>        abort ();
>  
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        switch (i % 9)
> @@ -118,6 +120,7 @@ main ()
>    f3 ();
>  
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (f[i] != ((i % 3) == 0 ? e[i] : d[i]))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
> index d19ec13a21ac8660cc326dfaa4a36becab219d82..64904b001e6a39623eff9a1ddc530afbc5e64687 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-3.c
> @@ -72,6 +72,7 @@ int main ()
>      }
>  
>    bar (a, b, c, d, e, 2);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (e[i] != ((i % 3) == 0 ? 10 : 2 * i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
> index f82b8416d8467a8127fbb498040c5559e33d6608..0e1bd3b40994016bb6232bd6a1e129602c03167b 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-4.c
> @@ -75,6 +75,7 @@ int main ()
>      }
>  
>    bar (a, b, c, d, e, 2);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (e[i] != ((i % 3) == 0 ? 5 : i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
> index 5ade7d1fbad9eee7861d1b0d12ac98e42d453422..f0a703f0030b4c01d4119c812086de2a8e78ff4f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-5.c
> @@ -70,6 +70,7 @@ int main ()
>      }
>  
>    bar (a, b, c, d, e, 2);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (e[i] != ((i % 3) == 0 || i <= 5 ? 10 : 2 * i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
> index 1850f063eb4fc74c26a9b1a1016f9d70a0c28441..605f6ab8ba638175d557145c82f2b78c30eb5835 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-1.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sout[i*4] != 8 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
> index 62580c070c8e19468812a9c81edc1c5847327ebb..06d9029e9202b15dc8de6d054779f9d53fbea60d 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-10.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out[i].a !=  (unsigned char) in[i*2] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
> index a3d0670cea98379af381fd7282f28e9724096a93..2792b932734a7a8ad4958454de56956081753d7c 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11-big-array.c
> @@ -34,6 +34,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i].a !=  (int) in[i*3] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> index 86a3fafa16f41dc2c63f4704b85268330ad5568d..5c75dc12b695785405b7d56891e7e71ac24e2539 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-11.c
> @@ -28,6 +28,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i].a !=  (int) in[i*3] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
> index d4c929de2ecbc73c75c08ae498b8b400f67bf636..13119822200fef23a96e920bde8ca968f0a09f84 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-12.c
> @@ -32,6 +32,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sout[i*4] != 8 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
> index 28a645c79472578d3775e9e2eb28cb7ee69efad0..c15baa00dd00fb8fa0ae79470d846b31ee4dd578 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
> @@ -41,6 +41,7 @@ main1 (unsigned short a0, unsigned short a1, unsigned short a2,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*16] != a8
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
> index 39bd7c41f8cca2a517486bc9a9898031911115c6..c79906a8d7b30834dfcda5c70d6bf472849a39cb 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-3.c
> @@ -45,6 +45,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (out[i*8] !=  in[i*8] + 5
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
> index faf17d6f0cde5eacb7756996a224e4004b305f7f..b221f705070af661716d1d6fbf70f16ef3652ca9 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (int) in[i*8] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
> index fb4f720aa4935da6862951a3c618799bb37f535f..3237773e1b13223164473ad88b3c806c8df243b2 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (short) in[i*8] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
> index f006d081346aa4f067d1e02018f2c46d4fcf1680..e62d16b6de34ce1919545a5815600263931e11ac 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-6.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (unsigned char) in[i*8] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
> index 286e2fc42af815dcc724f1a66d8d01a96c915beb..08ab2dc3d10f6ab208841e53609dc7c672a69c5e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-7.c
> @@ -26,6 +26,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i*8] !=  (int) in[i*8] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
> index d88ebe4d778c4487c00ef055059d2b825542679a..0b67ecc8e0730813966cfd6922e8d3f9db740408 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-8.c
> @@ -20,6 +20,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out[i*2] !=  (int) in[i*2] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
> index 872b20cac93c119854b8250eb85dc43767743da4..49261483166cbd6dcf99800a5c7062f7f091c103 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-multitypes-9.c
> @@ -20,6 +20,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N*4; i++)
>      {
>        if (out[i*2] !=  (unsigned char) in[i*2] + 1
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> index ca7803ec1a9a49b4800cf396bcdc05f263f344ee..dbb107f95fec3338b135ff965e8be2b514cc1fe6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-1.c
> @@ -69,6 +69,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (output[i] != check_results[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
> index 678152ba4168d32f84a1d1b01ba6c43b210ec8b9..2cce30c2444323ba6166ceee6a768fbd9d881a47 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-10.c
> @@ -35,6 +35,7 @@ int main ()
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < 32; ++i)
>      if (b[i*8+0] != i*8+0
>  	|| b[i*8+1] != i*8+0
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
> index 0318d468ef102cb263d090a33429849221dc3c0d..0d25d9d93bbf14b64fb6f2c116fe70bf17b5f432 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-11.c
> @@ -26,6 +26,7 @@ int main ()
>        __asm__ volatile ("");
>      }
>    foo (4);
> +#pragma GCC novector
>    for (i = 0; i < 64; ++i)
>      if (a[i] != (4*(i/2) + (i & 1) ^ 1))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
> index 113223ab0f96507b74cfff8fc6b112070cabb5ee..642b1e8b399e7ffc77e54e02067eec053ea54c7e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
> @@ -42,6 +42,7 @@ int main()
>  
>    test (a, b);
>  
> +#pragma GCC novector
>    for (i = 0; i < 64; ++i)
>      if (a[i] != 253)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> index 82776f3f06af8a7b82e0d190a922b213d17aee88..41fd159adce8395dd805f089e94aacfe7eeba09f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-2.c
> @@ -43,6 +43,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (output[i] != check_results[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> index 1807275d1bfcc895ed68bd5e536b5837adf336e6..9ea35ba5afca2db0033150e35fca6b961b389c03 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-3.c
> @@ -56,6 +56,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (output[i] != check_results[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
> index 8457e4f45d62d6d704145b1c4f62af14c1877762..107968f1f7ce65c53bf0280e700f659f625d8c1e 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-4.c
> @@ -103,6 +103,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (output[i] != check_results[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
> index b86a3dc8756e0d30551a40ed1febb142813190a4..7128cf471555d5f589b11e1e58a65b0211e7d6fd 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-5.c
> @@ -96,6 +96,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output, input2, output2);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>       if (output[i] != check_results[i] || output2[i] != check_results2[i])
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
> index bec1544650ac9e897ab1c06f120fb6416091dec6..5cc6261d69a15d2a3f6b691c13544c27dc8f9941 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-6.c
> @@ -95,6 +95,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output, input2, output2);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>       if (output[i] != check_results[i] || output2[i] != check_results2[i])
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
> index 346411fd5042add21fdc6413922506bcb92f4594..df13c37bc75d43173d4e1b9d0daf533ba5829c7f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
> @@ -88,6 +88,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output, input2, output2);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>       if (output[i] != check_results[i] || output2[i] != check_results2[i])
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
> index 44df21aae2a2f860d49c36568122733e693d4310..029be5485b62ffef915f3b6b28306501852733d7 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
> @@ -52,6 +52,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N - (N % 3); i++)
>       if (output[i] != check_results[i])
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> index 154c00af598d05bac9ebdad3bfb4eeb28594a1fc..c92fc2f38619a5c086f7029db444a6cb208749f0 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-9.c
> @@ -50,6 +50,7 @@ int main (int argc, const char* argv[])
>  
>    foo (input, output);
>  
> +#pragma GCC novector
>    for (i = 0; i < N - (N % 3); i++)
>       if (output[i] != check_results[i])
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
> index e3bfee33348c5164f657a1494f480db26a7aeffa..72811eb852e5ed51ed5f5d042fac4e9b487911c2 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c
> @@ -40,6 +40,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
> index abb10fde45bc807269cd5bb58f463a77f75118d8..f8ec1fa730d21cde5f2bbb0791b04ddf0e0b358c 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
> @@ -29,6 +29,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
> index 0756119afb455a0b834fd835553318eb29887f4d..76507c4f46157a8ded48e7c600ee53424e01382f 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
> @@ -29,6 +29,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-100.c b/gcc/testsuite/gcc.dg/vect/vect-100.c
> index 9a4d4de06718228fcc0bd011d2e23d4c564c29ff..0d8703281f28c995a7c08c4366a4fccf22cf16e2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-100.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-100.c
> @@ -30,6 +30,7 @@ int main1 () {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (p->a[i] != a[i] || p->b[i] != b[i])
> @@ -55,6 +56,7 @@ int main2 () {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (p->a[i] != c[i] || p->b[i] != d[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-103.c b/gcc/testsuite/gcc.dg/vect/vect-103.c
> index d03562f7cddd0890e3e159fbdc7c5d629b54d58c..59d8edc38cacda52e53a5d059171b6eefee9f920 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-103.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-103.c
> @@ -43,6 +43,7 @@ int main1 (int x, int y) {
>    /* check results: */
>    if (p->a[0] != a[N - 1])
>      abort ();
> +#pragma GCC novector
>    for (i = 1; i < N; i++)
>      if (p->a[i] != b[i - 1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-104.c b/gcc/testsuite/gcc.dg/vect/vect-104.c
> index a77c98735ebad6876c97ee22467f5287b4575a01..e0e5b5a53bdae1e148c61db716f0290bf3e829f1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-104.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-104.c
> @@ -43,6 +43,7 @@ int main1 (int x) {
>    }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>     {
>      for (j = 0; j < N; j++)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
> index 433565bfd4d3cea87abe23de29edbe8823054515..ec7e676439677ae587a67eae15aab34fd5ac5b03 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-105-big-array.c
> @@ -75,6 +75,7 @@ int main1 (int x) {
>    /* check results: */
>    for (i = 0; i < N; i++)
>     {
> +#pragma GCC novector
>      for (j = 0; j < N; j++)
>       {
>         if (p->a[i][j] != c[i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-105.c b/gcc/testsuite/gcc.dg/vect/vect-105.c
> index 17b6e89d8f69053b5825c859f3ab5c68c49b3a5d..f0823fbe397358cb34bf4654fccce21a053ba2a7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-105.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-105.c
> @@ -45,6 +45,7 @@ int main1 (int x) {
>    /* check results: */
>    for (i = 0; i < N; i++)
>     {
> +#pragma GCC novector
>      for (j = 0; j < N; j++)
>       {
>         if (p->a[i][j] != c[i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-106.c b/gcc/testsuite/gcc.dg/vect/vect-106.c
> index 0171cfcdfa6e60e6cb8158d098d435c0e472abf8..4b3451cc783e9f83f7a6cb8c54cf50f4c43dddc0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-106.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-106.c
> @@ -28,6 +28,7 @@ int main1 () {
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (*q != a[i] || *p != b[i])
> @@ -50,6 +51,7 @@ int main1 () {
>    q = q1;
>    p = p1;
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (*q != b[i] || *p != a[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-107.c b/gcc/testsuite/gcc.dg/vect/vect-107.c
> index aaab9c00345bf7f0b25fbcda25a141988bda9eac..60c83a99a19f4797bc7a5a175f33aecbc598f8e2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-107.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-107.c
> @@ -24,6 +24,7 @@ main1 (void)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (a[i] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-108.c b/gcc/testsuite/gcc.dg/vect/vect-108.c
> index 4af6326e9c35963ec7109d66dd0d321cf1055597..2cbb6701d5c6df749482d5e4351b9cb4a808b94f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-108.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-108.c
> @@ -21,6 +21,7 @@ main1 (void)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] * ic[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
> index fe7ea6c420fb1512286b0b468cbe9ffed5daae71..31b9aa2be690fb4f2d9cf8062acbf1b42971098d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-109.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
> @@ -34,6 +34,7 @@ int main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i+2] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
> @@ -56,6 +57,7 @@ int main2 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i] != sb[i] + sc[i] || ia[i+1] != ib[i] + ic[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-11.c b/gcc/testsuite/gcc.dg/vect/vect-11.c
> index 044fc5edc2dddb0bddaca545b4e97de1499be8bd..1171757e323bc9a64c5e6762e98c101120fc1449 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-11.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] * ic[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-110.c b/gcc/testsuite/gcc.dg/vect/vect-110.c
> index 47c6456107ddd4f326e8c9e783b01c59e23087e6..69ee547cfd17965f334d0d1af6bc28f99ae3a671 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-110.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-110.c
> @@ -20,6 +20,7 @@ main1 (void)
>    }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N - 1; i++){
>      if (a[i] != b[i] + c[i])
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-113.c b/gcc/testsuite/gcc.dg/vect/vect-113.c
> index a9d45ce9fcc21195030dfcdf773ffc3a41e48a37..8e9cc545ce6b3204b5c9f4a220e12d0068aa4f3e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-113.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-113.c
> @@ -17,6 +17,7 @@ main1 (void)
>      a[i] = i;
>    }
>  
> +#pragma GCC novector
>    for ( i = 0; i < N; i++) 
>    {
>      if (a[i] != i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-114.c b/gcc/testsuite/gcc.dg/vect/vect-114.c
> index 557b44110a095ae725b58cf1ca2494a103b96dd7..1617d3009eb3fdf0bb16980feb0f54d2862b8f3c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-114.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-114.c
> @@ -19,6 +19,7 @@ main1 (void)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != b[N-1-i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-115.c b/gcc/testsuite/gcc.dg/vect/vect-115.c
> index 0502d15ed3ebd37d8dda044dbe13d68525f3e30a..82b8e2eea1f3374bdbe5460ca58641f217d1ab33 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-115.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-115.c
> @@ -41,6 +41,7 @@ int main1 ()
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.strc_t.strc_s.b[i] != a[i])
> @@ -54,6 +55,7 @@ int main1 ()
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.ptr_t->strc_s.c[i] != a[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-116.c b/gcc/testsuite/gcc.dg/vect/vect-116.c
> index d4aa069772ed76f895f99c91609852bdcc43d324..ac603db44ee2601665c1de4bb60aee95f545c8ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-116.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-116.c
> @@ -18,6 +18,7 @@ void foo()
>    for (i = 0; i < 256; ++i)
>      C[i] = A[i] * B[i];
>  
> +#pragma GCC novector
>    for (i = 0; i < 256; ++i)
>      if (C[i] != (unsigned char)(i * i))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-117.c b/gcc/testsuite/gcc.dg/vect/vect-117.c
> index 22f8e01187272e2cfe445c66ca590f77923d4e95..f2c1c5857059a9bcaafad4ceadff02e192209840 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-117.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-117.c
> @@ -47,6 +47,7 @@ int main (void)
>  
>    for (i = 0; i < N; i++)
>     {
> +#pragma GCC novector
>      for (j = 0; j < N; j++)
>       {
>         if (a[i][j] != c[i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-11a.c b/gcc/testsuite/gcc.dg/vect/vect-11a.c
> index 4f1e15e74293187d495c8c11cda333a1af1139a6..9d93a2e8951f61b34079f6d867abfaf0fccbb8fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-11a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-11a.c
> @@ -21,6 +21,7 @@ void u ()
>    
>    for (i=0; i<8; i++)
>      C[i] = A[i] * B[i];
> +#pragma GCC novector
>    for (i=0; i<8; i++)
>      if (C[i] != Answer[i])
>        abort ();
> @@ -41,6 +42,7 @@ void s()
>    
>    for (i=0; i<8; i++)
>      F[i] = D[i] * E[i];
> +#pragma GCC novector
>    for (i=0; i<8; i++)
>      if (F[i] != Dnswer[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-12.c b/gcc/testsuite/gcc.dg/vect/vect-12.c
> index b095170f008c719326a6cfd5820a7926ae8c722e..096ff10f53c9a4d7e0d3a8bbe4d8ef513a82c46c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-12.c
> @@ -24,6 +24,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] + ic[i] || sa[i] != sb[i] + sc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-122.c b/gcc/testsuite/gcc.dg/vect/vect-122.c
> index 04dae679647ff9831224b6dc200a25b2b1bb28d7..6e7a4c1578f4c4cddf43a81e3e4bc6ab87efa3ca 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-122.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-122.c
> @@ -50,6 +50,7 @@ main ()
>    f2 ();
>    f3 ();
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i || b[i] != i || l[i] != i * (i + 7LL) || m[i] != i * 7LL)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-124.c b/gcc/testsuite/gcc.dg/vect/vect-124.c
> index c720648aaddbe72d0073fcf7548408ce6bda3cdd..6b6730a22bdb62e0f8770b4a288aa1adeff756c2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-124.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-124.c
> @@ -21,6 +21,7 @@ main ()
>    
>    check_vect ();
>    foo (6);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 3 + 6)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-13.c b/gcc/testsuite/gcc.dg/vect/vect-13.c
> index 5d902924ec20e2ea0ee29418a1b52d4e2ede728e..f1e99a3ec02487cd331e171c6e42496924e931a2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-13.c
> @@ -22,6 +22,7 @@ int main1()
>      }
>  
>    /* Check results  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != results[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-14.c b/gcc/testsuite/gcc.dg/vect/vect-14.c
> index 1640220a134ed8962e31b9d201c0e4a8630d631f..5898d4cd8924a5a6036f38efa79bc4146a78320d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-14.c
> @@ -17,6 +17,7 @@ int main1 ()
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
> index 5313eae598b4787e5294eefe87bf59f5a3581657..bc2689fce50cebf55720bfc9f60bd7c0dd9659dc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-15-big-array.c
> @@ -25,6 +25,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != b[N-1-i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-15.c b/gcc/testsuite/gcc.dg/vect/vect-15.c
> index 178bc4404c420c3a7d74ca381f3503aaefc195db..4a73d0681f0db2b12e68ce805f987aabf8f1cf6f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-15.c
> @@ -19,6 +19,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != b[N-1-i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-17.c b/gcc/testsuite/gcc.dg/vect/vect-17.c
> index 471a82336cf466856186eb9ad3f7a95e4087cedc..797444a4c4a312d41d9b507c5d2d024e5b5b87bb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-17.c
> @@ -81,6 +81,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != (ib[i] & ic[i]))
> @@ -95,6 +96,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != (cb[i] & cc[i]))
> @@ -109,6 +111,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != (sb[i] & sc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-18.c b/gcc/testsuite/gcc.dg/vect/vect-18.c
> index 28b2941e581fa6abecbdafaa812cf4ff07ea9e5f..8c0fab43e28da6193f1e948e0c59985b2bff1119 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-18.c
> @@ -80,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != (ib[i] | ic[i]))
> @@ -94,6 +95,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != (cb[i] | cc[i]))
> @@ -108,6 +110,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != (sb[i] | sc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-19.c b/gcc/testsuite/gcc.dg/vect/vect-19.c
> index 27c6dc835a60c42e8360521d343b13f461a0b009..fe2a88c7fd855a516c34ff3fa3b5da5364fb0a81 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-19.c
> @@ -80,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != (ib[i] ^ ic[i]))
> @@ -94,6 +95,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != (cb[i] ^ cc[i]))
> @@ -108,6 +110,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != (sb[i] ^ sc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
> index 162cb54b58d17efc205778adc14e846be39afab1..70595db744e349bdc6d786c7e64b762406689c64 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-2-big-array.c
> @@ -26,6 +26,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-2.c b/gcc/testsuite/gcc.dg/vect/vect-2.c
> index d975668cbd023b0324c7526e162bc1aeb21dfcd7..80415a5b54b75f9e9b03f0123a53fd70ee07e7cd 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-2.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-20.c b/gcc/testsuite/gcc.dg/vect/vect-20.c
> index 8d759f3c6a66e6a6e318510ba59196ab91b757ac..0491bb2fc73bcef98cb26e82fb74778c8fea2dc0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-20.c
> @@ -52,6 +52,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != ~ib[i])
> @@ -66,6 +67,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != ~cb[i])
> @@ -80,6 +82,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != ~sb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-21.c b/gcc/testsuite/gcc.dg/vect/vect-21.c
> index ab77df6ef88890907f57a89870e645bb51d51c5a..f98ae8b22ee3e8bbb2c8e4abbc6022c11150fdb1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-21.c
> @@ -80,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != !ib[i])
> @@ -94,6 +95,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != !cb[i])
> @@ -108,6 +110,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != !sb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-22.c b/gcc/testsuite/gcc.dg/vect/vect-22.c
> index 78dc1ce91def46c31e913806aada5907d02fd4e0..3ab5070d94e85e8d332f55fe8511bbb82df781a6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-22.c
> @@ -63,6 +63,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != -ib[i])
> @@ -77,6 +78,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != -cb[i])
> @@ -91,6 +93,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != -sb[i])
> @@ -105,6 +108,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (fa[i] != -fb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c b/gcc/testsuite/gcc.dg/vect/vect-23.c
> index 69e0848c8eca10661d85a2f0b17b9a3d99319135..1a1c0b415a9247a3ed2555ca094d0a59e698384b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-23.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-23.c
> @@ -80,6 +80,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != ib[i] && ic[i])
> @@ -94,6 +95,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != cb[i] && cc[i])
> @@ -108,6 +110,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != sb[i] && sc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-24.c b/gcc/testsuite/gcc.dg/vect/vect-24.c
> index fa4c0620d29cd44b82fc75f0dc3bab8a862058d9..2da477077111e04d86801c85282822319cd8cfb8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-24.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-24.c
> @@ -81,6 +81,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ia[i] != (ib[i] || ic[i]))
> @@ -95,6 +96,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (ca[i] != (cb[i] || cc[i]))
> @@ -109,6 +111,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (sa[i] != (sb[i] || sc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-25.c b/gcc/testsuite/gcc.dg/vect/vect-25.c
> index 904eea8a17b7572ffa335dcf60d27df648f01f18..d665c3e53cde7e5be416a88ace81f68343c1f115 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-25.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-25.c
> @@ -19,6 +19,7 @@ int main1 (int n, int *p)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != n)
> @@ -32,6 +33,7 @@ int main1 (int n, int *p)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ib[i] != k)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-26.c b/gcc/testsuite/gcc.dg/vect/vect-26.c
> index 8a141f38400308c35a99aa77b0d181a4dce0643c..2ea9aa93dc46dbf11c91d468cdb91a1c0936b323 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-26.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-26.c
> @@ -21,6 +21,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-27.c b/gcc/testsuite/gcc.dg/vect/vect-27.c
> index ac86b21aceb7b238665e86bbbd8a46e2aaa4d162..d459a84cf85d285e56e4abb5b56b2c6157db4b6a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-27.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-27.c
> @@ -29,6 +29,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i-1] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-28.c b/gcc/testsuite/gcc.dg/vect/vect-28.c
> index e213df1a46548d7d2962335c5600c252d9d5d5f3..531a7babb214ed2e6694f845c4b1d6f66f1c5d31 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-28.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-28.c
> @@ -21,6 +21,7 @@ int main1 (int off)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i+off] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-29.c b/gcc/testsuite/gcc.dg/vect/vect-29.c
> index bbd446dfe63f1477f91e7d548513d99be4c11d7d..42fb0467f1e31b0e89ef9323b60e3360c970f222 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-29.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-29.c
> @@ -30,6 +30,7 @@ int main1 (int off)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-3.c b/gcc/testsuite/gcc.dg/vect/vect-3.c
> index 6fc6557cf9f13e9dcfb9e4198b4846bca44542ba..2c9b5066dd47f8b654e005fb6fac8a5a28f48111 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-3.c
> @@ -29,6 +29,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        float fres = b[i] + c[i] + d[i];
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-30.c b/gcc/testsuite/gcc.dg/vect/vect-30.c
> index 71f7a2d169f44990a59f57dcecd83e0a2824f81d..3585ac8cfefa1bd2c89611857c11de23d846f3f6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-30.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-30.c
> @@ -21,6 +21,7 @@ int main1 (int n)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (a[i] != b[i])
> @@ -43,6 +44,7 @@ int main2 (unsigned int n)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < nn; i++)
>      {
>        if (c[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
> index 5621eb4d4ba17aaa6321807ee2d3610e38f8cceb..24bd0c7737df02a6b5dd5de9e745be070b0d8468 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-31-big-array.c
> @@ -31,6 +31,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> @@ -44,6 +45,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> @@ -57,6 +59,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> @@ -70,6 +73,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.e.k[i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-31.c b/gcc/testsuite/gcc.dg/vect/vect-31.c
> index 3f7d00c1748058ef662710eda30d89f0a0560f2f..8e1274bae53d95cbe0a4e959fe6a6002dede7590 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-31.c
> @@ -31,6 +31,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.b[i] != 5)
> @@ -44,6 +45,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.c[i] != 6)
> @@ -57,6 +59,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.d.k[i] != 7)
> @@ -70,6 +73,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N/2; i++)
>      {
>        if (tmp.e.k[i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
> index 3e1403bbe96948188e7544d05f183a271828640f..5a4053ee8212ecb0f3824f2d0b2e6e03cb8e09ed 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-32-big-array.c
> @@ -19,6 +19,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-32.c b/gcc/testsuite/gcc.dg/vect/vect-32.c
> index 2684cf2e0d390406e4c6c2ac30ac178ecfe70d5c..b04cbeb7c8297d589608b1e7468d536a5f265337 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-32.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-32.c
> @@ -23,6 +23,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
> index c1aa399a240e8c7f50ae10610e2c40d41ea8d555..c3bfaaeb055183ee7a059a050d2fc8fe139bbbae 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-33-big-array.c
> @@ -23,6 +23,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-33.c b/gcc/testsuite/gcc.dg/vect/vect-33.c
> index e215052ff777a911358e1291630df9cabd27e343..8ffd888d482bc91e10b225317b399c0926ba437a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-33.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-33.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
> index 0aa6d507a82f086056113157bc4b7ce0d5a87691..c3d44b4d15fef5b719cf618293bbc2a541582f4a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-34-big-array.c
> @@ -26,6 +26,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-34.c b/gcc/testsuite/gcc.dg/vect/vect-34.c
> index 9cc590253c78317843930fff480b64aaa68de2e2..e3beba56623e9312c1bcfcc81b96d19adb36d83f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-34.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-34.c
> @@ -21,6 +21,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
> index 28a99c910fd507414a4a732a6bcc93c4ce142ba6..a88d111b21a0ce2670311103678fb91bf1aff80f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
> @@ -26,6 +26,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.a[i] != i + 1)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-35.c b/gcc/testsuite/gcc.dg/vect/vect-35.c
> index a7ec0f16d4cf0225c2f62c2f0aabf142704b2af8..4267c0bebaef82f5a58601daefd7330fff21c5b1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-35.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-35.c
> @@ -26,6 +26,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.a[i] != i + 1)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
> index d40fcb6d9925de2730acfd37dba2724904159ebb..9aa3bd7c2f40991ef8a3682058d8aea1bab9ba05 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-36-big-array.c
> @@ -27,6 +27,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != s.cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-36.c b/gcc/testsuite/gcc.dg/vect/vect-36.c
> index 64bc7fe18095178bc4bc0db5ef93e4c6706fa7d2..59bef84ad2e134c2a47746fb0daf96f0aaa92a34 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-36.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-36.c
> @@ -27,6 +27,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.ca[i] != s.cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-38.c b/gcc/testsuite/gcc.dg/vect/vect-38.c
> index 01d984c61b8245997b4db358dd579fc2042df9ff..81d9f38515afebd9e7e8c85a08660e4ff09aa571 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-38.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-38.c
> @@ -19,6 +19,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-4.c b/gcc/testsuite/gcc.dg/vect/vect-4.c
> index b0cc45be7de6c24af16f0abedf34bc98370ae3e7..393c88df502ecd9261ac45a8366de969bfee84ae 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-4.c
> @@ -21,6 +21,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != b[i] * c[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-40.c b/gcc/testsuite/gcc.dg/vect/vect-40.c
> index c74703268f913194119e89982092ec4ce7fa0fde..d524b4ebd433b434f55ca1681ef8ade732dfa1bc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-40.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-40.c
> @@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-42.c b/gcc/testsuite/gcc.dg/vect/vect-42.c
> index 086cbf20c0a2cf7c38ede4e9db30042ac3237972..c1d16f659f130aeabbce4fcc1c1ab9d2cb46e12d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-42.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-42.c
> @@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-44.c b/gcc/testsuite/gcc.dg/vect/vect-44.c
> index f7f1fd28665f23560cd7a2f397a0c773290c923f..b6895bd1d8287a246c2581ba24132f344dabb27e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-44.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-44.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-46.c b/gcc/testsuite/gcc.dg/vect/vect-46.c
> index 185ac1424f94956fbcd5b26d0f4e6d36fd5f708b..7ca8b56ea9ffc50ae1cc99dc74662aea60d63023 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-46.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-46.c
> @@ -12,6 +12,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-48.c b/gcc/testsuite/gcc.dg/vect/vect-48.c
> index b29fe47635a349c0a845c43655c1a44d569d765e..10d8e09cac1daafeb0d5aa6e12eb7f3ecf6d33fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-48.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-48.c
> @@ -30,6 +30,7 @@ main1 (float *pb, float *pc)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-5.c b/gcc/testsuite/gcc.dg/vect/vect-5.c
> index 17f3b2fac9a72f11b512659046dd8710d2e2f9a2..a999989215aa7693a1520c261d690c66f6f9ba13 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-5.c
> @@ -25,6 +25,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != c[i])
> @@ -38,6 +39,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != d[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-50.c b/gcc/testsuite/gcc.dg/vect/vect-50.c
> index f43676896af4b9de482521b4aa915a47596ff4a9..76304cd10ce00881de8a2a6dc37fddf100e534c5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-50.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-50.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-52.c b/gcc/testsuite/gcc.dg/vect/vect-52.c
> index c20a4be2edee6c958ae150b7de81121d01b2ab8a..2ad7149fc612e5df4adc390dffc6a0e72717308f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-52.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-52.c
> @@ -30,6 +30,7 @@ main1 (int n, float *pb, float *pc)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-54.c b/gcc/testsuite/gcc.dg/vect/vect-54.c
> index 2b236e48e196106b7892d3f28b4bd901a700ff9c..7ae59c3e4d391200bcb46a1b3229c30ed26b6083 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-54.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-54.c
> @@ -14,6 +14,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i+1] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-56.c b/gcc/testsuite/gcc.dg/vect/vect-56.c
> index c914126ece5f5929d316c5c107e7633efa4da55c..a8703d1e00969afdbb58782068e51e571b612b1d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-56.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-56.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> @@ -50,6 +51,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-58.c b/gcc/testsuite/gcc.dg/vect/vect-58.c
> index da4f9740e3358f67e9a05f82c87cf78bf3620e56..43a596f6e9522531c2c4d2138f80eae73da43038 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-58.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-58.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i+1] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
> index c5de86b167a07ddf9043ae1ba77466ffd16765e6..a38373888907a7ed8f5ac610e030cd919315727d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-6-big-array.c
> @@ -39,6 +39,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        if (a[i] != results1[i] || e[i] != results2[i])
> @@ -52,6 +53,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <=N-4; i++)
>      {
>        if (a[i+3] != b[i-1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-6.c b/gcc/testsuite/gcc.dg/vect/vect-6.c
> index c3e6336bb43c6ab30eb2c55049e0f1a9bd5788b6..eb006ad0735c70bd6a416d7575501a49febafd91 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-6.c
> @@ -24,6 +24,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      {
>        if (a[i] != results1[i] || e[i] != results2[i])
> @@ -37,6 +38,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <=N-4; i++)
>      {
>        if (a[i+3] != b[i-1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-60.c b/gcc/testsuite/gcc.dg/vect/vect-60.c
> index 121c503c63afaf7cc5faa96bb537f4a184c82b00..2de6f0031aa6faf854a61bf60acf4e5a05a7d3d0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-60.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-60.c
> @@ -13,6 +13,7 @@ void bar (float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> @@ -50,6 +51,7 @@ main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (pa[i] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-62.c b/gcc/testsuite/gcc.dg/vect/vect-62.c
> index abd3d700668b019a075c52edfaff16061200305b..ea6ae91f56b9aea165a51c5fe6489729d5ba4e62 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-62.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-62.c
> @@ -25,6 +25,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j+8] != ib[i])
> @@ -46,6 +47,7 @@ int main1 ()
>    /* check results: */
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][8] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-63.c b/gcc/testsuite/gcc.dg/vect/vect-63.c
> index 8d002a5e3c349bd4cbf9e37e8194e9a7450d0bde..20600728145325962598d6fbc17640296c5ca199 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-63.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-63.c
> @@ -25,6 +25,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i + j][1][j] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-64.c b/gcc/testsuite/gcc.dg/vect/vect-64.c
> index 240b68f6d0d2d4bbef72b60aac2b26ba366514df..96773f6cab610ee565f33038515345ea799ba2c9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-64.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-64.c
> @@ -45,6 +45,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j] != ib[i])
> @@ -55,6 +56,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[i][1][1][j] != ib[i])
> @@ -65,6 +67,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (id[i][1][j+1] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-65.c b/gcc/testsuite/gcc.dg/vect/vect-65.c
> index 9ac8ea4f013a5bea6dbfe8673056d35fc1c3fabb..af714d03ebb7f30ab56a93799c4c0d521b9cea93 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-65.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-65.c
> @@ -42,6 +42,7 @@ int main1 ()
>    /* check results: */  
>    for (i = 0; i < M; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j] != ib[2][i][j])
> @@ -62,6 +63,7 @@ int main1 ()
>    /* check results: */
>    for (i = 0; i < M; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[j] != ib[2][i][j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-66.c b/gcc/testsuite/gcc.dg/vect/vect-66.c
> index ccb66bc80017d3aa64698cba43f932a296a82e7d..cf16dd15ac2d1664d2edf9a676955c4479715fd2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-66.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-66.c
> @@ -23,6 +23,7 @@ void main1 ()
>    /* check results: */  
>    for (i = 0; i < 16; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[2][6][j] != 5)
> @@ -47,6 +48,7 @@ void main2 ()
>    /* check results: */  
>    for (i = 0; i < 16; i++)
>      {
> +#pragma GCC novector
>        for (j = 2; j < N+2; j++)
>          {
>             if (ia[3][6][j] != 5)
> @@ -73,6 +75,7 @@ void main3 ()
>    /* check results: */  
>    for (i = 0; i < 16; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ic[2][1][6][j+1] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-67.c b/gcc/testsuite/gcc.dg/vect/vect-67.c
> index 12183a233c273d8ae3932fa312e1734b48f8c7b0..f3322a32c1e34949a107772dc6a3f4a7064e7ce5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-67.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-67.c
> @@ -31,6 +31,7 @@ int main1 (int a, int b)
>    /* check results: */  
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>          {
>             if (ia[i][1][j + NINTS] != (a == b))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-68.c b/gcc/testsuite/gcc.dg/vect/vect-68.c
> index 3012d88494d0494ec137ca89fef4e98e13ae108e..8cc2d84140967d2c54d3db2b408edf92c53340d6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-68.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-68.c
> @@ -30,6 +30,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 5)
> @@ -43,6 +44,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i < N-1; i++)
>      {
>        if (tmp1.a.n[1][2][i] != 6)
> @@ -56,6 +58,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 7)
> @@ -69,6 +72,7 @@ int main1 ()
>      }
>   
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 3; i <N-3; i++)
>      {
>        if (tmp1.e.n[1][2][i] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-7.c b/gcc/testsuite/gcc.dg/vect/vect-7.c
> index c4556e321c6b0d6bf1a2cd36136d71a43718af32..fb2737e92f5dc037c3253803134687081064ae0e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-7.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sb[i] != 5)
> @@ -32,6 +33,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sa[i] != 105)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-70.c b/gcc/testsuite/gcc.dg/vect/vect-70.c
> index 793dbfb748160ba709dd835dc253cb436f7aada1..cd432a6545a97d83ebac2323fe2b1a960df09c6e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-70.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-70.c
> @@ -52,6 +52,7 @@ int main1 ()
>  
>    /* check results:  */
>    for (i = 0; i < OUTERN; i++)
> +#pragma GCC novector
>      for (j = NINTS - 1; j < N - NINTS + 1; j++)
>      {
>        if (tmp1.e[i].n[1][2][j] != 8)
> @@ -67,6 +68,7 @@ int main1 ()
>    
>    /* check results:  */
>    for (i = 0; i < OUTERN; i++)
> +#pragma GCC novector
>      for (j = NINTS - 1; j < N - NINTS + 1; j++)
>      {
>        if (tmp1.e[j].n[1][2][j] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-71.c b/gcc/testsuite/gcc.dg/vect/vect-71.c
> index 581473fa4a1dcf1a7ee570336693ada765d429f3..46226c5f056bdceb902e73326a00959544892600 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-71.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-71.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 2; i < N+1; i++)
>      {
>        if (ia[ib[i]] != 0)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-72.c b/gcc/testsuite/gcc.dg/vect/vect-72.c
> index 9e8e91b7ae6a0bc61410ffcd3f0e5fdf4c3488f1..2ab51fdf307c0872248f2bb107c77d19e53894f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-72.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-72.c
> @@ -33,6 +33,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i-1] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
> index 1c9d1fdaf9a2bb4eee4e9e766e531b72a3ecef2c..d81498ac0ce5926fb384c00aa5f66cc2a976cfdb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-73-big-array.c
> @@ -28,6 +28,7 @@ int main1 ()
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (ia[i] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-73.c b/gcc/testsuite/gcc.dg/vect/vect-73.c
> index fdb49b86362774b0fdf3e10e918b7d73f3383dd7..48e1e64558e53fe109b96bd56eb8af92268cd7ec 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-73.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-73.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results: */  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (ia[i] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
> index ba1ae63bd57cd3347820d888045005a7d4d83f1a..27d708745d31bdb09f4f0d01d551088e02ba24b9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-74-big-array.c
> @@ -36,6 +36,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
>        pa[i] = q[i] * pc[i];
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != q[i] * pc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-74.c b/gcc/testsuite/gcc.dg/vect/vect-74.c
> index a44f643ee96729fc0952a64e32a52275321557eb..c23c38a85063024b46c95c2e1c5158c81b6dcd65 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-74.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-74.c
> @@ -24,6 +24,7 @@ main1 (float *__restrict__  pa, float * __restrict__ pb, float * __restrict__ pc
>        pa[i] = q[i] * pc[i];
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != q[i] * pc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
> index a3fb5053037fcca89d7518c47eb2debfc136ba7f..10a3850d0da6d55a124fd6a7f4a2b7fd0efb3fae 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-75-big-array.c
> @@ -32,6 +32,7 @@ int main1 (int *ib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-75.c b/gcc/testsuite/gcc.dg/vect/vect-75.c
> index 88da97f0bb7cecee4ee93a9d3fa7f55f0ae9641c..ecf5174921cc779f92e12fc64c3014d1a4997783 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-75.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-75.c
> @@ -32,6 +32,7 @@ int main1 (int *ib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
> index 5825cfc446468b16eff60fa2115bb1de4872654f..4f317f273c8737ab07e51699ed19e66d9eb8a51b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-76-big-array.c
> @@ -32,6 +32,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != pib[i - OFF])
> @@ -45,6 +46,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != pib[i - OFF])
> @@ -58,6 +60,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != ic[i - OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-76.c b/gcc/testsuite/gcc.dg/vect/vect-76.c
> index 3f4feeff8ac7882627c88490298c2f39b5172b7e..23210d4b775bfd4d436b2cdf2af2825cbf1924f0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-76.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-76.c
> @@ -26,6 +26,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != pib[i - OFF])
> @@ -39,6 +40,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != pib[i - OFF])
> @@ -52,6 +54,7 @@ int main1 (int *pib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = OFF; i < N; i++)
>      {
>       if (ia[i] != ic[i - OFF])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
> index fb3e49927826f77149d4813185a6a2cac00232d4..5fb833441d46ce2b6b0df2def5b3093290a2f7a4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c
> @@ -32,6 +32,7 @@ int main1 (int *ib, int off)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-77-global.c b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
> index 1580d6e075b018696c56de4d680a0999a837bbca..b9622420c64b732047712ff343a3c0027e7bcf3a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-77-global.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-77-global.c
> @@ -28,6 +28,7 @@ int main1 (int *ib, int off)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-77.c b/gcc/testsuite/gcc.dg/vect/vect-77.c
> index d402e147043c0245f6523f6713dafc83e5357121..033d4ba79869c54f12fc3eea24a11ada871373ab 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-77.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-77.c
> @@ -25,6 +25,7 @@ int main1 (int *ib, int off)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
> index 57e8da0a9090cae7d501ecb83220afff0bf553b2..f7563c4608546696e5c1174402b42bfc2fd3fa83 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c
> @@ -33,6 +33,7 @@ int main1 (int *ib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-78-global.c b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
> index ea039b389b22fe16af9353bd5efa59a375a6a71c..11b7e0e9b63cd95bfff9f64f0cfca8b5e4137fe2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-78-global.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-78-global.c
> @@ -29,6 +29,7 @@ int main1 (int *ib)
>  
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-78.c b/gcc/testsuite/gcc.dg/vect/vect-78.c
> index faa7f2f4f768b0d7a191b8b67f5000f53c485142..b2bf78108dc9b2f8d43235b64a307addeb71e82a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-78.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-78.c
> @@ -25,6 +25,7 @@ int main1 (int *ib)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>       if (ia[i] != ib[i+off])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-8.c b/gcc/testsuite/gcc.dg/vect/vect-8.c
> index 44c5f53ebaf260c2087b298abf0428c8d21e8cfa..85bc347ff2f2803d8b830bc1a231e8dadfa525be 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-8.c
> @@ -19,6 +19,7 @@ int main1 (int n)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (a[i] != b[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
> index 0baf4d2859b679f7b20d6b5fc939b71ec2533fb4..a43ec9ca9a635d055a6ef70dcdd919102ae3690d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-80-big-array.c
> @@ -35,6 +35,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
>        pa[i] = q[i] * pc[i];
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != q[i] * pc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-80.c b/gcc/testsuite/gcc.dg/vect/vect-80.c
> index 45aac84a578fa55624f1f305e9316bbc98e877bb..44299d3c7fed9ac9c213699f6982ba3858bbe0bb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-80.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-80.c
> @@ -24,6 +24,7 @@ main1 (float * __restrict__ pa, float * __restrict__ pb, float *__restrict__ pc)
>        pa[i] = q[i] * pc[i];
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != q[i] * pc[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-82.c b/gcc/testsuite/gcc.dg/vect/vect-82.c
> index fcafb36c06388302775a68f0f056b925725e8aa8..2c1b567d10f2e7e519986c5b1d2e2c6b11353bc2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-82.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-82.c
> @@ -17,6 +17,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != 0)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-82_64.c b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
> index 358a85a838f7519a0c1e0b2bae037d6e8aafeea9..d0962e06c62a8888cb5cabb1c1e08438e3a16c8e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-82_64.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-82_64.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != 0)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-83.c b/gcc/testsuite/gcc.dg/vect/vect-83.c
> index a300a0a08c462c043b2841961c58b8c8f2849cc5..4fd14cac2abd9581cd47d67e8194795b74c68402 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-83.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-83.c
> @@ -17,6 +17,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != 2)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-83_64.c b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
> index a5e897e093d955e0d1aff88021f99caf3a70d928..e3691011c7771328b9f83ea70aec20f373b10da4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-83_64.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-83_64.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != 2)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
> index ade04016cc3136470db804ea7a1bac3010d6da91..9d527b06c7476c4de7d1f5a8863088c189ce6142 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-85-big-array.c
> @@ -22,10 +22,12 @@ int main1 (int *a)
>      }
>  
>  
> +#pragma GCC novector
>    for (j = 0; j < N; j++)
>      if (a[j] != i + N - 1)
>        abort ();
>  
> +#pragma GCC novector
>    for (j = 0; j < N; j++)
>      if (b[j] != j + N)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-85.c b/gcc/testsuite/gcc.dg/vect/vect-85.c
> index a73bae1ad41a23ab583d7fd1f5cf8234d516d515..367cea72b142d3346acfb62cb16be58104de4f1c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-85.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-85.c
> @@ -22,10 +22,12 @@ int main1 (int *a)
>      }
>  
>  
> +#pragma GCC novector
>    for (j = 0; j < N; j++)
>      if (a[j] != i + N - 1)
>        abort();	
>  
> +#pragma GCC novector
>    for (j = 0; j < N; j++)
>      if (b[j] != j + N)
>        abort();	
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-86.c b/gcc/testsuite/gcc.dg/vect/vect-86.c
> index ff1d41df23f1e1eaab7f066726d5217b48fadb57..fea07f11d74c132fec987db7ac181927abc03564 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-86.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-86.c
> @@ -24,11 +24,12 @@ int main1 (int n)
>        b[i] = k;
>      }
>  
> -
> +#pragma GCC novector
>    for (j = 0; j < n; j++)
>      if (a[j] != i + n - 1)
>        abort();	
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (b[i] != i + n)
>        abort();	
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-87.c b/gcc/testsuite/gcc.dg/vect/vect-87.c
> index 17b1dcdee99c819c8a65eadbf9159d9f78242f62..0eadc85eecdf4f8b5ab8e7a94782157534acf0a6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-87.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-87.c
> @@ -23,10 +23,12 @@ int main1 (int n, int *a)
>      }
>  
>  
> +#pragma GCC novector
>    for (j = 0; j < n; j++)
>      if (a[j] != i + n - 1)
>        abort();	
>  
> +#pragma GCC novector
>    for (j = 0; j < n; j++)
>      if (b[j] != j + n)
>        abort();	
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-88.c b/gcc/testsuite/gcc.dg/vect/vect-88.c
> index b99cb4d89a4b8e94000dc6334514af042e1d2031..64341e66b1227ada7de8f26da353e6c6c440c9a9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-88.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-88.c
> @@ -23,10 +23,12 @@ int main1 (int n, int *a)
>      }
>  
>  
> +#pragma GCC novector
>    for (j = 0; j < n; j++)
>      if (a[j] != i + n - 1)
>        abort();	
>  
> +#pragma GCC novector
>    for (j = 0; j < n; j++)
>      if (b[j] != j + n)
>        abort();	
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
> index 59e1aae0017d92c5b98858777e7e55bceb73a90a..64578b353fec58c4af632346a546ab655b615125 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-89-big-array.c
> @@ -28,6 +28,7 @@ int main1 ()
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (p->y[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-89.c b/gcc/testsuite/gcc.dg/vect/vect-89.c
> index 356ab96d330046c553364a585e770653609e5cfe..6e7c875c01e2313ba362506542f6018534bfb443 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-89.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-89.c
> @@ -32,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results: */  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (p->y[i] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-9.c b/gcc/testsuite/gcc.dg/vect/vect-9.c
> index 87600fb5df0d104daf4438e6a7a020e08c277502..dcecef729a60bf22741407e3470e238840ef6def 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-9.c
> @@ -20,6 +20,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != (int) sb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-92.c b/gcc/testsuite/gcc.dg/vect/vect-92.c
> index 9ceb0fbadcd61ec9a5c3682cf3582abf464ce106..86864126951ccd8392cc7f7e87642be23084d5ea 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-92.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-92.c
> @@ -36,6 +36,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < 10; i++)
>      {
>        if (pa[i+1] != (pb[i+1] * pc[i+1]))
> @@ -56,6 +57,7 @@ main2 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < 12; i++)
>      {
>        if (pa[i+1] != (pb[i+1] * pc[i+1]))
> @@ -76,6 +78,7 @@ main3 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (pa[i+1] != (pb[i+1] * pc[i+1]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
> index c3e12783b2c47a4e296fd47cc9dc8e73b7ccebb0..b4ccbeedd08fe1285dc362b28cb6d975c6313137 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-93.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
> @@ -23,6 +23,7 @@ main1 (float *pa)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N1; i++)
>      {
>        if (pa[i] != 2.0)
> @@ -36,6 +37,7 @@ main1 (float *pa)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N2; i++)
>      {
>        if (pa[i] != 3.0)
> @@ -60,6 +62,7 @@ int main (void)
>    for (i = 1; i <= 256; i++) a[i] = b[i-1];
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= 256; i++)
>      {
>        if (a[i] != i-1)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-95.c b/gcc/testsuite/gcc.dg/vect/vect-95.c
> index 1e8bc1e7240ded152ea81f60addab9f7179d3bfc..cfca253e810ff1caf2ef2eef0d7bafc39896ea3e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-95.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-95.c
> @@ -11,6 +11,7 @@ void bar (float *pd, float *pa, float *pb, float *pc)
>    int i;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (pa[i] != (pb[i] * pc[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-96.c b/gcc/testsuite/gcc.dg/vect/vect-96.c
> index c0d6c37b21db23b175de895a582f48b302255e9f..e36196b50d7527f88a88b4f12bebbe780fe23f08 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-96.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-96.c
> @@ -28,7 +28,8 @@ int main1 (int off)
>    for (i = 0; i < N; i++)
>        pp->ia[i] = ib[i];
>  
> -  /* check results: */  
> +  /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (pp->ia[i] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
> index 977a9d57ed4795718722c83344c2efd761e6783e..e015c1684ad856a4732084fbe49783aaeac31e58 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-97-big-array.c
> @@ -32,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != cb[i])
> @@ -48,6 +49,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != s.q[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-97.c b/gcc/testsuite/gcc.dg/vect/vect-97.c
> index 734ba3b6ca36cf56d810a1ce4329f9cb1862dede..e5af7462ef89e7f47b2ca822f563401b7bd95e2c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-97.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-97.c
> @@ -27,6 +27,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != cb[i])
> @@ -43,6 +44,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (s.p[i] != s.q[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
> index 61b749d4669386a890f5c2f5ba83d6e00d269b4f..2d4435d22e476de5b40c6245f26209bff824139c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-98-big-array.c
> @@ -22,6 +22,7 @@ int main1 (int ia[][N])
>      }
>  
>    /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>         if (ic[0][i] != DOT16 (ia[i], ib))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-98.c b/gcc/testsuite/gcc.dg/vect/vect-98.c
> index 2055cce70b20b96dd69d06775e3d6deb9f27e3b2..72a1f37290358b6a89db6c89aada2c1650d2e7a5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-98.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-98.c
> @@ -19,7 +19,8 @@ int main1 (int ia[][N])
>  	ic[0][i] = DOT4 (ia[i], ib);
>      }
>  
> -  /* check results: */  
> +  /* check results: */
> +#pragma GCC novector
>    for (i = 0; i < M; i++)
>      {
>         if (ic[0][i] != DOT4 (ia[i], ib))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-99.c b/gcc/testsuite/gcc.dg/vect/vect-99.c
> index ae23b3afbd1d42221f6fe876f23ee7b9beaebca3..0ef9051d907209e025a8fee057d04266ee2fcb03 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-99.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-99.c
> @@ -21,6 +21,7 @@ int main (void)
>  
>    foo(100);
>  
> +#pragma GCC novector
>    for (i = 0; i < 100; ++i) {
>      if (ca[i] != 2)
>        abort();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
> index b6cc309dbe87b088c9969e07dea03c7f6b5993dd..8fd3bf407e9db3d188b897112ab1e41b381ae3c5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
> @@ -45,6 +45,7 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)					\
> +  _Pragma("GCC novector")				\
>    for (int j = -M; j <= M; ++j)				\
>      {							\
>        TYPE a[N * M], b[N * M];				\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
> index 09a4ebfa69e867869adca3bb5daece02fcee93da..5ecdc3250708e99c30e790da84b002b99a8d7e9b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
> @@ -51,6 +51,7 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)					\
> +  _Pragma("GCC novector")				\
>    for (int j = -M; j <= M; ++j)				\
>      {							\
>        TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
> index 63a897f4bad4894a6ec4b2ff8749eed3f9e33782..23690c45b65a1b95bf88d50f80d021d5c481d5f1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
> @@ -52,6 +52,7 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)					\
> +  _Pragma("GCC novector")				\
>    for (int j = 0; j <= M; ++j)				\
>      {							\
>        TYPE a1[N * M], a2[N * M], b1[N], b2[N];		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> index 29bc571642db8858d3e4ca1027131a1a6559c4c1..b36ad116762e2e3c90ccd79fc4f8564cc57fc3f1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> @@ -39,6 +39,7 @@ typedef unsigned long long ull;
>        for (int i = 0; i < N + M; ++i)				\
>  	a[i] = TEST_VALUE (i);					\
>        test_##TYPE (a + j, a);					\
> +      _Pragma("GCC novector")					\
>        for (int i = 0; i < N; i += 2)				\
>  	{							\
>  	  TYPE base1 = j == 0 ? TEST_VALUE (i) : a[i];		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> index ad74496a6913dcf57ee4573ef1589263a32b074c..f7545e79d935f1d05641415246aabc2dbe9b7d27 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> @@ -33,6 +33,7 @@ typedef unsigned long long ull;
>      {								\
>        TYPE a[N + DIST * 2] = {};				\
>        test_##TYPE (a + DIST, a + i);				\
> +      _Pragma("GCC novector")					\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	{							\
>  	  TYPE expected = 0;					\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
> index 8a9a6fffde1d39f138c5f54221854e73cef89079..d90adc70e28420e5e8fd0e36c15316da12224b38 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
> @@ -33,12 +33,14 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)						\
> +  _Pragma("GCC novector")					\
>    for (int i = 0; i < DIST * 2; ++i)				\
>      {								\
>        TYPE a[N + DIST * 2];					\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	a[j] = TEST_VALUE (j);					\
>        TYPE res = test_##TYPE (a + DIST, a + i);			\
> +      _Pragma("GCC novector")					\
>        for (int j = 0; j < N; ++j)				\
>  	if (a[j + DIST] != (TYPE) j)				\
>  	  __builtin_abort ();					\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> index b9f5d2bbc9f6437e3e8058264cc0c9aaa522b3e2..3b576a4dc432725c67b4e7f31d2bc5937bc34b7a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> @@ -34,6 +34,7 @@ typedef unsigned long long ull;
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	a_##TYPE[j] = TEST_VALUE (j);				\
>        test_##TYPE (i + N - 1, DIST + N - 1);			\
> +      _Pragma("GCC novector")					\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	{							\
>  	  TYPE expected;					\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> index 7c0ff36a8c43f11197de413cb682bcd0a3afcae8..36771b04ed5cc0d6c14c0fe1a0e9fd49db4265c4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> @@ -34,6 +34,7 @@ typedef unsigned long long ull;
>      {								\
>        __builtin_memset (a_##TYPE, 0, sizeof (a_##TYPE));	\
>        test_##TYPE (DIST, i);					\
> +      _Pragma("GCC novector")					\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	{							\
>  	  TYPE expected = 0;					\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
> index 8a699ebfda8bfffdafc5e5f09d137bb0c7e78beb..9658f8ce38e8efb8d19806a4078e1dc4fe57d2ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
> @@ -34,11 +34,13 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)						\
> +  _Pragma("GCC novector")					\
>    for (int i = 0; i < DIST * 2; ++i)				\
>      {								\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
>  	a_##TYPE[j] = TEST_VALUE (j);				\
>        TYPE res = test_##TYPE (DIST, i);				\
> +      _Pragma("GCC novector")					\
>        for (int j = 0; j < N; ++j)				\
>  	if (a_##TYPE[j + DIST] != (TYPE) j)			\
>  	  __builtin_abort ();					\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> index 7e5df1389991da8115df2c6784b52ff3e15f8124..3bc78bed676d8267f7512b71849a7d33cb4ab05b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> @@ -29,6 +29,7 @@ typedef unsigned long long ull;
>    }
>  
>  #define DO_TEST(TYPE)						\
> +  _Pragma("GCC novector")					\
>    for (int i = 0; i < DIST * 2; ++i)				\
>      {								\
>        for (int j = 0; j < N + DIST * 2; ++j)			\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> index a7fc1fcebbb2679fbe6a98c6fa340edcde492ba9..c11c1d13e0ba253b00afb02306aeec786cee1161 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> @@ -37,6 +37,7 @@ typedef unsigned long long ull;
>        for (int i = 0; i < N + M; ++i)			\
>  	a[i] = TEST_VALUE (i);				\
>        test_##TYPE (a + j, a);				\
> +      _Pragma("GCC novector")				\
>        for (int i = 0; i < N; i += 2)			\
>  	if (a[i + j] != (TYPE) (a[i] + 2)		\
>  	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-1.c b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
> index d56898c4d23406b4c8cc53fa1409974b6ab05485..9630fc0738cdf4aa5db67effdd5eb47de4459f6f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-align-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-align-1.c
> @@ -28,6 +28,7 @@ main1 (struct foo * __restrict__ p)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (p->y[i] != x[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-align-2.c b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
> index 39708648703357e9360e0b63ca7070c4c21def03..98759c155d683475545dc20cae23d54c19bd8aed 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-align-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-align-2.c
> @@ -26,6 +26,7 @@ void fbar(struct foo *fp)
>          f2.y[i][j] = z[i];
>  
>     for (i=0; i<N; i++)
> +#pragma GCC novector
>        for (j=0; j<N; j++)
>  	if (f2.y[i][j] != z[i])
>  	  abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
> index 6eb9533a8bb17acf7f9e29bfaa7f7a7aca2dc221..3f3137bd12e1462e44889c7e096096beca4d5b40 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-all-big-array.c
> @@ -18,6 +18,7 @@ __attribute__ ((noinline))
>  void icheck_results (int *a, int *results)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != results[i])
> @@ -29,6 +30,7 @@ __attribute__ ((noinline))
>  void fcheck_results (float *a, float *results)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != results[i])
> @@ -108,6 +110,7 @@ main1 ()
>        ca[i] = cb[i];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> @@ -163,6 +166,7 @@ main1 ()
>        a[i+3] = b[i-1];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <=N-4; i++)
>      {
>        if (a[i+3] != b[i-1])
> @@ -180,6 +184,7 @@ main1 ()
>        j++;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != c[i])
> @@ -193,6 +198,7 @@ main1 ()
>        a[N-i] = d[N-i];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != d[i])
> @@ -206,6 +212,7 @@ main1 ()
>        a[i] = 5.0;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != 5.0)
> @@ -217,6 +224,7 @@ main1 ()
>        sa[i] = 5;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sa[i] != 5)
> @@ -228,6 +236,7 @@ main1 ()
>        ia[i] = ib[i] + 5;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] + 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-all.c b/gcc/testsuite/gcc.dg/vect/vect-all.c
> index cc41e2dd3d313a0557dea16204564a5a0c694950..6fd579fa6ad24623f387d9ebf5c863ca6e91dfe6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-all.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-all.c
> @@ -18,6 +18,7 @@ __attribute__ ((noinline))
>  void icheck_results (int *a, int *results)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != results[i])
> @@ -29,6 +30,7 @@ __attribute__ ((noinline))
>  void fcheck_results (float *a, float *results)
>  {
>    int i;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != results[i])
> @@ -91,6 +93,7 @@ main1 ()
>        ca[i] = cb[i];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ca[i] != cb[i])
> @@ -134,6 +137,7 @@ main1 ()
>        a[i+3] = b[i-1];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <=N-4; i++)
>      {
>        if (a[i+3] != b[i-1])
> @@ -151,6 +155,7 @@ main1 ()
>        j++;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != c[i])
> @@ -164,6 +169,7 @@ main1 ()
>        a[N-i] = d[N-i];
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i <N; i++)
>      {
>        if (a[i] != d[i])
> @@ -177,6 +183,7 @@ main1 ()
>        a[i] = 5.0;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != 5.0)
> @@ -188,6 +195,7 @@ main1 ()
>        sa[i] = 5;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sa[i] != 5)
> @@ -199,6 +207,7 @@ main1 ()
>        ia[i] = ib[i] + 5;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] + 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
> index a7bc7cc90963c8aa8e14d0960d57dc724486247f..4a752cd7d573cd53ea1a59dba0180d017a7f73a5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-avg-1.c
> @@ -35,6 +35,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
> index 85292f1b82416b70698619e284ae76f3a3d9410d..0046f8ceb4e7b2688059073645175b8845246346 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-avg-11.c
> @@ -43,6 +43,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (((((BASE1 + i * 5) ^ 0x55)
>  		   + (BASE2 + i * 4)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
> index 48d7ed773000486c42277535cebe34f101e035ef..57b6670cb98cdf92e60dd6c7154b4a8012b05a1e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-avg-15.c
> @@ -37,6 +37,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, N / 20, 20);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        int d = (BASE1 + BASE2 + i * 5) >> 1;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
> index f3e3839a879b6646aba6237e55e2dcd943eac168..319edba1fa3c04b6b74b343cf5397277a36dd6d1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-avg-16.c
> @@ -37,6 +37,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, N / 20);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        int d = (BASE1 + BASE2 + i * 5) >> 1;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
> index 6c43575f448325e84975999c2e8aa91afb525f87..6bdaeff0d5ab4c55bb5cba1df51a85c4525be6fb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-avg-5.c
> @@ -39,6 +39,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != ((BASE1 + BASE2 + i * 9 + BIAS) >> 1))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
> index 19683d277b1ade1034496136f1d03bb2b446900f..22e6235301417d72e1f85ecbdd96d8e498500991 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-1.c
> @@ -19,6 +19,7 @@ f(struct s *ptr, unsigned n) {
>  
>  void __attribute__ ((noipa))
>  check_f(struct s *ptr) {
> +#pragma GCC novector
>      for (unsigned i = 0; i < N; ++i)
>        if (ptr[i].i != V)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
> index 1a101357ccc9e1b8bb157793eb3f709e99330bf6..0c8291c9363d0de4c09f81525015b7b88004bc94 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-2.c
> @@ -23,6 +23,7 @@ f(struct s *ptr, unsigned n) {
>  
>  void __attribute__ ((noipa))
>  check_f(struct s *ptr) {
> +#pragma GCC novector
>      for (unsigned i = 0; i < N; ++i)
>        if (ptr[i].a != V)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
> index 5dc679627d52e2ad229d0920e5ad8087a71281fe..46fcb02b2f1b6bb2689a6b709901584605cc9a45 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-3.c
> @@ -24,6 +24,7 @@ f(struct s *ptr, unsigned n) {
>  
>  void __attribute__ ((noipa))
>  check_f(struct s *ptr) {
> +#pragma GCC novector
>      for (unsigned i = 0; i < N; ++i)
>        if (ptr[i].a != V)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
> index fae6ea3557dcaba7b330ebdaa471281d33d2ba15..5a7227a93e4665cd10ee564c8b15165dc6cef303 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-4.c
> @@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
>  
>  void __attribute__ ((noipa))
>  check_f(struct s *ptr) {
> +#pragma GCC novector
>      for (unsigned i = 0; i < N; ++i)
>        if (ptr[i].a != V)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
> index 99360c2967b076212c67eb4f34b8fd91711d8821..e0b36e411a4a72335d4043f0f360c2e88b667397 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-write-5.c
> @@ -22,6 +22,7 @@ f(struct s *ptr, unsigned n) {
>  
>  void __attribute__ ((noipa))
>  check_f(struct s *ptr) {
> +#pragma GCC novector
>      for (unsigned i = 0; i < N; ++i)
>        if (ptr[i].a != V)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
> index c97da5289141d35a9f7ca220ae62aa82338fa7f5..a1be71167025c960fc2304878c1ed15d90484dfb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bool-cmp.c
> @@ -183,6 +183,7 @@ check (int *p, cmp_fn fn)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < 32; i++)
>      {
>        int t1 = ((i % 4) > 1) == 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
> index d29b352b832a67e89e7cb3856634390244369daa..7d2cb297738378863ddf78b916036b0998d28e6f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bswap16.c
> @@ -30,6 +30,7 @@ main (void)
>  
>    vfoo16 (arr);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; ++i)
>      {
>        if (arr[i] != expect[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
> index 88d88b5f034153cb736391e4fc46a9b786ec28c5..1139754bbf1b8f7ef7a5a86f5621c9fe319dec08 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bswap32.c
> @@ -30,6 +30,7 @@ main (void)
>  
>    vfoo32 (arr);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; ++i)
>      {
>        if (arr[i] != expect[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
> index fd15d713c5d63db335e61c892c670b06ee9da25f..38d598eba33019bfb7c50dc2f0d5b7fec3a4736c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bswap64.c
> @@ -30,6 +30,7 @@ main (void)
>  
>    vfoo64 (arr);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; ++i)
>      {
>        if (arr[i] != expect[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
> index 2a87e2feadeba7f1eaef3cce72e27a7d0ffafb5f..b3a02fe9c6d840e79764cb6469a86cfce315a337 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-complex-1.c
> @@ -43,6 +43,7 @@ main (void)
>    foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (c[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
> index 19b24e1eb87feacc8f7b90fb067124007e22c90f..7bbfdd95b5c46f83f24263e33bf5e3d2ecee0a4d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-complex-2.c
> @@ -43,6 +43,7 @@ main (void)
>    foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (c[i] != res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
> index 49cfdbe1738794c3bf873c330fff4d7f4626e10b..d5e50cc15df66501fe1aa1618f04ff293908469a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-complex-4.c
> @@ -92,6 +92,7 @@ main (void)
>    foo ();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (c[i].f1 != res[i].f1)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> index 261d828dbb2855fe680b396d3fcbf094e814b6fd..e438cbb67e196a5b3e5e2e2769efc791b0c2d6b7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> @@ -43,6 +43,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (j = 0; j < M; j++)
>      if (x_out[j] != check_result[j])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
> index b2f97d735ef7d94a80a67265b4535a1e228e20ca..dbbe4877db41c43d5be5e3f35cb275b96322c9bc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-10.c
> @@ -120,41 +120,49 @@ main ()
>  	}
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f3 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f5 (k);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f6 (k);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f7 (k);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, 0, sizeof (k));
>    f8 (k);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
> index f28af658f331849a0c5103ba96dd2e3b60de428d..38f1f8f50901c3039d0e7cb17d1bd47b18b89c71 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c
> @@ -79,13 +79,16 @@ baz (unsigned int *a, unsigned int *b,
>      }
>    if (fn (a, b) != -512U - (N - 32) * 16U + 32 * 127U)
>      __builtin_abort ();
> +#pragma GCC novector
>    for (i = -64; i < 0; i++)
>      if (a[i] != 19 || b[i] != 17)
>        __builtin_abort ();
> +#pragma GCC novector
>    for (; i < N; i++)
>      if (a[i] != (i - 512U < 32U ? i - 512U + 127 : i - 512U - 16)
>  	|| b[i] != (i - 512U < 32U ? i * 2U : i + 1U))
>        __builtin_abort ();
> +#pragma GCC novector
>    for (; i < N + 64; i++)
>      if (a[i] != 27 || b[i] != 19)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> index 8a66b4b52ed8a98dd52ef945afb3822de8fe37e9..1521fedd1b5b9d6f3021a1e5653f9ed8df0610b2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> @@ -50,6 +50,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (j = 0; j < M; j++)
>      if (x_out_a[j] != check_result_a[j]
>          || x_out_b[j] != check_result_b[j])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> index 2a6577c6db33a49c7fac809f67b7e957c0b707c2..4057d14c702c22ef41f504a8d3714a871866f04f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> @@ -47,6 +47,7 @@ int main (void)
>  
>    foo (125);
>  
> +#pragma GCC novector
>    for (j = 0; j < M; j++)
>      if (x_out_a[j] != 125
>          || x_out_b[j] != 5)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
> index 41e57f9235b90347e7842d88c9710ee682ea4bd4..f10feab71df6daa76966f8d6bc3a4deba8a7b56a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-5.c
> @@ -46,6 +46,7 @@ int main ()
>  
>    foo(5);
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> index 65fdc4a9ef195f7210b08289242e74cda1db4831..a46479a07eb105f5b2635f3d5848e882efd8aabf 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> @@ -47,6 +47,7 @@ int main ()
>  
>    foo(125);
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++) 
>      if (out[k] != 33)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
> index bd2947516584bf0039d91589422acefd0d27cc35..ea11693ff21798e9e792cfc43aca3c59853e84a0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-7.c
> @@ -53,6 +53,7 @@ main ()
>  #undef F
>  #define F(var) f##var ();
>    TESTS
> +#pragma GCC novector
>    for (i = 0; i < 64; i++)
>      {
>        asm volatile ("" : : : "memory");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
> index d888442aa456e7520cf57e4a07c0938849758068..88289018b9be7d20edd9c7d898bb51d947ed7806 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-8.c
> @@ -79,18 +79,22 @@ main ()
>        e[i] = 2 * i;
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 17 : 0))
>        abort ();
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 0 : 24))
>        abort ();
>    f3 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 ? 51 : 12))
>        abort ();
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (f[i] != ((i % 3) == 0 ? d[i] : e[i]))
>        abort ();
> @@ -112,6 +116,7 @@ main ()
>        b[i] = i / 2;
>      }
>    f5 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (c[i] != ((i % 3) == 0 ? a[i] : b[i]))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
> index 63eee1b47296d8c422b4ff899e5840ca4d4f59f5..87febca10e7049cb0f4547a13d27f533011d44bc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-9.c
> @@ -145,51 +145,61 @@ main ()
>  	}
>      }
>    f1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f3 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f4 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, -6, sizeof (k));
>    f5 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, -6, sizeof (k));
>    f6 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f7 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f8 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (j[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (j, -6, sizeof (j));
>    f9 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
>    __builtin_memset (k, -6, sizeof (k));
>    f10 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (k[i] != ((i % 3) == 0 || ((i / 9) % 3) == 0))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
> index d52e81e9109cc4d81de84adf370b2322799c8c27..5138712731f245eb1f17ef2e9e02e333c8e214de 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-1.c
> @@ -23,6 +23,7 @@
>  #define TEST(OP)					\
>    {							\
>      f_##OP (a, b, 10);					\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	int bval = (i % 17) * 10;			\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
> index f02b0dc5d3a11e3cfa8a23536f570ecb04a039fd..11a680061c21fb7da69739892b79ff37d1599027 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
> @@ -24,6 +24,7 @@
>  #define TEST(INV)					\
>    {							\
>      f_##INV (a, b, c, d);				\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	double mb = (INV & 1 ? -b[i] : b[i]);		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> index 55a174a7ec1fa42c40d4359e882ca475a4feaca3..1af0fe642a0f6a186a225e7619bff130bd09246f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-4.c
> @@ -20,6 +20,7 @@
>  #define TEST(OP)					\
>    {							\
>      f_##OP (a, b, 10);					\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	int bval = (i % 17) * 10;			\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> index d2eadc4e9454eba204b94532ee3b002692969ddb..ec3d9db42021c0f1273bf5fa37bd24fa77c1f183 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-5.c
> @@ -21,6 +21,7 @@
>  #define TEST(OP)					\
>    {							\
>      f_##OP (a, b, 10);					\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	int bval = (i % 17) * 10;			\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> index cc70b8a54c44fbc1d20aa9c2599b9a37d9fc135b..2aeebd44f835ee99f110629ded9572b338d6fb50 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> @@ -23,6 +23,7 @@
>  #define TEST(OP)						\
>    {								\
>      f_##OP (a, b, 10);						\
> +    _Pragma("GCC novector")					\
>      for (int i = 0; i < N; ++i)					\
>        {								\
>  	int bval = (i % 17) * 10;				\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
> index 739b98f59aece34b73ed4762c2eeda2512834539..9d20f977884213a6b4580b90e1a187161cf5c945 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-arith-7.c
> @@ -22,6 +22,7 @@
>  #define TEST(INV)					\
>    {							\
>      f_##INV (a, b, c, d);				\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	double mb = (INV & 1 ? -b[i] : b[i]);		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
> index e6ad865303c42c9d5958cb6e7eac6a766752902b..faeccca865f63bc55ee1a8b412a5e738115811e9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
> @@ -73,6 +73,7 @@ main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (out[i].a != result[2*i] || out[i].b != result[2*i+1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
> index 95efe7ad62eac1f66b85ffdc359fd60bd7465cfd..f3b7db076e6b223fcf8b341f41be636e10cc952a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cselim-2.c
> @@ -55,6 +55,7 @@ main (void)
>  
>    foo (a, b);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != result[2*i] || b[i] != result[2*i+1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> index c81f8946922250234bf759e0a0a04ea8c1f73e3c..f02f98faf2fad408f7d7e65a09c678f242aa32eb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> @@ -16,6 +16,7 @@ int
>  main (void)
>  {
>    V v = foo ((V) { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff }, 0xffff);
> +#pragma GCC novector
>    for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
>      if (v[i] != 0x00010001)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
> index b4eb1a4dacba481e6306b49914d2a29b933de625..80293e50bbc6bbae90cac0fcf436c790b3215c0e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
> @@ -44,6 +44,7 @@ int main ()
>    fun1 (a, N / 2, N);
>    fun2 (b, N / 2, N);
>  
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        if (DEBUG)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
> index 29a16739aa4b706616367bfd1832f28ebd07993e..bfdc730fe5f7b38117854cffbf2e450dad7c3b5a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h
> @@ -30,6 +30,7 @@ int main ()
>    fun1 (a, N / 2, N);
>    fun2 (b, N / 2, N);
>  
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        if (DEBUG)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
> index 6abf76392c8df94765c63c248fbd7045dc24aab1..6456b3aad8666888fe15061b2be98047c28ffed2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-1.c
> @@ -43,6 +43,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
> index 4bfd1630c4e9927d89bf23ddc90716e0cc249813..d5613e55eb20731070eabeee8fe49c9e61d8be50 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-2.c
> @@ -43,6 +43,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
> index 3bdf9efe9472342359b64d51ef308a4d4f8f9a79..239ddb0b444163803c310e4e9910cfe4e4c44be7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-3.c
> @@ -48,12 +48,14 @@ int main ()
>  
>    foo(0, 0);
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out_max[k] != check_max[k] || out_min[k] != 0)
>        abort ();
>  
>    foo(100, 45);
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out_min[k] != check_min[k] || out_max[k] != 100)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
> index e5937705400c7c015513abc513a8629c6d66d140..5344c80741091e4e69b41ce056b9541b75215df2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-4.c
> @@ -43,6 +43,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> index 079704cee81cc17b882b476c42cbeee0280369cf..7465eae1c4762d39c14048077cd4786ffb8e4848 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-5.c
> @@ -43,6 +43,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
> index 1d9dcdab5e9c09514a8427cd65c419e74962c9de..a032e33993970e65e9e8a90cca4d23a9ff97f1e8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6-big-array.c
> @@ -49,6 +49,7 @@ int main ()
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
> index 85aec1bf609582988f06826afb6b7ce77d6d83de..d1d1faf7c3add6ce2c3378d4d094bf0fc2aba046 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-6.c
> @@ -38,6 +38,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
> index c3145a2ad029f92e96995f59e9be9823e016ec11..1ef7a2d19c8b6ee96280aee0e9d69b441b597a89 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-double-reduc-7.c
> @@ -52,6 +52,7 @@ int main ()
>  
>    foo();
>  
> +#pragma GCC novector
>    for (k = 0; k < K; k++)
>      if (out[k] != check_result[k])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
> index 76b436948c185ca73e21203ef68b0a9d4da03408..603f48167d10fe41143f329cd50ca7f6c8e9a154 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-float-extend-1.c
> @@ -21,6 +21,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (da[i] != (double) fb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
> index 8b82c82f1cdd1078898847c31c6c06371f4232f6..9f404f0e36e10ebf61b44e95d6771d26a25faea8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-float-truncate-1.c
> @@ -20,6 +20,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (fa[i] != (float) db[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
> index fc5081b7e8e143893009b60147d667855efa12ad..f80da6a7ca7f0de224d88860a48f24b4fd8c2ad8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-1.c
> @@ -20,6 +20,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != (int) fb[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
> index 64fab3876310d60ca016b78938e449201c80997d..dc038857a42813e665591c10eb3ab7f744d691ad 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
> @@ -19,6 +19,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != (int) db[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
> index 6b6b4f726e9476ac6a90984e15fdd0839dff8885..27d206d9fa0601812b09a3ead2ee9730623e97e4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fma-1.c
> @@ -22,6 +22,7 @@
>  #define TEST(INV)					\
>    {							\
>      f_##INV (a, b, c, d);				\
> +    _Pragma("GCC novector")				\
>      for (int i = 0; i < N; ++i)				\
>        {							\
>  	double mb = (INV & 1 ? -b[i] : b[i]);		\
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> index 4cee73fc7752681c2f677d3e6fddf7daf6e183eb..e3bbf5c0bf8db8cb258d8d05591c246d80c5e755 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
> @@ -50,6 +50,7 @@ main (void)
>    check_vect ();
>  
>    f (y, x, indices);
> +#pragma GCC novector
>    for (int i = 0; i < 32; ++i)
>      if (y[i] != expected[i])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
> index 738bd3f3106948754e38ffa93fec5097560511d3..adfef3bf407fb46ef7a2ad01c495e44456b37b7b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
> @@ -54,6 +54,7 @@ main (void)
>    check_vect ();
>  
>    f (y, x, indices);
> +#pragma GCC novector
>    for (int i = 0; i < 32; ++i)
>      if (y[i] != expected[i])
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
> index 7e323693087598942f57aa8b7cf3686dde4a52c9..04d5fd07723e851442e1dc496fdf004d9196caa2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
> @@ -26,6 +26,7 @@ int main ()
>    check_vect ();
>    foo ();
>    /* check results:  */
> +#pragma GCC novector
>    for (int i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
> index 56a8e01993d1a0701998e377fb7fac4fa2119aed..0f752b716ca811de093373cce75d948923386653 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-16.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] != MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
> index 962be1c0230cca6bef2c097b35833ddd6c270875..8b028d7f75f1de1c8d10376e4f0ce14b60dffc70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-17.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] == MAX ? 0 : MAX);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
> index 6406875951bd52c3a5c3691eb2bc062e5525a4a1..10145d049083b541c95b813f2fd12d3d62041f53 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-2.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] >= MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
> index d55440c9fa421719cb03a30baac5d58ca1ac2fb6..4964343c0ac80abf707fe11cacf473232689123e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-3.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] > MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
> index 5cef85175131bd6b2e08d7801966f5526ededf8e..63f53a4c4eef6e1397d67c7ce5570dfec3160e83 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-4.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] <= MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
> index 3118e2d5a5536e175838284d367a8f2eedf8eb86..38b014336482dc22ecedaed81b79f8e7d5913d1e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-5.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] < MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
> index 272fde09429b6a46ee4a081b49736613136cc328..56e0f71bc799d16725e589a53c99abebe5dca40a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-6.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] != MAX ? MAX : 0); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
> index c0c7a3cdb2baafa5702a7fcf80b7198175ecc4f2..879d88a5ce9239bf872cc0ee1b4eb921b95235d0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-7.c
> @@ -22,6 +22,7 @@ int main ()
>      A[i] = ( A[i] == MAX ? 0 : MAX); 
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
> index e6446a765c0298857f71b80ffcaefdf77e4f5ce3..bbeccae0f228ad3fc7478c879ae4a741ae6fe7a3 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-9.c
> @@ -27,6 +27,7 @@ int main ()
>    check_vect ();
>    foo ();
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
> index bef3efa5658ae6d91010d286967e319906f9aeb5..f75c0f5a1a6645fdee6a8a04ffc55bd67cb7ac43 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
> @@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (fa[i] != (float) ib[i]) 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
> index 666ee34a4a753ff1d0e33012d95a77496f1986fa..32df21fb52a0b9f16aff7340eee21e76e832cceb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
> @@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (float_arr[i] != (float) int_arr[i]) 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
> index 78fc3da481c6693611b45d3939fe03d23e84f8f7..db33a84b54d70c9355079adf2ee163c904c68e57 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
> @@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (da[i] != (double) ib[i]) 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
> index af8fe89a7b02b555acc64b578a07c735f5ef45eb..6fc23bb4621eea594a0e70347a8007a85fb53db8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
> @@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (fa[i] != (float) sb[i]) 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
> index 49c83182026b91c7b52667fec7a44554e3aff638..b570db5dc96db9c6e95b0e4dbebe1dae19c5ba7c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
> @@ -19,6 +19,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (fa[i] != (float) usb[i]) 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
> index 90163c440d34bcd70a7024b83f70abb7b83f8077..e6dcf29ebe0d2b2dc6695e754c4a1043f743dd58 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-1.c
> @@ -22,6 +22,7 @@ __attribute__ ((noinline)) int main1 (int X)
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (arr[i] != result[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
> index 195474b56441bee9b20f373a6aa991610a551e10..83bc7805c3de27ef3dd697d593ee86c1662e742c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-10.c
> @@ -17,6 +17,7 @@ int main1 ()
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (j=0,i=N;  j<N,i>0;  i--,j++) {
>        if (ia[j] != i)
>          abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
> index 73e30ee9bac6857b545242136d9c1408f6bfe60e..d85bb3436b2e0abcc4d0d0a7b480f4f267b4898c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-2.c
> @@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 ()
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (arr1[i] != 2+2*i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
> index f8ca94dd17db81d8be824dfb2f023517f05d7c04..c0738ebc469f1780eb8ce90e89caa222df0e1fba 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-3.c
> @@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
> index dfe5bc14458c856122f48bd6bc6a50092d7729e1..2dd8ae30513260c858504f8dc0e8c7b6fd3ea59b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-4.c
> @@ -24,6 +24,7 @@ __attribute__ ((noinline)) int main1 ()
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (arr1[i] != 2+2*i || arr2[i] != 5 + 2*i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
> index 2015385fbf5fac1349124dd35d57b26c49af6346..c3c4735f03432f9be07ed2fb14c94234ee8f4e52 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-5.c
> @@ -20,6 +20,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (arr[i] != 1.0 + 2.0*i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
> index ccd7458a98f1d3833b19c838a27e9f582631e89c..4c9d9f19b45825a210ea3fa26160a306facdfea5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-6.c
> @@ -28,6 +28,7 @@ __attribute__ ((noinline)) int main1 (int X)
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (arr1[i+1] != X+6*i+2
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
> index 24b59fe55c498bf21d107bef72bdc93690229c20..f6d93360d8dda6f9380425b5518ea5904f938322 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-7.c
> @@ -22,6 +22,7 @@ __attribute__ ((noinline, noclone)) int main1 (int X)
>     } while (i < N);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (arr[i] != result[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
> index 45d82c65e2f85b7b470a22748dacc78a63c3bd3e..26e8c499ce50cc91116c558a2425a47ebe21cdf7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8-big-array.c
> @@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
> index dd37d250e91c3839c21fb3c22dc895be367cdcec..b4bb29d88003d2bbc0e90377351cb46d1ff72b55 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8.c
> @@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
> index 63b6b6e893f7a55a56aef89331610fd76d2c1c42..dceae27bbbee36a13af8055785dd4258b03e3dba 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a-big-array.c
> @@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
> index 1f8fedf2716745d469771cfce2629dd05478bce8..dfe3a27f024031427344f337d490d4c75d8a04be 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-iv-8a.c
> @@ -23,6 +23,7 @@ __attribute__ ((noinline)) int main1 (short X)
>    }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (a[i] != (signed char)myX || b[i] != myX || c[i] != (int)myX++)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
> index f628c5d3998930ea3e0cee271c20ff3eb17edf62..e4a6433a89961b008a2b766f6669e16f378ca01e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-1.c
> @@ -38,6 +38,7 @@ main (void)
>    if (ret != MAX + START)
>      abort ();
>  
> +#pragma GCC novector
>    for (i=0; i<MAX; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> index 19d8c22859e0804ccab9d25ba69f22e50d635ebb..dae36e9ed67c8f6f5adf735345b817d59a3741f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c
> @@ -48,6 +48,7 @@ main (void)
>    if (ret != MAX - 1)
>      abort ();
>  
> +#pragma GCC novector
>    for (i=0; i<MAX; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
> index 8f5ccb27365dea5e8cd8561d3c8a406e47469ebe..1f6b3ea0faf047715484ee64c1a49ef74dc1850e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-3.c
> @@ -45,6 +45,7 @@ main (void)
>    if (ret != (MAX - 1) * 3)
>      abort ();
>  
> +#pragma GCC novector
>    for (i=0; i<MAX; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-4.c b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
> index 553ffcd49f744cabd6bdd42e6aca8c12d15ceb01..170927802d2d8f1c42890f3c82f9dabd18eb2f38 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-4.c
> @@ -42,6 +42,7 @@ main (void)
>    if (ret != MAX + 4)
>      abort ();
>  
> +#pragma GCC novector
>    for (i=0; i<MAX; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-5.c b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
> index 7cde1db534bb1201e106ba34c9e8716c1f0445a1..9897552c25ce64130645887439c9d1f0763ed399 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-5.c
> @@ -39,6 +39,7 @@ main (void)
>    if (ret != 99)
>      abort ();
>  
> +#pragma GCC novector
>    for (i=0; i<MAX; i++)
>      {
>        __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
> index 965437c8f03eaa707add3577c6c19e9ec4c50302..6270c11e025ed6e181c7a607da7b1b4fbe82b325 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-1.c
> @@ -51,6 +51,7 @@ main (void)
>        a[i] = i;
>      }
>  
> +#pragma GCC novector
>    for (i=0; i<4; i++)
>      {
>        __asm__ volatile ("");
> @@ -60,6 +61,7 @@ main (void)
>        if (ret != (MAX * 4) - 4 + i)
>  	abort ();
>  
> +#pragma GCC novector
>        for (i=0; i<MAX*4; i++)
>  	{
>  	  __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
> index 0d2f17f9003178d65c3dc5358e13c45f8ac980e3..c9987018e88b04f5f0ff195baaf528ad86722714 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-2.c
> @@ -45,6 +45,7 @@ main (void)
>        a[i] = i;
>      }
>  
> +#pragma GCC novector
>    for (i=0; i<2; i++)
>      {
>        __asm__ volatile ("");
> @@ -54,6 +55,7 @@ main (void)
>        if (ret != (MAX * 2) - 2 + i)
>  	abort ();
>  
> +#pragma GCC novector
>        for (i=0; i<MAX*2; i++)
>  	{
>  	  __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
> index a3f60f6ce6d24fa35e94d95f2dea4bfd14bfdc74..e37822406751b99b3e5e7b33722dcb1912483345 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c
> @@ -52,6 +52,7 @@ main (void)
>        a[i] = i;
>      }
>  
> +#pragma GCC novector
>    for (i=0; i<4; i++)
>      {
>        __asm__ volatile ("");
> @@ -61,6 +62,7 @@ main (void)
>        if (ret != (MAX * 4) - 4 + i)
>  	abort ();
>  
> +#pragma GCC novector
>        for (i=0; i<MAX*4; i++)
>  	{
>  	  __asm__ volatile ("");
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
> index 992cbda2e91628cd145d28c8fdabdb7a4d63ee68..91d4d40a86013dca896913d082773e20113a17e2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-load-1.c
> @@ -36,6 +36,7 @@ main ()
>        asm ("");
>      }
>    foo (a, b);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (a[i] != ((i & 1)
>  		 ? 7 * i + 2.0 * (7 * i * 7.0 + 3.0)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
> index 7d9dc5addf54264bf2fd0c733ccfb83bb1c8f20d..76f72597589c6032d298adbc8e687ea4808e9cd4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c
> @@ -36,6 +36,7 @@ main ()
>        asm ("");
>      }
>    foo (a, b, c);
> +#pragma GCC novector
>    for (i = 0; i < 1024; i++)
>      if (a[i] != ((i & 1) ? -i : i)
>  	|| b[i] != ((i & 1) ? a[i] + 2.0f : 7 * i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
> index 8e46ff6b01fe765f597add737e0b64ec5b505dd1..4df0581efe08333df976dfc9c52eaab310d5a1cc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mulhrs-1.c
> @@ -37,6 +37,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, N);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != HRS(BASE1 * BASE2 + i * i * (CONST1 * CONST2)
>  		    + i * (BASE1 * CONST2 + BASE2 * CONST1)))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
> index b63e9a8a6d9d0c396c3843069d100fbb9d5fa913..1e90d19a684eb0eebf223f85c4ea2b2fd93aa0c5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
> @@ -27,6 +27,7 @@ main (void)
>      }
>  
>    foo (data);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (data[i] / 123 != i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
> index a8253837c3863f5bc5bfea1d188a5588aea501c6..f19829b55a96227f0157527b015291da6abd54bf 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
> @@ -26,6 +26,7 @@ main (void)
>      }
>  
>    foo (data);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (data[i] / -19594LL != i)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
> index 378a5fe642ac415cd20f45e88f06e8d7b9040c98..06dbb427ea11e14879d1856c379934ebdbe50e04 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-1.c
> @@ -39,6 +39,7 @@ __attribute__ ((noinline)) int main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i + NSHORTS - 1] != sb[i] || ia[i + NINTS - 1] != ib[i + 1])
> @@ -69,6 +70,7 @@ __attribute__ ((noinline)) int main2 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i + NINTS - 1] != sb[i + 1] || ia[i + NINTS - 1] != ib[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
> index 891ba6d8e7169c67e840733402e953eea919274e..c47cf8c11d9ade3c4053f3fcf18bf719fe58c971 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-10.c
> @@ -48,6 +48,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresult[i] != (unsigned short)uY[i])
>        abort ();
> @@ -55,6 +56,7 @@ int main (void)
>    
>    foo2 (N);
>    
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != (short)Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
> index c58391f495eb8d19aec9054f4d324a1bdf4461a4..29d178cf88d8df72b546772047b1e99a1a74043b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c
> @@ -30,6 +30,7 @@ int main (void)
>  
>    foo (N,z+2);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (z[i+2] != x[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
> index 4782d3f7d1066e1dcf5c3c1004d055eb56bd3aec..dd5fffaed8e714114dcf964ffc6b5419fba1aa9f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-12.c
> @@ -31,6 +31,7 @@ int main (void)
>  
>    foo (N,z+2);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (z[i+2] != x[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
> index 2b185bb1f86ede937842596cec86f285a7c40d27..5bf796388f9c41083a69f3d6be3f5a334e9410a1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-13.c
> @@ -44,6 +44,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresult[i] != (unsigned int)uX[i])
>        abort ();
> @@ -51,6 +52,7 @@ int main (void)
>    
>    foo2 (N);
>    
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != (int)X[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
> index ff5f8e989b2ea57fb265e8fca3a39366afb06aaa..6f9b81d1c01ab831a79608074f060b3b231f177d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-14.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresultX[i] != uX[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
> index cf45703e01867a7954325f6f8642594e31da9744..a61f1a9a2215e238f6c67e229f642db6ec07a00c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-16.c
> @@ -26,6 +26,7 @@ int main (void)
>  
>    foo (N,z+2);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (z[i+2] != x[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
> index 79ad80e3013e189c0efb9425de2b507cf486f39a..d2eff3a20986593a5185e981ae642fcad9a57a29 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-17.c
> @@ -30,6 +30,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresultX[i] != uX[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
> index 7f93938349f91c0490dad8ea2de3aec780c30b2b..069ef44154effb38f74792e1a00dc3ee236ee6db 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-2.c
> @@ -26,6 +26,7 @@ __attribute__ ((noinline)) int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
> index 1f82121df06181ad27478378a2323dbf478eacbe..04b144c869fc2a8f8be91a8252387e09d7fca2f2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-3.c
> @@ -39,6 +39,7 @@ int main1 (int n, int * __restrict__ pib,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (ia[i] != pib[i] 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
> index b0f74083f2ba992620ebdf3a3874f6c5fa29f84d..18ab9538675b3fd227ae57fafc1bfd1e840b8607 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-4.c
> @@ -41,6 +41,7 @@ int main1 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> @@ -75,6 +76,7 @@ int main2 (int n)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
> index ad11d78a548735a67f76b3aa7f98731d88868b56..7c54479db1f684b9661d59816a3cd9b0e5f35619 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-5.c
> @@ -30,6 +30,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (ia[i] != ib[i] + ic[i] 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
> index 864b17ac640577753d8164f1ae3ea84181a553c1..73d3b30384ebc4f15b853a140512d004262db3ef 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-6.c
> @@ -46,6 +46,7 @@ int main1 (int n,
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      {
>        if (ia[i] != pib[i] + pic[i] 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
> index 315c9aba731ac28189cd5f463262fc973d52abe2..001671ebdc699ca950f6fd157bd93dea0871c5ab 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-8.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresultX[i] != uX[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
> index 8c5c611947f720c9ef744c33bdd09a78967d4a4c..3e599b3462d13a8afcad22144100f8efa58ac921 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-multitypes-9.c
> @@ -44,6 +44,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (uresult[i] != (unsigned short)uX[i])
>        abort ();
> @@ -51,6 +52,7 @@ int main (void)
>    
>    foo2 (N);
>    
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != (short)X[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
> index 75b210c1a21c56c114f25b354fb368bdbe9462d5..357d006177f60a5376597929846efbfaa787f90b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-1.c
> @@ -20,6 +20,7 @@ int main (int argc, const char **argv)
>    int i;
>    check_vect ();
>    foo (31);
> +#pragma GCC novector
>    for (i = 0; i < 31; i++)
>      if (ii[i] != i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> index 229ce987db5f5a5b48177d0c9d74e416e417d3f6..dc4c7a64aee4f800997d62550f891b3b35f7b633 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> @@ -23,6 +23,7 @@ int main (int argc, const char **argv)
>    int i;
>    check_vect ();
>    foo (32);
> +#pragma GCC novector
>    for (i = 0; i < 32; i++)
>      if (ii[i] != i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
> index 16665265c4062c0a3acb31e01a1473dea3125685..268e65458bf839e2403a7ae3e4c679e7df6dcac7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-3.c
> @@ -22,6 +22,7 @@ int main (int argc, const char **argv)
>    int i;
>    check_vect ();
>    foo (33);
> +#pragma GCC novector
>    for (i = 0; i < 33; i++)
>      if (ii[i] != i)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
> index fca8ee0963860fa0a938db41c865e8225bf554c3..aa6e403b51ce8e9a29ddd39da5d252c9238ca7eb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-1.c
> @@ -28,10 +28,12 @@ int main (void)
>     
>    test1 (x + 16);
>    
> +#pragma GCC novector
>    for (i = 0; i < 128; i++)
>     if (x[i + 16] != 1234)
>       abort ();
>    
> +#pragma GCC novector
>    for (i = 0; i < 16; i++)
>      if (x[i] != 5678
>         || x[i + 144] != 5678)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
> index c924b12b02fd438d039d0de6b6639813047839e7..95b16196007488f52b2ec9a2dfb5a4f24ab49bba 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-neg-store-2.c
> @@ -28,10 +28,12 @@ int main (void)
>     
>    test1 (x + 16, 1234);
>    
> +#pragma GCC novector
>    for (i = 0; i < 128; i++)
>     if (x[i + 16] != 1234)
>       abort ();
>    
> +#pragma GCC novector
>    for (i = 0; i < 16; i++)
>      if (x[i] != 5678
>         || x[i + 144] != 5678)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
> index f52f30aa24e83768f9beb03fb2ac7b17f37e0b77..129dab2ba1cfe8175644e0a2330349974efca679 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-1.c
> @@ -28,6 +28,7 @@ foo ()
>        out[i] = res;
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)  
>      if (out[i] != check_res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
> index 5aa977df633c9a5d24e248b0c02ec21751f78241..26ad6fa65c6d1489aa1b1ce9ae09ea6f81ad44d2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-2.c
> @@ -27,6 +27,7 @@ foo ()
>        out[i] = res;
>      }
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)  
>      if (out[i] != check_res[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
> index f2ab30c63b2e28fbd453af68628d3491d6b4d034..4e3b8343ff7b4b1f43397fe2e71a8de1e89e9a74 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-nest-cycle-3.c
> @@ -27,6 +27,7 @@ main1 ()
>      }
>  
>    /* Check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != DIFF)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
> index 02f01cfb5791319d766f61465c2d1b64718674de..32c40fb76e325571347993571547fa12dd6255aa 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2-big-array.c
> @@ -28,6 +28,7 @@ int main (void)
>    foo ();
>  
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) {
>        if (image[j][i] != j+i)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
> index 55023d594dd2e0cb18c3c9dc838ac831ede938da..a0a419c1547fc451b948628dafeb48ef2f836daa 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2.c
> @@ -28,6 +28,7 @@ int main (void)
>    foo ();
>  
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) {
>        if (image[j][i] != j+i)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
> index 6b9fefedf3a5c9ee43c9201039987468710df62d..5ca835a2dda468bab1cbba969278a74beff0de32 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a-big-array.c
> @@ -28,6 +28,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) {
>        if (image[k][j][i] != j+i+k)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
> index 3a4dafee0720bd1a5e532eb2c0062c5eb78556b6..f9924fcb2b40531e8e7a4536d787b5d1b6e2b4ee 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2a.c
> @@ -28,6 +28,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) {
>        if (image[k][j][i] != j+i+k)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
> index bb4e74b7b333ce036159db4cbf5aaa7107dc35d9..218df61cf4b18709cb891969ae53977081a86f1d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2b.c
> @@ -27,6 +27,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) {
>        if (image[k+i][j][i] != j+i+k)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
> index 6adde9769215e8c98132ec91ab015e56b710c47a..36c9681201532960b3eecda2b252ebe83036a95a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c-big-array.c
> @@ -28,6 +28,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j+=2) {
>        if (image[k][j][i] != j+i+k)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
> index bf6abfef01fa96904adbf350935de3609550f2af..678d7e46a5513e0bdeaf0ec24f2469d58df2cbc5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2c.c
> @@ -28,6 +28,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < N; j+=2) {
>        if (image[k][j][i] != j+i+k)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
> index b75281bc3187f84824e1360ba92a18f627686aa5..81a4fc407086372c901b1ff34c75cada3e8efb8a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-2d.c
> @@ -27,6 +27,7 @@ int main (void)
>  
>   for (k=0; k<N; k++) {
>    for (i = 0; i < N; i++) {
> +#pragma GCC novector
>      for (j = 0; j < i+1; j++) {
>        if (image[k][j][i] != j+i+k)
>         abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
> index fdc8a0544dd941f28a97a22e706bd3f5c3c9d2a3..231989917d7c4d5ff02b4f13a36d32c543114c37 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3-big-array.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < N; j++) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
> index 921db48a0f76763bc724d41f90c74472da8e25fb..c51787fe5753f4317b8c1e82c413b009e865ad11 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < N; j++) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
> index fd841b182e3c81eed43a249fe401c6213814ea36..7ae931e39be5a4e6da45242b415459e073f1384a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a-big-array.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < N; j++) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
> index d26440d1a64e887aa2cd6ccf1330cb34d244ef12..bfadac0c5e70b61b23b15afa9271ac9070c267c1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3a.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < N; j++) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
> index b915c4e370c55293ec00665ddd344b9ddafec3b4..1e2bbf1e7bac29563a530c4bbcd637d8541ddfca 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3b.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++) {
>      diff = 0;
>      for (j = 0; j < N; j++) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
> index 091c6826f66acb07dbc412ae687d72c84800146d..952bba4d911956c49a515276c536a87a68433d40 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-3c.c
> @@ -36,6 +36,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < N; j+=4) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
> index 9614b777aded3c9d2f5229d27ce8e5cfbce0c7d2..8a803cd330f25324669467a595534100878f3ddc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4.c
> @@ -38,6 +38,7 @@ int main (void)
>  
>    foo ();
>    
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < M; j+=4) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
> index b656064697c93177cb9cd9aae8f9f278b9af40b0..587eabaf004705fb6d89882a43a628921361c30e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d-big-array.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < M; j+=4) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
> index 443a00d49e19dae2a0dd32d6e9e28d2bf5972201..0c9115f60a681f48125dfb2a6428202cc1ec7557 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-4d.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo ();
>    
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      diff = 0;
>      for (j = 0; j < M; j+=4) {
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
> index 10b558fd20905d2c8b9915d44a41e89b406028d9..67be075278847ea09e309c5d2ae2b4cf8c51b736 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-5.c
> @@ -38,6 +38,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N-20; i++)
>      {
>        s = 0;
> @@ -57,6 +58,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < 4; i++)
>      {
>        if (B[i] != E[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
> index 201ca8424828d6dabe1c6d90dff8396438a71ff4..13a5496f70c069f790d24d036642e0715a133b3b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-6.c
> @@ -48,6 +48,7 @@ int main ()
>    main1();
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        s = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
> index 6299e9fed4233b3ec2c0b9892afdca42edf0bee0..8114934ed03332aaa682c6d4b5a7f62dfc33a51e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-big-array.c
> @@ -62,6 +62,7 @@ int main (void)
>    foo ();
>    fir ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (out[i] != fir_out[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
> index d575229f2fb3bb6ece1fbc013019ebb0fbaa505e..9c4be4b9f658f7abd1e65b7b5a9124a5670f7ab9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb-big-array.c
> @@ -66,6 +66,7 @@ int main (void)
>    foo ();
>    fir ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (out[i] != fir_out[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
> index 9414e82f3edb1ea00587b916bfaf66847ac07574..4f1ccfccfa229105eb4e8a5c96a5ebfb13384c5d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir-lb.c
> @@ -66,6 +66,7 @@ int main (void)
>    foo ();
>    fir ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (out[i] != fir_out[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
> index 0d181dfec24a212d430a1cac493ee914ebe25325..1c68c6738580d8670b7b108c52987d576efee4ac 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-fir.c
> @@ -62,6 +62,7 @@ int main (void)
>    foo ();
>    fir ();
>    
> +#pragma GCC novector
>    for (i = 0; i < N; i++) {
>      if (out[i] != fir_out[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
> index 217b32cd2bde12247f94f36787ccdf67bb014ba2..795bff5f3d5f1629b75cdc7fefdc48ff4c05ad8a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-1.c
> @@ -66,6 +66,7 @@ int main()
>        t2[i] = z1[i]; z1[i] = 1.0f;
>      }
>    foo2 ();  /* scalar variant.  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      if (x1[i] != t1[i] || z1[i] != t2[i])
>        abort ();	
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
> index 3ae1020936f960a5e46d6c74bee80d3b52df6db5..ead8d6f8e79187f0054d874b1d6e5fe3c273b5ca 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
> @@ -67,6 +67,7 @@ int main ()
>        t2[i] = z1[i]; z1[i] = 1.0f;
>      }
>    foo2 (n);  /* scalar variant.  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      if (x1[i] != t1[i] || z1[i] != t2[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
> index 59e54db075254498b34f673198f8f4f373b728a5..a102ddd7d8d4d9182436646e1ca4d0bd1dd86479 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-3.c
> @@ -70,6 +70,7 @@ int main()
>        t2[i] = z1[i]; z1[i] = 1.0f;
>      }
>    foo2 ();  /* scalar variant.  */
> +#pragma GCC novector
>    for (i=0; i<N; i++)
>      if (x1[i] != t1[i] || z1[i] != t2[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
> index ec1e1036f57022716361977fb419b0806e55123d..0e5388b46ce80b610d75e18c725b8f05881c244b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-2.c
> @@ -28,6 +28,7 @@ int main ()
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (int i = 0; i < 20; i++)
>      {
>        double suma = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
> index 53865d4737b1333c3eb49723d35d2f0e385049a3..3dce51426b5b83d85bc93aaaa67bca3e4c29bc44 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c
> @@ -35,6 +35,7 @@ int main ()
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (int i = 0; i < 20; i++)
>      {
>        double suma = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> index 9a5141ee6ecce133ce85edcf75603e0b3ce41f04..a7ce95bcdcefc1b71d84426290a72e8891d8775b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> @@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> index f2d284ca9bee4af23c25726a54866bfaf054c46c..21fbcf4ed70716b47da6cbd268f041965584d08b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-1.c
> @@ -30,6 +30,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
> index 222d854b2d6e3b606e83131862c2d23a56f11829..1e48dab5ccb4b13c82800d890cdd5a5a5d6dd295 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-11.c
> @@ -43,6 +43,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, d);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        int res = BASE_B + BASE_C + i * 9;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
> index b25db881afbc668bb163915a893bfb8b83243f32..08a65ea551812ba48298884ec32c6c7c5e46bdd2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-13.c
> @@ -36,6 +36,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
> index d31050ee926ac7e12c8bce99bf3edc26a1b11fbe..bd7acbb613f47fd61f85b4af777387ae88d4580a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-15.c
> @@ -38,6 +38,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (BASE_B + BASE_C + i * 9) / 2)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
> index 333d74ac3e7bdf99cc22b8fc7e919e39af7d2ca4..53fcfd0c06c14e5d9ddc06cdb3c36e2add364d3b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
> @@ -33,6 +33,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != ((((i * 3) & 0xef) | 0x80) + (i << 4)))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
> index ecb74d7793eeaa80b0d48479b2be6c68e64c61b0..aa58cd1c95789ad4f17317c5fa501385a185edc9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-18.c
> @@ -35,6 +35,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (unsigned char) ((((i * 3) & 0xef) | 0x80) + (i << 4)))
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
> index 11546fe1a502a02c750ea955f483bc3a8b3a0ac7..c93cd4d09af5fc602b5019352073404bb1f5d127 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-19.c
> @@ -40,6 +40,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, d, e);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != i * 2 + 3
>  	|| b[i] != i + 100
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> index 82aec9f26517b2e00568f3240ff88d954af29bea..4bbb30ac8aca529d062e0daacfe539177ab92224 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2-big-array.c
> @@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (int *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> index 0bcbd4f1b5315ec84e4aa3bd92e058b6ca9ea0ec..ad423f133c0bc25dfad42e30c34eceb5a8b852ab 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-2.c
> @@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (int *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
> index 47f970d231ee61c74c7c4d5b3f9e9bab0673cfe2..81292d42f0d695f98b62607053daf8a5c94d98d3 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-20.c
> @@ -40,6 +40,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, d, e);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != i * 2 + 3
>  	|| b[i] != i + 100
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
> index 6e13f260f1124ef221bae41b31f8f52ae35162d3..361f77081a6d0a1d30051107f37aa4a4b764af4f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-21.c
> @@ -38,6 +38,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, d, e);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != i * 2 + 3
>  	|| b[i] != i + 100
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
> index 187bdf159feaa770b8497c020bd3bc82becdea15..830f221019871a3df26925026b7b8c506da097db 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-22.c
> @@ -37,6 +37,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, d, 0x73);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (b[i] != ((i * 2 + 3) ^ 0x73)
>  	|| a[i] != ((i * 11) | b[i]))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> index 6f89aacbebf5094c7b1081b12c7fcce1b97d536b..55de14161d85db871ae253b86086a1341eba275c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> @@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> index a1e1182c6067db47445ad07b77e5c6e067858488..3d833561972da4a128c1bc01eff277564f084f14 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3.c
> @@ -26,6 +26,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
> index 03a6e6795ec68e5f9a35da93ca7a8d50a3012a21..6b3a2b88abfb6a5cd4587d766b889825c2d53d60 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4-big-array.c
> @@ -28,6 +28,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
> index 0ef377f1f58a6f6466380a59c381333dbc4805df..60c9c2cc1ec272b46b7bb9a5cf856a57591425b0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-4.c
> @@ -32,6 +32,7 @@ foo (unsigned char *src, unsigned char *dst)
>  
>    s = src;
>    d = (unsigned short *)dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        const int b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
> index 269df5387d20c859806da03aed91d77955fa651a..c2ab11a9d325c1e636003e61bdae1bab63e4cf85 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-5.c
> @@ -37,6 +37,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (BASE_B + BASE_C + i * 9) >> 1)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
> index 314a6828c53c161d2e63b88bdecf0cee9070a794..1d55e13fb1fbc4273d3a64da20dc1e80fb760296 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-7.c
> @@ -39,6 +39,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c, D);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (BASE_B + BASE_C + D + i * 9) >> 2)
>        __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
> index 5baba09a575f0f316aac1a967e145dbbbdade5b4..36bfc68e05357359b8d9bdfe818910a3d0ddcb5a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-9.c
> @@ -40,6 +40,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      {
>        int res = BASE_B + BASE_C + i * 9;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
> index 7980d4dd6438d9a063051c78608f73f1cea1c740..717850a166b2b811797cf9cdd0753afea676bf74 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-peel-1-src.c
> @@ -21,6 +21,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i] != ib[i+2] + ib[i+6])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
> index f6fc134c8705567a628dcd62c053ad6f2ca2904d..5e5a358d34bece8bbe5092bf2d617c0995388634 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-peel-2-src.c
> @@ -22,6 +22,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i] != ib[i+2] + ib[i+6])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
> index 33088fb090271c3b97fae2300e5d7fc86242e246..1b85f14351242304af71564660de7db757294400 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-peel-4-src.c
> @@ -18,6 +18,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 1; i <= N; i++)
>      {
>        if (ia[i] != ib[i] + ib[i+5])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> index 64de22a1db4d7a8b354ad3755685171308a79a00..698ca5bf0672d3bfce0121bd2eae27abb2f75ca2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
> @@ -28,6 +28,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c);
> +#pragma GCC novector
>    for (int i = 1; i < 64; ++i)
>      if (b[i] != a[i] - a[i-1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> index 086b48d9087c02ccbc0aaf36f575a3174f2916af..777051ee4a16a47f20339f97e13ad396837dea9a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
> @@ -29,6 +29,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c);
> +#pragma GCC novector
>    for (int i = 1; i < 64; ++i)
>      if (b[i] != a[i] - a[i-1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> index 3389736ead98df2207a89de3ecb34a4a95faa6f5..aeb7da3877d7e0df77d6fee1a379f352ae2a5750 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
> @@ -29,6 +29,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c);
> +#pragma GCC novector
>    for (int i = 1; i < 64; ++i)
>      if (b[i] != a[i] - a[i-1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> index c0b73cd8f3322ae01b7a1889657bc92d38fa4af6..f4ab59671b7934e3e6f5d893159a3618f4aa3898 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-4.c
> @@ -31,6 +31,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c);
> +#pragma GCC novector
>    for (int i = 2; i < 64; i+=2)
>      if (b[i] != a[i] - a[i-2]
>  	|| b[i+1] != a[i+1] - a[i-1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> index 7327883cc31ae4a37e5e4597b44b35e6376b4ed2..2fed60df68cdfbdc3ebf420db51d132ed335dc14 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-5.c
> @@ -32,6 +32,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c);
> +#pragma GCC novector
>    for (int i = 2; i < 64; i+=2)
>      if (b[i] != a[i] - a[i-2]
>  	|| b[i+1] != a[i+1] - a[i-1])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
> index f678b326f1043d2bce51b1d652de5ee2b55d6d0f..c170f4c345cdee1d5078452f9e301e6ef6dff398 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-recurr-6.c
> @@ -28,6 +28,7 @@ main ()
>      }
>    int c = 7;
>    foo (a, b, &c, 63);
> +#pragma GCC novector
>    for (int i = 1; i < 63; ++i)
>      if (b[i] != a[i] - a[i-1])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> index 484efb1e8c826a8dafb43ed18e25794951418a9c..49ecbe216f2740329d5cd2169527a9aeb7ab844c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-sdiv-pow2-1.c
> @@ -70,6 +70,7 @@ main (void)
>        fns[i].div (b, a, N);
>        fns[i].mod (c, a, N);
>  
> +#pragma GCC novector
>        for (int j = 0; j < N; j++)
>  	if (a[j] != (b[j] * p + c[j]))
>            __builtin_abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
> index dfd8ebace5610b22cc0da33647953ae33e084a42..0c4025abceb0e36092f5f7be1f813e4a6ebeda15 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-sdivmod-1.c
> @@ -88,6 +88,7 @@ main ()
>    f4 (4095);
>    if (a[0] != (-2048 << 8))
>      abort ();
> +#pragma GCC novector
>    for (i = 1; i < 4096; i++)
>      if (a[i] != ((1 + ((i - 2048) % 16)) << 8))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
> index 0c3086b1d683441e9b7d0096d4edce37e86d3cc1..d5fc4748758cea2762efc1977126d48df265f1c3 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-1.c
> @@ -21,6 +21,7 @@ int main ()
>      A[i] = A[i] >> 3;
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (A[i] != B[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
> index a1b4b0752291e64d51206fca644e241c8e0063a9..0a9d562feb56ec69e944d0a3581853249d9642ae 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-3.c
> @@ -26,6 +26,7 @@ int main()
>  
>    array_shift ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (dst[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
> index 09f6e5a9584099b34e539b72dbe95e33da83cd20..d53faa52ee88b00d09eeefa504c9938084fa6230 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-shift-4.c
> @@ -26,6 +26,7 @@ int main()
>  
>    array_shift ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (dst[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
> index 7c3feeeffae363b8ad42989a3569ca394519a414..09722ae090d0edb875cb91f5b20da71074aee7d3 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-1.c
> @@ -44,19 +44,23 @@ main ()
>  {
>    check_vect ();
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != 1)
>        abort ();
>    x = 1;
>    foo ();
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != 2)
>        abort ();
>    baz ();
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != 3)
>        abort ();
>    qux ();
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != 4)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
> index e49566a3847a97dee412148bed63a4b69af8dd1b..af0999a726288890a525fe18966331e0cb5c0cad 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-10.c
> @@ -76,6 +76,7 @@ main ()
>    if (r * 16384.0f != 0.125f)
>      abort ();
>    float m = -175.25f;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s *= a[i];
> @@ -91,6 +92,7 @@ main ()
>    if (bar () != 592.0f)
>      abort ();
>    s = FLT_MIN_VALUE;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (s < a[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
> index e7d8aa0eb03879fcf0a77a512afc3281fbeabe76..2620dfebbc0dde80d219660dcead43ae01c7c41f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-11.c
> @@ -109,6 +109,7 @@ main ()
>        || r2 != (unsigned short) r
>        || r3 != (unsigned char) r)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -129,6 +130,7 @@ main ()
>        || s3 != (unsigned char) (1024 * 1023))
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> @@ -152,6 +154,7 @@ main ()
>        || r3 != (unsigned char) r)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -174,6 +177,7 @@ main ()
>        || s3 != (unsigned char) (1024 * 1023))
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
> index cdfec81a6e6d761b6959fd434fc3367ad01d7026..45b55384006b1674c36a89f4539d2ffee2e4236e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-12.c
> @@ -77,6 +77,7 @@ main ()
>    foo (a, b);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -88,6 +89,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -101,6 +103,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -112,6 +115,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
> index aee5244d85e18e707163a34cb93a9cd5b1317fc3..3ef4aa9a991c0b6259f3b3057616c1aa298663d9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-13.c
> @@ -79,6 +79,7 @@ main ()
>    foo (a, b);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -90,6 +91,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -103,6 +105,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -114,6 +117,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
> index 9e73792ed7c36030b2f6885e1257a66991cdc4d1..c8a38f85ad4f29c9bbc664a368e23254effdd976 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-14.c
> @@ -76,6 +76,7 @@ main ()
>    if (r * 16384.0f != 0.125f)
>      abort ();
>    float m = -175.25f;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> @@ -89,6 +90,7 @@ main ()
>    if (bar () != 592.0f)
>      abort ();
>    s = FLT_MIN_VALUE;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
> index 91e34cd6428c4b841ab55226e49a5fc10444df57..6982a59da78276bad2779827ee0b8c1e1691e2e3 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-15.c
> @@ -109,6 +109,7 @@ main ()
>        || r2 != (unsigned short) r
>        || r3 != (unsigned char) r)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s
> @@ -129,6 +130,7 @@ main ()
>        || s3 != (unsigned char) (1024 * 1023))
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s
> @@ -152,6 +154,7 @@ main ()
>        || r3 != (unsigned char) r)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s
> @@ -174,6 +177,7 @@ main ()
>        || s3 != (unsigned char) (1024 * 1023))
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        if (b[i] != s
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
> index ee4459a9341815c7ac4a5f6be4b9ca7679f13022..1ac13a5c5b4f568afa448af8d294d114533c061b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-16.c
> @@ -41,12 +41,14 @@ main ()
>    check_vect ();
>    if (foo (a) != 64)
>      abort ();
> +#pragma GCC novector
>    for (i = 0; i < 64; ++i)
>      if (a[i] != i)
>        abort ();
>      else
>        a[i] = -8;
>    bar (a);
> +#pragma GCC novector
>    for (i = 0; i < 64; ++i)
>      if (a[i] != i + 1)
>        abort ();
> @@ -54,6 +56,7 @@ main ()
>        a[i] = -8;
>    if (baz (a) != 64)
>      abort ();
> +#pragma GCC novector
>    for (i = 0; i < 64; ++i)
>      if (a[i] != i + 2)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
> index 951ba3afd9e332d7cd22addd273adf733e0fb71a..79b3602a6c08969a84856bf98ba59c18b45d5b11 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-17.c
> @@ -52,12 +52,14 @@ doit (void)
>    if (i != 11 || j != 101 || x != 10340 || niters != 550 || err)
>      abort ();
>    for (i = 1; i <= 10; i++)
> +#pragma GCC novector
>      for (j = 1; j <= 10 * i; j++)
>        if (k[i][j] == 3)
>  	k[i][j] = 0;
>        else
>  	abort ();
>    for (i = 0; i < 11; i++)
> +#pragma GCC novector
>      for (j = 0; j < 101; j++)
>        if (k[i][j] != 0)
>  	abort ();
> @@ -101,12 +103,14 @@ doit (void)
>    if (i != 10 || j != 90 || x != 9305 || niters != 450 || err)
>      abort ();
>    for (i = 0; i < 10; i++)
> +#pragma GCC novector
>      for (j = 0; j < 10 * i; j++)
>        if (k[i][j] == 3)
>  	k[i][j] = 0;
>        else
>  	abort ();
>    for (i = 0; i < 11; i++)
> +#pragma GCC novector
>      for (j = 0; j < 101; j++)
>        if (k[i][j] != 0)
>  	abort ();
> @@ -156,6 +160,7 @@ doit (void)
>        else
>  	abort ();
>    for (i = 0; i < 11; i++)
> +#pragma GCC novector
>      for (j = 0; j < 101; j++)
>        if (k[i][j] != 0)
>  	abort ();
> @@ -199,12 +204,14 @@ doit (void)
>    if (i != 11 || j != 10 || x != 9225 || niters != 25 || err)
>      abort ();
>    for (i = 1; i < 10; i += 2)
> +#pragma GCC novector
>      for (j = 1; j < i + 1; j++)
>        if (k[i][j] == 3)
>  	k[i][j] = 0;
>        else
>  	abort ();
>    for (i = 0; i < 11; i++)
> +#pragma GCC novector
>      for (j = 0; j < 101; j++)
>        if (k[i][j] != 0)
>  	abort ();
> @@ -244,11 +251,13 @@ doit (void)
>        }
>    if (i != 16 || j != 4 || x != 5109 || niters != 3 || err)
>      abort ();
> +#pragma GCC novector
>    for (j = -11; j >= -41; j -= 15)
>      if (k[0][-j] == 3)
>        k[0][-j] = 0;
>      else
>        abort ();
> +#pragma GCC novector
>    for (j = -11; j >= -41; j--)
>      if (k[0][-j] != 0)
>        abort ();
> @@ -288,6 +297,7 @@ doit (void)
>        }
>    if (/*i != 11 || j != 2 || */x != -12295 || niters != 28 || err)
>      abort ();
> +#pragma GCC novector
>    for (j = -34; j <= -7; j++)
>      if (k[0][-j] == 3)
>        k[0][-j] = 0;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
> index cca350f5c21125fa4380611a1ba42be317fd9d85..e454abe88009a7572cfad1397bbd5770c7086a6b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
> @@ -25,12 +25,14 @@ main ()
>    int i, r;
>    check_vect ();
>    r = foo (78, p);
> +#pragma GCC novector
>    for (i = 0; i < 10000 / 78; i++)
>      if (p[i] != 78 * i)
>        abort ();
>    if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
>      abort ();
>    r = foo (87, p);
> +#pragma GCC novector
>    for (i = 0; i < 10000 / 87; i++)
>      if (p[i] != 87 * i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
> index 67e25c0e07eeff8e3453a8a3b5e4df54b16f3f30..4d25b43f5dca9df6562a146e12e1c3542d094602 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
> @@ -25,12 +25,14 @@ main ()
>    int i, r;
>    check_vect ();
>    r = foo (78, 0, 10000, p);
> +#pragma GCC novector
>    for (i = 0; i < 10000 / 78; i++)
>      if (p[i] != 78 * i)
>        abort ();
>    if (r != (10000 / 78) * (10000 / 78 + 1) / 2 * 78 * 3)
>      abort ();
>    r = foo (87, 0, 10000, p);
> +#pragma GCC novector
>    for (i = 0; i < 10000 / 87; i++)
>      if (p[i] != 87 * i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
> index 57217c8a6ba4c15095f777cfa64aee9ffbe3e459..9ba7c3ce956a613e175ee6bd1f04b0531e6a79bd 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
> @@ -27,6 +27,7 @@ main ()
>    check_vect ();
>    r = foo (78, 0, 10000, p);
>    for (j = 0; j < 7; j++)
> +#pragma GCC novector
>      for (i = 0; i < 10000 / 78; i++)
>        if (p[j * (10000 / 78 + 1) + i] != 78 * i)
>  	abort ();
> @@ -34,6 +35,7 @@ main ()
>      abort ();
>    r = foo (87, 0, 10000, p);
>    for (j = 0; j < 7; j++)
> +#pragma GCC novector
>      for (i = 0; i < 10000 / 87; i++)
>        if (p[j * (10000 / 87 + 1) + i] != 87 * i)
>  	abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
> index 5d10ad90501835bf6cac2c2d81ee98bc6ce6db5b..a3c2decee2e36949950ca87a0a9942bc303ee633 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-8.c
> @@ -77,6 +77,7 @@ main ()
>    foo (a, b);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -88,6 +89,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> @@ -101,6 +103,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -112,6 +115,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
> index 52eb24f680f1362ee93b7a22de5fd46d37119216..b652759e5ad5ec723a644cf9c6cb31677d120e2d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-9.c
> @@ -79,6 +79,7 @@ main ()
>    foo (a, b);
>    if (r != 1024 * 1023 / 2)
>      abort ();
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -90,6 +91,7 @@ main ()
>    if (bar () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> @@ -103,6 +105,7 @@ main ()
>    if (r != 1024 * 1023 / 2)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += i;
> @@ -114,6 +117,7 @@ main ()
>    if (qux () != 1024 * 1023)
>      abort ();
>    s = 0;
> +#pragma GCC novector
>    for (int i = 0; i < 1024; ++i)
>      {
>        s += 2 * i;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> index cd65fc343f1893accb6f25a6222a22f64a8b4b2e..c44bfe511a5743198a647247c691075951f2258d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
> @@ -46,10 +46,12 @@ main ()
>    int i;
>    check_vect ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (array[i] != (i < 30 ? 5 : i * 4 + 123))
>        abort ();
>    baz ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (array[i] != (i < 30 ? 5 : i * 8 + 123))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
> index 03acd375e089c3a430adbed8d71197f39d7c512b..ed63ff59cc05e5f0a240376c4ca0985213a7eb48 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c
> @@ -65,6 +65,7 @@ main ()
>    check_vect ();
>    fn3 ();
>    fn1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> @@ -72,6 +73,7 @@ main ()
>        abort ();
>    fn3 ();
>    fn2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
> index 29acde22f1783e8b11376d1ae2e702e09182350c..4974e5cc0ccdc5e01bf7a61a022bae9c2a6a048b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-11.c
> @@ -44,19 +44,23 @@ main ()
>    if (sizeof (int) * __CHAR_BIT__ < 32)
>      return 0;
>    bar (a + 7);
> +#pragma GCC novector
>    for (i = 0; i < N / 2; i++)
>      if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
>        abort ();
>    bar (a);
> +#pragma GCC novector
>    for (i = 0; i < N / 2; i++)
>      if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
>        abort ();
>  #if 0
>    baz (a + 7);
> +#pragma GCC novector
>    for (i = 0; i < N / 2; i++)
>      if (a[i + 7] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
>        abort ();
>    baz (a);
> +#pragma GCC novector
>    for (i = 0; i < N / 2; i++)
>      if (a[i] != (i ^ (i * 3 * 512) ^ (((i * 6) + 2) * 512 * 512)))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
> index 675ac7026b67edda2e573367643eb68063559bc2..866f1000f34098fb578001395f4a35e29cc8c0af 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-15.c
> @@ -32,6 +32,7 @@ main ()
>    int i;
>    check_vect ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (array[i] != ((i >> 1) + (-3 * i)))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> index ffcbf9380d609d7a3ed7420a38df5c11f632b46a..feab989cfd595f9fdb839aa8bd3e8486751abf2f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c
> @@ -44,6 +44,7 @@ main ()
>    check_vect ();
>    baz ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (array[i] != 5 * (i & 7) * i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
> index 18d68779cc5dd8faec77a71a8f1cfa9785ff36ed..fef48c5066918a42fa80f1e14f9800e28ddb2c96 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c
> @@ -37,6 +37,7 @@ main ()
>    int i;
>    check_vect ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> index e9af0b83162e5bbd40e6a54df7d656ad956a8fd8..42414671c254ffcd93169849d7a982861aa5ac0b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
> @@ -40,6 +40,7 @@ main ()
>    int i;
>    check_vect ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> index 46da496524d99ff70e3673682040c0d5067afe03..620cec36e4c023e1f52160327a3d5ba21540ad3b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
> @@ -35,6 +35,7 @@ main ()
>    int i;
>    check_vect ();
>    bar ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (d[i] != i * 4 + 123 || e[i] != i)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
> index 6143a91eaf078d5b73e608bcfa080b70a5896f3d..440091d70e83be80574a6fcf9e034c53aed15786 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c
> @@ -57,6 +57,7 @@ main ()
>    check_vect ();
>    baz ();
>    bar (0);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != 2 * i || b[i] != 6 - 7 * i
>  	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
> @@ -64,6 +65,7 @@ main ()
>      else
>        a[i] = c[i];
>    bar (17);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != 6 - 5 * i + ((i & 31) << 4)
>  	|| b[i] != 6 - 7 * i
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
> index a0316e9e5813ac4c9076aaf5f762b9cc5dc98b1e..62246e28837272ef1e18860912643422f6dce018 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c
> @@ -57,6 +57,7 @@ main ()
>    check_vect ();
>    baz ();
>    bar (0);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != 2 * i || b[i] != 6 - 7 * i
>  	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
> @@ -64,6 +65,7 @@ main ()
>      else
>        a[i] = c[i];
>    bar (17);
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != 6 - 5 * i + ((i & 31) << 4)
>  	|| b[i] != 6 - 7 * i
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> index f414285a170c7e3469fdad07256ef09e1b46e17b..11ea2132689137cfb7175b176e39539b9197a330 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
> @@ -76,6 +76,7 @@ main ()
>    check_vect ();
>    fn3 ();
>    fn1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> @@ -83,6 +84,7 @@ main ()
>        abort ();
>    fn3 ();
>    fn2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
> index a968b9ce91a17c454f66aa76ec8b094e011e1c74..0112e553f8f130b06ee23a8c269a78d7764dcfff 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c
> @@ -76,6 +76,7 @@ main ()
>    check_vect ();
>    fn3 ();
>    fn1 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> @@ -83,6 +84,7 @@ main ()
>        abort ();
>    fn3 ();
>    fn2 ();
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
>  	|| b[i] != 17 + (i % 37)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
> index da47a824cb6046dcd9808bd7bd80161dbc0531b5..1531553651ceb6185ce16ab49f447496ad923408 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
> @@ -46,6 +46,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].b != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
> index d53b7669a6b50d6bc27e646d08af98ca6fd093e3..b8d094723f9035083a244cfcee98d3de46512206 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
> @@ -33,6 +33,7 @@ main1 ()
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
> index 37ff3abe97d60d9b968addaee7812cb0b05b6f44..0f1344c42017fc2a5bfda3a9c17d46fbdd523127 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
> @@ -44,6 +44,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
> index 9237a9074deeb72c4d724771d5397d36593ced7c..b0d36486714159c88419ce9e793c27a398ddcbcb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
> @@ -39,6 +39,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].b != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
> index 62a8a51e2034b1065a4438a712a80e0a7c149985..1c9906fa65237a7b9e0bbd2162e9c56b6e86074f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
> @@ -39,6 +39,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i] != arr[i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> index f64a1347350a465b9e7a0c123fe2b5bcbc2bf860..dc9ad168c7161c15f6de4a57d53e301e6754e525 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> @@ -33,6 +33,7 @@ main1 ()
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].a
> @@ -49,6 +50,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
> index 2add5b489915cffda25f3c59b41bd1c44edf16ce..d35e427996f472ce9fffdf9570fb6685c3115037 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2-big-array.c
> @@ -62,6 +62,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != check_res[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
> index 2b7a1a4bb77f4dce44958c50864a0a6ecac90c53..a9524a9d8e5cb152ec879db68f316d5568161ec1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
> @@ -51,6 +51,7 @@ main1 ()
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
> index e487de8b4e7d8e092054a73b337a345ba00e4e02..95ff41930d3f1ab95f0a20947e0527f39c78e715 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
> @@ -71,6 +71,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != check_res[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
> index 0f3347e8bb2200f48927b21938e7ebd348a73ada..b2dd1aee116d212bda7df0b0b1ca5470bd35ab83 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
> @@ -56,6 +56,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
> index 6d6bfae7bc5ce4cbcaeaadc07856773e6d77bdb4..716cce3eecbec0390f85f393e9cc714bd1a1faae 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-float.c
> @@ -22,6 +22,7 @@ main1 (void)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (a[i*2] != b[2*i+1] * c[2*i+1] - b[2*i] * c[2*i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
> index 82727e595c166a52c8a1060339259ec7c39b594f..59008499192388c618f3eb38d91d9dcb5e47e3d9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
> @@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].b != arr[i].b - arr[i].a 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
> index 0fac615011601d45c64e83be1a6ec1e1af407192..350223fa23ace9253e8e56bbbbd065e575639b19 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
> @@ -35,6 +35,7 @@ main1 (s *arr, ii *iarr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].b != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
> index 8c560480bc4eac50c381ed51cfbc6ccc696d0424..e988c5c846911a875a188cbb6ec8a4e4b80b787a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
> @@ -35,6 +35,7 @@ main1 (s * __restrict__  pIn, s* __restrict__ pOut)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (q->a != p->a + 5
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> index dcae9c26d8621dd393f00d81257262a27913d7a8..37b8eb80ce0ce0dfe1ce5f9e5c13618bffbe41ff 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -24,6 +24,7 @@ main (int argc, char **argv)
>      }
>    loop ();
>    __asm__ volatile ("" : : : "memory");
> +#pragma GCC novector
>    for (int i = 0; i < N; i++)
>      {
>        if (out[i] != i*2 + 7)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
> index 6be939eea167992aade397ada0ee50d4daa43066..a55cd32e5896be4c1592e4e815baccede0f30e82 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
> @@ -38,6 +38,7 @@ main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != a[i] + 3
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
> index 9d1ad579e6a607f34ec953395f741f180474a77a..170f23472b967cedec88c1fa82dfb898014a6d09 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
> @@ -34,6 +34,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].c != a[i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
> index a081d4e396e36a4633eb224d927543c7379d3108..11c2f2c4df60d8238830c188c3400a324444ab4d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
> @@ -22,6 +22,7 @@ main1 (void)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N/2; i++)
>      {
>        if (a[i*2] != b[i] + c[i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
> index e8303b63bd4812e0643dc96888eeee2ea8ca082a..dfdafe8e8b46ea33e3c9ed759687788784a22607 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-store.c
> @@ -19,12 +19,14 @@ int main()
>    float src[] = {1, 2, 3, 4, 5, 6, 7, 8};
>    float dest[64];
>    check_vect ();
> +#pragma GCC novector
>    for (stride = 0; stride < 8; stride++)
>      {
>        sumit (dest, src, src, stride, 8);
>        if (!stride && dest[0] != 16)
>  	abort();
>        else if (stride)
> +#pragma GCC novector
>  	for (i = 0; i < 8; i++)
>  	  if (2*src[i] != dest[i*stride])
>  	    abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
> index 7d264f39c60d668927232a75fe3843dbee087aa5..004db4e1f84735d8857c5591453158c96f213246 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
> @@ -25,6 +25,7 @@ main1 (s *arr)
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
> index 63a4da797cbeb70bde0b1329fe39f510c24a990c..5d94e8f49bc41431df9de2b809c65e48cc269fa0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
> @@ -18,6 +18,7 @@ check1 (s *res)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (res[i].a != C (i)
>  	|| res[i].b != A (i)
> @@ -30,6 +31,7 @@ check2 (unsigned short *res)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (res[i] != (unsigned short) (A (i) + B (i) + C (i)))
>        abort ();
> @@ -40,6 +42,7 @@ check3 (s *res)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (res[i].a != i
>  	|| res[i].b != i
> @@ -52,6 +55,7 @@ check4 (unsigned short *res)
>  {
>    int i;
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (res[i] != (unsigned short) (A (i) + B (i)))
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
> index ee8ea0d666db4b7671cd3f788fc7f6056189f3da..547ad9b9ee3d35802d3f8d7b9c43d578fb14f828 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
> @@ -34,6 +34,7 @@ main1 (s *arr)
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
> index fe41dbd9cf452b9452084e988d48ede232f548bf..8f58e24c4a8b8be2da0a6c136924a370b9952691 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
> @@ -29,6 +29,7 @@ main1 (s *arr)
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
> index a88c0f08456cf278c4fa5a5b9b0a06900cb7c9be..edb13d1b26f5963113917e8882f199c7dd4d8de7 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
> @@ -37,6 +37,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
> index cddd0c88f42a99f526362ca117e9386c013c768d..0c2bd9d8cbde5e789474595db519d603b374e74c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
> @@ -29,6 +29,7 @@ main1 (unsigned short *arr, ii *iarr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i] != arr[i]
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
> index ab841205e4f5b3c0aea29f60045934e84644a6a7..fd7920031dcf6df98114cfde9a56037d655bb74d 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
> @@ -25,6 +25,7 @@ main1 (s *arr)
>      }
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b
> @@ -41,6 +42,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
> index 0afd50db0b8de7758faf7f2bff14247a27a7ee38..ae2345a9787804af0edc45d93f18e75d159326b0 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
> @@ -24,6 +24,7 @@ main1 (s *arr)
>        ptr++;
>      }
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b - arr[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
> index ef532251465e5b1eb16e820fc30844a7995b82a9..c7a1da534baea886fe14add1220c105153d6bb80 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2-big-array.c
> @@ -39,6 +39,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != check_res[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
> index 04f18fbb591d9dc50d56b20bce99cb79903e5e27..2a068d821aebee8ab646ff1b4c33209dc5b2fcbf 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
> @@ -37,6 +37,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
> index b5eb87f4b96e1a577930654f4b1709024256e90e..ac7bf000196b3671044de57d88dd3a32080b68a8 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
> @@ -41,6 +41,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != check_res[i].a
> @@ -64,6 +65,7 @@ main1 (s *arr)
>      }
>  
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
> index 69b970ef33b9dd8834b10baf7085b88a0c441a46..0a6050ae462332b8d74043fce094776892a80386 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c
> @@ -53,6 +53,7 @@ main1 (s *arr, int n)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      { 
>        if (res[i].c != arr[i].b + arr[i].c
> @@ -67,6 +68,7 @@ main1 (s *arr, int n)
>     }
>  
>    /* Check also that we don't do more iterations than needed.  */
> +#pragma GCC novector
>    for (i = n; i < N; i++)
>      {
>        if (res[i].c == arr[i].b + arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
> index f1d05a5aaf9f6885b921c5ae3370d9c17795ff82..9ead5a776d0b1a69bec804615ffe7639f61f993f 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
> @@ -39,6 +39,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b + arr[i].c
> @@ -62,6 +63,7 @@ main1 (s *arr)
>      }
>    
>    /* Check results.  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != arr[i].b 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
> index b703e636b49f8c7995c4c463b38b585f79acbdf2..176c6a784bc73e0300e3114a74aba05dc8185cac 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
> @@ -44,6 +44,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (res[i].a != check_res[i].a
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
> index 764f10d0adaca01e664bb45dd4da59a0c3f8a2af..cef88f6bf8246a98933ff84103c090664398cedd 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
> @@ -42,6 +42,7 @@ main1 (s *arr)
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
> index 35bab79ce826ac663eabb1a1036ed7afd6d33e8b..c29c3ff6cdc304e5447f0e12aac00cd0fcd7b61e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
> @@ -44,6 +44,7 @@ main1 (s *arr)
>      } 
>     
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      { 
>        if (res[i].c != arr[i].b - arr[i].a + arr[i].d - arr[i].c
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
> index ea35835465c8ed18be1a0c9c4f226f078a51acaa..2d5c10a878c7145972aeaa678e0e11c1cf1b79dd 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-01.c
> @@ -27,6 +27,7 @@ main (void)
>    foo (X, Y);
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (Y[i] != result[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
> index df6b999c5a4d88c8b106829f6f9df8edbe00f86f..4848215a7a8f5fea569c0bfaf5909ac68a81bbf2 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-02.c
> @@ -32,6 +32,7 @@ main (void)
>    foo (X, Y, Z);
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (Y[i] != resultY[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
> index 36861a059e03b1103adc2dca32409878ca95611e..2a94c73907e813019fcfbc912a1599f7423e2a47 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
> @@ -40,6 +40,7 @@ main (void)
>    foo (X, Y);
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (Y[i].a != result[i].a)  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
> index bfbb48b21ee632243f2f5ba63d7eeec0f687daef..b0e9d6f90391cfc05911f7cc709df199d7fbbdf1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-04.c
> @@ -26,6 +26,7 @@ main (void)
>    foo (X, &X[2]);
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N+2; i++)
>      {
>        if (X[i] != result[i])
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
> index d775f320e0c1e2c6de2e77a1d8df621971fc3d2d..27d762490908829d54cdbb81247926c2f677fe36 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-vfa-slp.c
> @@ -40,6 +40,7 @@ main (void)
>    foo (X, Y);
>    
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (Y[i].a != result[i].a)
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
> index 0d6e64081a17fed8d9b9239f9ba02ffa1b7a758d..f3abc9407f52784e391c495152e617b1f0753e92 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-1.c
> @@ -38,6 +38,7 @@ main (void)
>        asm volatile ("" ::: "memory");
>      }
>    f (a, b, c);
> +#pragma GCC novector
>    for (int i = 0; i < N; ++i)
>      if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
>  				      * (BASE + OFFSET + i * 4)))
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
> index 4c95dd2017922904122aee2925491e9b9b48fe8e..dfbb2171c004565045d91605354b5d6e7219ab19 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c
> @@ -17,6 +17,7 @@ foo (int *__restrict a,
>    for (i = 0; i < n; i++)
>      a[i] = b[i] * 2333;
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (a[i] != b[i] * 2333)
>        abort ();
> @@ -32,6 +33,7 @@ bar (int *__restrict a,
>    for (i = 0; i < n; i++)
>      a[i] = b[i] * (short) 2333;
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (a[i] != b[i] * (short) 2333)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
> index 4075f815cea0ffbad1e05e0ac8b9b232bf3efe61..c2ad58f69e7fe5b62a9fbc55dd5dab43ba785104 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-u16.c
> @@ -17,6 +17,7 @@ foo (unsigned int *__restrict a,
>    for (i = 0; i < n; i++)
>      a[i] = b[i] * 2333;
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (a[i] != b[i] * 2333)
>        abort ();
> @@ -32,6 +33,7 @@ bar (unsigned int *__restrict a,
>    for (i = 0; i < n; i++)
>      a[i] = (unsigned short) 2333 * b[i];
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (a[i] != b[i] * (unsigned short) 2333)
>        abort ();
> @@ -47,6 +49,7 @@ baz (unsigned int *__restrict a,
>    for (i = 0; i < n; i++)
>      a[i] = b[i] * 233333333;
>  
> +#pragma GCC novector
>    for (i = 0; i < n; i++)
>      if (a[i] != b[i] * 233333333)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
> index c4ac88e186dbc1a8f36f4d7567a9983446557eea..bfdcbaa09fbd42a16197023b09087cee6642105a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half-u8.c
> @@ -43,12 +43,14 @@ int main (void)
>  
>    foo ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (out[i] != in[i] * COEF)
>        abort ();
>  
>    bar ();
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (out[i] != in[i] * COEF)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
> index ebbf4f5e841b75cb1f5171ddedec85cd327f385e..e46b0cc3135fd982b07e0824955654f0ebc59506 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-half.c
> @@ -38,6 +38,7 @@ int main (void)
>  
>    foo (COEF2);
>  
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (out[i] != in[i] * COEF || out2[i] != in[i] + COEF2)
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
> index 91a8a290263c9630610a48bce3829de753a4b320..6b094868064e9b86c40018363564f356220125a5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s16.c
> @@ -33,6 +33,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
> index 7e1f7457f1096d4661dcc724a59a0511555ec0e3..444d41169b5c198c6fa146c3bb71336b0f6b0432 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-s8.c
> @@ -33,6 +33,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
> index 2e28baae0b804cf76ad74926c35126df98857482..14411ef43eda2ff348de9c9c1540e1359f20f55b 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u16.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
> index d277f0b2b9492db77237a489cc8bea4749d8d719..f40def5dddf58f6a6661d9c286b774f954126840 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-s16-s32.c
> @@ -35,6 +35,7 @@ int main (void)
>  
>    foo (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
> index f50358802587d32c1d6e73c0f6e06bd8ff837fc2..63866390835c55e53b6f90f305a71bbdbff85afa 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8-u32.c
> @@ -34,6 +34,7 @@ int main (void)
>  
>    foo (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
> index 03d1379410eb927a3ef705afc6523230eb9fb58b..78ad74b5d499c23256e4ca38a82fefde8720e4e9 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-u8.c
> @@ -34,6 +34,7 @@ int main (void)
>  
>    foo1 (N);
>  
> +#pragma GCC novector
>    for (i=0; i<N; i++) {
>      if (result[i] != X[i] * Y[i])
>        abort ();
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
> index 5f6c047849b8625f908bc7432b803dff5e671cd3..26d5310807781eb5a7935c51e813bc88892f747c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s16.c
> @@ -32,6 +32,7 @@ foo (short *src, int *dst)
>  
>    s = src;
>    d = dst;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        b = *s++;
> @@ -60,6 +61,7 @@ foo (short *src, int *dst)
>  
>    s = src;
>    d = dst;
> +#pragma GCC novector
>    for (i = 0; i < N/4; i++)
>      {
>        b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
> index 46512f2c69ba50521d6c7519a1c3d073e90b7436..7450d2aef75d755db558e471b807bfefb777f472 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-s8.c
> @@ -23,6 +23,7 @@ foo (char *src, int *dst)
>  
>    s = src;
>    d = dst;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
> index 212b5dbea18a91bd59d2caf9dc4f4cc3fe531762..ae086b88e7e83f2864d6e74fa94301f7f8ab62f6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u16.c
> @@ -23,6 +23,7 @@ foo (unsigned short *src, unsigned int *dst)
>  
>    s = src;
>    d = dst;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
> index 844e5df3269d0a774d2ab8a88de11f17271d6f60..a8e536adee0f04611115e97725608d0e82e9893c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-shift-u8.c
> @@ -27,6 +27,7 @@ foo (unsigned char *src, unsigned int *dst1, unsigned int *dst2)
>    s = src;
>    d1 = dst1;
>    d2 = dst2;
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        b = *s++;
> diff --git a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
> index c4d2de1a64e2ebc151c4ade2327c8fceb7ba04e4..414bd9d3e1279db574d860b7a721e4310d4972da 100644
> --- a/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c
> @@ -19,6 +19,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sb[i] != 5)
> @@ -31,6 +32,7 @@ int main1 ()
>      }
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
>        if (sa[i] != 105)
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-06-28 13:43 ` [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Tamar Christina
@ 2023-07-04 11:52   ` Richard Biener
  2023-07-04 14:57     ` Jan Hubicka
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-04 11:52 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw, Jan Hubicka

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> There's an existing bug in loop frequency scaling where the if statement
> checks to see if there's a single exit, records a dump file note, but then
> continues.
> 
> It then tries to access the null pointer, which of course fails.
> 
> For multiple loop exits it's not really clear how to scale the exit
> probabilities, as it's unknown which exit is the most probable.
> 
> For that reason I ignore the exit edges during scaling but still adjust the
> loop body.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

I can't really make sense of

      /* If latch exists, change its count, since we changed
         probability of exit.  Theoretically we should update everything 
from
         source of exit edge to latch, but for vectorizer this is enough.  
*/
      if (loop->latch && loop->latch != e->src)
        loop->latch->count += count_delta;

since with simple latches the latch itself is an empty forwarder and
e->src is the block with the conditional eventually exiting the block.
That means this condition is always true.

So I think for exits the idea is to "remove" them by redirecting
them "count-wise" back into the loop.  So turn

   if (cond) --(exit-count)-- exit
     |
     | in-loop-count
     |
   latch

into

   [cond-blk-count]
   if (cond) -- (zero count) -- exit
     |
     | in-loop-cound + exit-count (== cond-blk-count)
     |
   latch (now with cond-blk-count)

and the comment correctly suggests all blocks following from here
would need similar adjustment (and on in-loop branches the delta would be
distributed according to probabilities).

Given the code is quite imperfect I would suggest changing the
updating of the latch block count to read

  profile_count count_delta = profile_count::zero ();
  if (loop->latch
      && single_pred_p (loop->latch)
      && loop_exits_from_bb_p (single_pred (loop->latch)))
    {
      count_delta = single_pred (loop->latch)->count - loop->latch->count;
      loop->latch->count = single_pred (loop->latch)->count;
    }

   scale_loop_frequencies (loop, p);

  if (count_delta != 0)
    loop->latch->count -= count_delta;

which should exactly preserve the exit-before-latch behavior independent
of the number of exits of the loop.

Please leave Honza a chance to comment here.

Thanks,
Richard.


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
> 	(scale_loop_profile): Don't access null pointer.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
>  /* Scale profile in LOOP by P.
>     If ITERATION_BOUND is non-zero, scale even further if loop is predicted
>     to iterate too many times.
> -   Before caling this function, preheader block profile should be already
> +   Before calling this function, preheader block profile should be already
>     scaled to final count.  This is necessary because loop iterations are
>     determined by comparing header edge count to latch ege count and thus
>     they need to be scaled synchronously.  */
> @@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability p,
>        /* If latch exists, change its count, since we changed
>  	 probability of exit.  Theoretically we should update everything from
>  	 source of exit edge to latch, but for vectorizer this is enough.  */
> -      if (loop->latch && loop->latch != e->src)
> +      if (e && loop->latch && loop->latch != e->src)
>  	loop->latch->count += count_delta;
>  
>        /* Scale the probabilities.  */
>        scale_loop_frequencies (loop, p);
>  
>        /* Change latch's count back.  */
> -      if (loop->latch && loop->latch != e->src)
> +      if (e && loop->latch && loop->latch != e->src)
>  	loop->latch->count -= count_delta;
>  
>        if (dump_file && (dump_flags & TDF_DETAILS))
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds
  2023-06-28 13:43 ` [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds Tamar Christina
@ 2023-07-04 12:05   ` Richard Biener
  2023-07-10 15:32     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-04 12:05 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> The bitfield vectorization support does not currently recognize bitfields
> inside gconds.  This means they can't be used as conditions for early break
> vectorization, which is functionality we require.
> 
> This adds support for them by explicitly matching and handling gcond as a
> source.
> 
> Testcases are added in the testsuite update patch, as the only way to get
> there is with early break vectorization.  See tests:
> 
>   - vect-early-break_20.c
>   - vect-early-break_21.c
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
> 	from original statement.
> 	(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
> 
> Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>    STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>      = STMT_VINFO_DEF_TYPE (orig_stmt_info);
> +  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> @@ -2488,27 +2489,37 @@ static gimple *

there's a comment above this mentioning what we look for - please
update it with the gcond case.

>  vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  				 tree *type_out)
>  {
> -  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
> +  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
> +  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
>  
> -  if (!first_stmt)
> -    return NULL;
> -
> -  gassign *bf_stmt;
> -  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
> -      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
> +  gimple *bf_stmt = NULL;
> +  tree cond_cst = NULL_TREE;
> +  if (cond_stmt)

please make that

     if (gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt))

>      {
> -      gimple *second_stmt
> -	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
> -      bf_stmt = dyn_cast <gassign *> (second_stmt);
> -      if (!bf_stmt
> -	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
> +      tree op = gimple_cond_lhs (cond_stmt);
> +      if (TREE_CODE (op) != SSA_NAME)
> +	return NULL;
> +      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
> +      cond_cst = gimple_cond_rhs (cond_stmt);
> +      if (TREE_CODE (cond_cst) != INTEGER_CST)
>  	return NULL;
>      }
> -  else
> +  else if (conv_stmt

similar

> +	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
> +	   && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
> +    {
> +      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
> +      bf_stmt = dyn_cast <gassign *> (second_stmt);
> +    }
> +
> +  if (!bf_stmt
> +      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
>      return NULL;
>  
>    tree bf_ref = gimple_assign_rhs1 (bf_stmt);
>    tree container = TREE_OPERAND (bf_ref, 0);
> +  tree ret_type = cond_cst ? TREE_TYPE (container)
> +			   : TREE_TYPE (gimple_assign_lhs (conv_stmt));
>  
>    if (!bit_field_offset (bf_ref).is_constant ()
>        || !bit_field_size (bf_ref).is_constant ()
> @@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>    gimple *use_stmt, *pattern_stmt;
>    use_operand_p use_p;
> -  tree ret = gimple_assign_lhs (first_stmt);
> -  tree ret_type = TREE_TYPE (ret);
>    bool shift_first = true;
>    tree container_type = TREE_TYPE (container);
>    tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
> @@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>    /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
>       PLUS_EXPR then do the shift last as some targets can combine the shift and
>       add into a single instruction.  */
> -  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
> +  if (conv_stmt
> +      && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
>      {
>        if (gimple_code (use_stmt) == GIMPLE_ASSIGN
>  	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
> @@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  			       NOP_EXPR, result);
>      }
>  
> -  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> +  if (cond_cst)
> +    {
> +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> +      pattern_stmt
> +	= gimple_build_cond (gimple_cond_code (cond_stmt),
> +			     gimple_get_lhs (pattern_stmt),
> +			     fold_convert (ret_type, cond_cst),
> +			     gimple_cond_true_label (cond_stmt),
> +			     gimple_cond_false_label (cond_stmt));
> +      *type_out = STMT_VINFO_VECTYPE (stmt_info);

is there any vectype set for a gcond?

I must say the flow of the function is a bit convoluted now.  Is it
possible to factor out a helper so we can fully separate the
gassign vs. gcond handling in this function?

Thanks,
Richard.

> +    }
> +  else
> +    *type_out
> +      = get_vectype_for_scalar_type (vinfo,
> +				     TREE_TYPE (gimple_get_lhs (pattern_stmt)));
>    vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
>  
>    return pattern_stmt;
> 


* Re: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.
  2023-06-28 13:44 ` [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant Tamar Christina
@ 2023-07-04 12:10   ` Richard Biener
  2023-07-06 10:37     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-04 12:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> expand_vector_piecewise does not support VLA expansion as it has a hard assert
> on the type not being VLA.
> 
> Instead of just failing to expand, and so having the call marked unsupported,
> we ICE.  This adjusts it so we don't, and can gracefully handle the expansion
> in the support checks.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Hmm, do we support _any_ VLA "generic" vectors?  That is, why do
we get here at all?  Doesn't that mean the vectorizer creates
code that vector lowering thinks is not supported by the target?

In any case I'd expect expand_vector_operations_1 at

  if (compute_type == NULL_TREE)
    compute_type = get_compute_type (code, op, type);
  if (compute_type == type)
    return;

 <----  here

  new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
                                     dce_ssa_names);

to be able to assert that compute_type (or even type) isn't VLA?

So, why do we arrive here?

Richard.


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if not
> 	constant.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
> index df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d 100644
> --- a/gcc/tree-vect-generic.cc
> +++ b/gcc/tree-vect-generic.cc
> @@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
>  	    }
>  	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
>  	}
> -      else
> +      else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
>  	t = expand_vector_piecewise (gsi, do_compare, type,
>  				     TREE_TYPE (TREE_TYPE (op0)), op0, op1,
>  				     code, false);
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-04 11:52   ` Richard Biener
@ 2023-07-04 14:57     ` Jan Hubicka
  2023-07-06 14:34       ` Jan Hubicka
  0 siblings, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-04 14:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> > 
> > There's an existing bug in loop frequency scaling where the if statement
> > checks to see if there's a single exit, records a dump file note, but then
> > continues.
> > 
> > It then tries to access the null pointer, which of course fails.
> > 
> > For multiple loop exits it's not really clear how to scale the exit
> > probabilities, as it's unknown which exit is the most probable.
> > 
> > For that reason I ignore the exit edges during scaling but still adjust the
> > loop body.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> 
> I can't really make sense of
> 
>       /* If latch exists, change its count, since we changed
>          probability of exit.  Theoretically we should update everything 
> from
>          source of exit edge to latch, but for vectorizer this is enough.  
> */
>       if (loop->latch && loop->latch != e->src)
>         loop->latch->count += count_delta;
> 
> since with simple latches the latch itself is an empty forwarder and
> e->src is the block with the conditional eventually exiting the block.
> That means this condition is always true.
> 
> So I think for exits the idea is to "remove" them by redirecting
> them "count-wise" back into the loop.  So turn
> 
>    if (cond) --(exit-count)-- exit
>      |
>      | in-loop-count
>      |
>    latch
> 
> into
> 
>    [cond-blk-count]
>    if (cond) -- (zero count) -- exit
>      |
>      | in-loop-cound + exit-count (== cond-blk-count)
>      |
>    latch (now with cond-blk-count)

This is the opposite situation.  You have a loop predicted to iterate 10
times, but you found it actually iterates at most twice.  So you want to
 1) scale down the profile of every BB in the loop
    so the header count is 2 * the sum of counts of the entry edges
    instead of 10 *
 2) reduce the probability of the loopback edge and instead increase the
    probability of the exit.

The code attempts to get right only the case where the loop has one exit,
and instead of
  if (cond) -- (original-wrong-exit-probability) -- exit
it does
  if (cond) -- (exit-probability=1/#iterations) -- exit
Now it should adjust the in-loop count for every path from the source of the
exit to the latch edge.  It just assumes that the only such basic block is
the latch and does it there.

I was just looking into using this for the profile update when loop-ch or
complete unrolling proves that a loop iterates fewer times than the profile
says.  I can clean up the function - it was originally written for the old
representation of probabilities and counts, and I did not do a very good job
of updating it to the new code.

Honza
> 
> and the comment correctly suggests all blocks following from here
> would need similar adjustment (and on in-loop branches the delta would be
> distributed according to probabilities).
> 
> Given the code is quite imperfect I would suggest to change the
> updating of the latch block count to read
> 
>   profile_count count_delta = profile_count::zero ();
>   if (loop->latch
>       && single_pred_p (loop->latch)
>       && loop_exits_from_bb_p (single_pred (loop->latch)))
>     {
>       count_delta = single_pred (loop->latch)->count - loop->latch->count;
>       loop->latch->count = single_pred (loop->latch)->count;
>     }
> 
>    scale_loop_frequencies (loop, p);
> 
>   if (count_delta != 0)
>     loop->latch->count -= count_delta;
> 
> which should exactly preserve the exit-before-latch behavior independent
> of the number of exits of the loop.
> 
> Please leave Honza a chance to comment here.
> 
> Thanks,
> Richard.
> 
> 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
> > 	(scale_loop_profile): Don't access null pointer.
> > 
> > --- inline copy of patch -- 
> > diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> > index 6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb 100644
> > --- a/gcc/cfgloopmanip.cc
> > +++ b/gcc/cfgloopmanip.cc
> > @@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
> >  /* Scale profile in LOOP by P.
> >     If ITERATION_BOUND is non-zero, scale even further if loop is predicted
> >     to iterate too many times.
> > -   Before caling this function, preheader block profile should be already
> > +   Before calling this function, preheader block profile should be already
> >     scaled to final count.  This is necessary because loop iterations are
> >     determined by comparing header edge count to latch ege count and thus
> >     they need to be scaled synchronously.  */
> > @@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability p,
> >        /* If latch exists, change its count, since we changed
> >  	 probability of exit.  Theoretically we should update everything from
> >  	 source of exit edge to latch, but for vectorizer this is enough.  */
> > -      if (loop->latch && loop->latch != e->src)
> > +      if (e && loop->latch && loop->latch != e->src)
> >  	loop->latch->count += count_delta;
> >  
> >        /* Scale the probabilities.  */
> >        scale_loop_frequencies (loop, p);
> >  
> >        /* Change latch's count back.  */
> > -      if (loop->latch && loop->latch != e->src)
> > +      if (e && loop->latch && loop->latch != e->src)
> >  	loop->latch->count -= count_delta;
> >  
> >        if (dump_file && (dump_flags & TDF_DETAILS))
> > 
> > 
> > 
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)


* RE: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.
  2023-07-04 12:10   ` Richard Biener
@ 2023-07-06 10:37     ` Tamar Christina
  2023-07-06 10:51       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-06 10:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > expand_vector_piecewise does not support VLA expansion as it has a
> > hard assert on the type not being VLA.
> >
> > Instead of just failing to expand, and so having the call marked
> > unsupported, we ICE.  This adjusts it so we don't, and can gracefully
> > handle the expansion in the support checks.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Hmm, do we support _any_ VLA "generic" vectors?  That is, why do we get
> here at all?  Doesn't that mean the vectorizer creates code that vector lowering
> thinks is not supported by the target?
> 
> In any case I'd expect expand_vector_operations_1 at
> 
>   if (compute_type == NULL_TREE)
>     compute_type = get_compute_type (code, op, type);
>   if (compute_type == type)
>     return;
> 
>  <----  here
> 
>   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
>                                      dce_ssa_names);
> 
> to be able to assert that compute_type (or even type) isn't VLA?
> 
> So, why do we arrive here?
> 

I think we used to arrive here because the patch last year didn't properly
check the cmp.  I don't hit it with this new patch, so I'll drop it.  I
thought it was an actual bug, hence why I submitted the patch 😊

Thanks,
Tamar
> Richard.
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if
> not
> > 	constant.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index
> >
> df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6
> ed0b2f4
> > c3c222d58a8d 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator
> *gsi, tree type, tree op0,
> >  	    }
> >  	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
> >  	}
> > -      else
> > +      else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> >  	t = expand_vector_piecewise (gsi, do_compare, type,
> >  				     TREE_TYPE (TREE_TYPE (op0)), op0, op1,
> >  				     code, false);
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)


* RE: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.
  2023-07-06 10:37     ` Tamar Christina
@ 2023-07-06 10:51       ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-06 10:51 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 6 Jul 2023, Tamar Christina wrote:

> > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > expand_vector_piecewise does not support VLA expansion as it has a
> > > hard assert on the type not being VLA.
> > >
> > > Instead of just failing to expand, and so having the call marked
> > > unsupported, we ICE.  This adjusts it so we don't, and can gracefully
> > > handle the expansion in the support checks.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Hmm, do we support _any_ VLA "generic" vectors?  That is, why do we get
> > here at all?  Doesn't that mean the vectorizer creates code that vector lowering
> > thinks is not supported by the target?
> > 
> > In any case I'd expect expand_vector_operations_1 at
> > 
> >   if (compute_type == NULL_TREE)
> >     compute_type = get_compute_type (code, op, type);
> >   if (compute_type == type)
> >     return;
> > 
> >  <----  here
> > 
> >   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
> >                                      dce_ssa_names);
> > 
> > to be able to assert that compute_type (or even type) isn't VLA?
> > 
> > So, why do we arrive here?
> > 
> 
> I think we used to arrive here because the patch last year didn't properly
> check the cmp.  I don't hit it with this new patch, so I'll drop it.  I
> thought it was an actual bug, hence why I submitted the patch 😊

If it's a genuine bug then the fix at least looks wrong ;)

Anyway, dropping is fine with me of course.

Richard.


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-04 14:57     ` Jan Hubicka
@ 2023-07-06 14:34       ` Jan Hubicka
  2023-07-07  5:59         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-06 14:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

Hi,
the original scale_loop_profile was implemented to handle only the very simple
loops produced by the vectorizer at that time (basically loops with only one exit
and no subloops). It also was not updated to the new profile-count API very carefully.
Since I want to use it from loop peeling and unlooping, I need the
function to at least not get profile worse on general loops.

The function does two things
 1) scales down the loop profile by a given probability.
    This is useful, for example, to scale down profile after peeling when loop
    body is executed less often than before
 2) after scaling is done and if profile indicates too large iteration
    count update profile to cap iteration count by ITERATION_BOUND parameter.

Step 1 is easy and unchanged.

I changed ITERATION_BOUND to be actual bound on number of iterations as
used elsewhere (i.e. number of executions of latch edge) rather then
number of iterations + 1 as it was before.

To do 2) one needs to do the following
  a) scale the loop's own profile so the frequency of the header is at most
     the sum of in-edge counts * (iteration_bound + 1)
  b) update loop exit probabilities so their count is the same
     as before scaling.
  c) reduce frequencies of basic blocks after loop exit

The old code did b) by setting the probability to 1 / iteration_bound, which is
correct only if the basic block containing the exit executes precisely once per
iteration (i.e. it is not inside another conditional or an inner loop).  This is
now fixed by using set_edge_probability_and_rescale_others.

Also, c) was implemented only for the special case when the exit was just before
the latch basic block.  I now use dominance info to get some of the additional
cases right.

I still did not try to do anything for multiple-exit loops, though the
implementation could be generalized.

Bootstrapped/regtested x86_64-linux.  Plan to commit it tonight if there
are no complaints.

gcc/ChangeLog:

	* cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
	probability update to be safe on loops with subloops.
	Make bound parameter to be iteration bound.
	* tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
	of scale_loop_profile.
	* tree-vect-loop-manip.cc (vect_do_peeling): Likewise.

diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 6e09dcbb0b1..524b979a546 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -499,7 +499,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
 }
 
 /* Scale profile in LOOP by P.
-   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
+   If ITERATION_BOUND is not -1, scale even further if loop is predicted
    to iterate too many times.
    Before caling this function, preheader block profile should be already
    scaled to final count.  This is necessary because loop iterations are
@@ -510,106 +510,123 @@ void
 scale_loop_profile (class loop *loop, profile_probability p,
 		    gcov_type iteration_bound)
 {
-  edge e, preheader_e;
-  edge_iterator ei;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  if (!(p == profile_probability::always ()))
     {
-      fprintf (dump_file, ";; Scaling loop %i with scale ",
-	       loop->num);
-      p.dump (dump_file);
-      fprintf (dump_file, " bounding iterations to %i\n",
-	       (int)iteration_bound);
-    }
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, ";; Scaling loop %i with scale ",
+		   loop->num);
+	  p.dump (dump_file);
+	  fprintf (dump_file, "\n");
+	}
 
-  /* Scale the probabilities.  */
-  scale_loop_frequencies (loop, p);
+      /* Scale the probabilities.  */
+      scale_loop_frequencies (loop, p);
+    }
 
-  if (iteration_bound == 0)
+  if (iteration_bound == -1)
     return;
 
   gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, true);
+  if (iterations == -1)
+    return;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, ";; guessed iterations after scaling %i\n",
-	       (int)iterations);
+      fprintf (dump_file,
+	       ";; guessed iterations of loop %i:%i new upper bound %i:\n",
+	       loop->num,
+	       (int)iterations,
+	       (int)iteration_bound);
     }
 
   /* See if loop is predicted to iterate too many times.  */
   if (iterations <= iteration_bound)
     return;
 
-  preheader_e = loop_preheader_edge (loop);
-
-  /* We could handle also loops without preheaders, but bounding is
-     currently used only by optimizers that have preheaders constructed.  */
-  gcc_checking_assert (preheader_e);
-  profile_count count_in = preheader_e->count ();
+  /* Compute number of invocations of the loop.  */
+  profile_count count_in = profile_count::zero ();
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    count_in += e->count ();
 
-  if (count_in > profile_count::zero ()
-      && loop->header->count.initialized_p ())
+  /* Now scale the loop body so header count is
+     count_in * (iteration_bound + 1)  */
+  profile_probability scale_prob
+    = (count_in *= iteration_bound).probability_in (loop->header->count);
+  if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      profile_count count_delta = profile_count::zero ();
-
-      e = single_exit (loop);
-      if (e)
-	{
-	  edge other_e;
-	  FOR_EACH_EDGE (other_e, ei, e->src->succs)
-	    if (!(other_e->flags & (EDGE_ABNORMAL | EDGE_FAKE))
-		&& e != other_e)
-	      break;
-
-	  /* Probability of exit must be 1/iterations.  */
-	  count_delta = e->count ();
-	  e->probability = profile_probability::always () / iteration_bound;
-	  other_e->probability = e->probability.invert ();
-
-	  /* In code below we only handle the following two updates.  */
-	  if (other_e->dest != loop->header
-	      && other_e->dest != loop->latch
-	      && (dump_file && (dump_flags & TDF_DETAILS)))
-	    {
-	      fprintf (dump_file, ";; giving up on update of paths from "
-		       "exit condition to latch\n");
-	    }
-	}
+      fprintf (dump_file, ";; Scaling loop %i with scale ",
+	       loop->num);
+      p.dump (dump_file);
+      fprintf (dump_file, " to reach upper bound %i\n",
+	       (int)iteration_bound);
+    }
+  /* Finally attempt to fix exit edge probability.  */
+  auto_vec<edge> exits = get_loop_exit_edges  (loop);
+  edge exit_edge = single_likely_exit (loop, exits);
+
+  /* In a consistent profile unadjusted_exit_count should be same as count_in,
+     however to preserve as much of the original info, avoid recomputing
+     it.  */
+  profile_count unadjusted_exit_count;
+  if (exit_edge)
+    unadjusted_exit_count = exit_edge->count ();
+  scale_loop_frequencies (loop, scale_prob);
+
+  if (exit_edge)
+    {
+      profile_count old_exit_count = exit_edge->count ();
+      profile_probability new_probability;
+      if (iteration_bound > 0)
+	new_probability
+	  = unadjusted_exit_count.probability_in (exit_edge->src->count);
       else
-        if (dump_file && (dump_flags & TDF_DETAILS))
-	  fprintf (dump_file, ";; Loop has multiple exit edges; "
-	      		      "giving up on exit condition update\n");
-
-      /* Roughly speaking we want to reduce the loop body profile by the
-	 difference of loop iterations.  We however can do better if
-	 we look at the actual profile, if it is available.  */
-      p = profile_probability::always ();
-
-      count_in *= iteration_bound;
-      p = count_in.probability_in (loop->header->count);
-      if (!(p > profile_probability::never ()))
-	p = profile_probability::very_unlikely ();
-
-      if (p == profile_probability::always ()
-	  || !p.initialized_p ())
-	return;
-
-      /* If latch exists, change its count, since we changed
-	 probability of exit.  Theoretically we should update everything from
-	 source of exit edge to latch, but for vectorizer this is enough.  */
-      if (loop->latch && loop->latch != e->src)
-	loop->latch->count += count_delta;
-
-      /* Scale the probabilities.  */
-      scale_loop_frequencies (loop, p);
+	new_probability = profile_probability::always ();
+      set_edge_probability_and_rescale_others (exit_edge, new_probability);
+      profile_count new_exit_count = exit_edge->count ();
+
+      /* Rescale the remaining edge probabilities and see if there is only
+	 one.  */
+      edge other_edge = NULL;
+      bool found = false;
+      FOR_EACH_EDGE (e, ei, exit_edge->src->succs)
+	if (!(e->flags & EDGE_FAKE)
+	    && !(e->probability == profile_probability::never ())
+	    && !loop_exit_edge_p (loop, e))
+	  {
+	    if (found)
+	      {
+		other_edge = NULL;
+		break;
+	      }
+	    other_edge = e;
+	    found = true;
+	  }
+      /* If there is only loop latch after other edge,
+	 update its profile.  */
+      if (other_edge && other_edge->dest == loop->latch)
+	loop->latch->count -= new_exit_count - old_exit_count;
+      else
+	{
+	  basic_block *body = get_loop_body (loop);
+	  profile_count new_count = exit_edge->src->count - new_exit_count;
+	  profile_count old_count = exit_edge->src->count - old_exit_count;
 
-      /* Change latch's count back.  */
-      if (loop->latch && loop->latch != e->src)
-	loop->latch->count -= count_delta;
+	  for (unsigned int i = 0; i < loop->num_nodes; i++)
+	    if (body[i] != exit_edge->src
+		&& dominated_by_p (CDI_DOMINATORS, body[i], exit_edge->src))
+	      body[i]->count.apply_scale (new_count, old_count);
 
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, ";; guessed iterations are now %i\n",
-		 (int)expected_loop_iterations_unbounded (loop, NULL, true));
+	  free (body);
+	}
+    }
+  else if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file,
+	       ";; Loop has mulitple exits;"
+	       " will leave exit probabilities inconsistent\n");
     }
 }
 
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 491b57ec0f1..184c08eec75 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1173,7 +1179,7 @@ try_peel_loop (class loop *loop,
       }
   profile_probability p;
   p = entry_count.probability_in (loop->header->count);
-  scale_loop_profile (loop, p, 0);
+  scale_loop_profile (loop, p, -1);
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
 }
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d66d4a6de69..2361cb328ab 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3191,7 +3191,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       if (prob_vector.initialized_p ())
 	{
 	  scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
-	  scale_loop_profile (loop, prob_vector, 0);
+	  scale_loop_profile (loop, prob_vector, -1);
 	}
     }
 
@@ -3236,7 +3236,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  slpeel_update_phi_nodes_for_guard1 (prolog, loop, guard_e, e);
 
 	  scale_bbs_frequencies (&bb_after_prolog, 1, prob_prolog);
-	  scale_loop_profile (prolog, prob_prolog, bound_prolog);
+	  scale_loop_profile (prolog, prob_prolog, bound_prolog - 1);
 	}
 
       /* Update init address of DRs.  */
@@ -3378,7 +3378,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
 	      scale_bbs_frequencies (&bb_before_epilog, 1, prob_epilog);
 	    }
-	  scale_loop_profile (epilog, prob_epilog, 0);
+	  scale_loop_profile (epilog, prob_epilog, -1);
 	}
       else
 	slpeel_update_phi_nodes_for_lcssa (epilog);

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-06 14:34       ` Jan Hubicka
@ 2023-07-07  5:59         ` Richard Biener
  2023-07-07 12:20           ` Jan Hubicka
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-07  5:59 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Thu, 6 Jul 2023, Jan Hubicka wrote:

> Hi,
> the original scale_loop_profile was implemented to handle only the very simple
> loops produced by the vectorizer at that time (basically loops with only one exit
> and no subloops). It also was not updated to the new profile-count API very carefully.
> Since I want to use it from loop peeling and unlooping, I need the
> function to at least not get profile worse on general loops.
> 
> The function does two things
>  1) scales down the loop profile by a given probability.
>     This is useful, for example, to scale down profile after peeling when loop
>     body is executed less often than before
>  2) after scaling is done and if profile indicates too large iteration
>     count update profile to cap iteration count by ITERATION_BOUND parameter.
> 
> Step 1 is easy and unchanged.
> 
> I changed ITERATION_BOUND to be actual bound on number of iterations as
> used elsewhere (i.e. number of executions of latch edge) rather then
> number of iterations + 1 as it was before.
> 
> To do 2) one needs to do the following
>   a) scale the loop's own profile so the frequency of the header is at most
>      the sum of in-edge counts * (iteration_bound + 1)
>   b) update loop exit probabilities so their count is the same
>      as before scaling.
>   c) reduce frequencies of basic blocks after loop exit
> 
> The old code did b) by setting the probability to 1 / iteration_bound, which is
> correct only if the basic block containing the exit executes precisely once per
> iteration (i.e. it is not inside another conditional or an inner loop).  This is
> now fixed by using set_edge_probability_and_rescale_others.
> 
> Also, c) was implemented only for the special case when the exit was just before
> the latch basic block.  I now use dominance info to get some of the additional
> cases right.
> 
> I still did not try to do anything for multiple-exit loops, though the
> implementation could be generalized.
> 
> Bootstrapped/regtested x86_64-linux.  Plan to commit it tonight if there
> are no complaints.

Looks good, but I wonder what we can do to at least make the
multiple exit case behave reasonably?  The vectorizer keeps track
of a "canonical" exit, would it be possible to pass in the main
exit edge and use that instead of single_exit ()?  Would other
exits then behave somewhat reasonably, or would we totally screw
things up here?  That is, the "canonical" exit would be the
counting exit while the other exits are on data driven conditions
and thus wouldn't change probability when we reduce the number
of iterations(?)

Richard.

> gcc/ChangeLog:
> 
> 	* cfgloopmanip.cc (scale_loop_profile): Rewrite exit edge
> 	probability update to be safe on loops with subloops.
> 	Make bound parameter to be iteration bound.
> 	* tree-ssa-loop-ivcanon.cc (try_peel_loop): Update call
> 	of scale_loop_profile.
> 	* tree-vect-loop-manip.cc (vect_do_peeling): Likewise.
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 6e09dcbb0b1..524b979a546 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -499,7 +499,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p)
>  }
>  
>  /* Scale profile in LOOP by P.
> -   If ITERATION_BOUND is non-zero, scale even further if loop is predicted
> +   If ITERATION_BOUND is not -1, scale even further if loop is predicted
>     to iterate too many times.
>     Before caling this function, preheader block profile should be already
>     scaled to final count.  This is necessary because loop iterations are
> @@ -510,106 +510,123 @@ void
>  scale_loop_profile (class loop *loop, profile_probability p,
>  		    gcov_type iteration_bound)
>  {
> -  edge e, preheader_e;
> -  edge_iterator ei;
> -
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> +  if (!(p == profile_probability::always ()))
>      {
> -      fprintf (dump_file, ";; Scaling loop %i with scale ",
> -	       loop->num);
> -      p.dump (dump_file);
> -      fprintf (dump_file, " bounding iterations to %i\n",
> -	       (int)iteration_bound);
> -    }
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fprintf (dump_file, ";; Scaling loop %i with scale ",
> +		   loop->num);
> +	  p.dump (dump_file);
> +	  fprintf (dump_file, "\n");
> +	}
>  
> -  /* Scale the probabilities.  */
> -  scale_loop_frequencies (loop, p);
> +      /* Scale the probabilities.  */
> +      scale_loop_frequencies (loop, p);
> +    }
>  
> -  if (iteration_bound == 0)
> +  if (iteration_bound == -1)
>      return;
>  
>    gcov_type iterations = expected_loop_iterations_unbounded (loop, NULL, true);
> +  if (iterations == -1)
> +    return;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> -      fprintf (dump_file, ";; guessed iterations after scaling %i\n",
> -	       (int)iterations);
> +      fprintf (dump_file,
> +	       ";; guessed iterations of loop %i:%i new upper bound %i:\n",
> +	       loop->num,
> +	       (int)iterations,
> +	       (int)iteration_bound);
>      }
>  
>    /* See if loop is predicted to iterate too many times.  */
>    if (iterations <= iteration_bound)
>      return;
>  
> -  preheader_e = loop_preheader_edge (loop);
> -
> -  /* We could handle also loops without preheaders, but bounding is
> -     currently used only by optimizers that have preheaders constructed.  */
> -  gcc_checking_assert (preheader_e);
> -  profile_count count_in = preheader_e->count ();
> +  /* Compute number of invocations of the loop.  */
> +  profile_count count_in = profile_count::zero ();
> +  edge e;
> +  edge_iterator ei;
> +  FOR_EACH_EDGE (e, ei, loop->header->preds)
> +    count_in += e->count ();
>  
> -  if (count_in > profile_count::zero ()
> -      && loop->header->count.initialized_p ())
> +  /* Now scale the loop body so header count is
> +     count_in * (iteration_bound + 1)  */
> +  profile_probability scale_prob
> +    = (count_in *= iteration_bound).probability_in (loop->header->count);
> +  if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> -      profile_count count_delta = profile_count::zero ();
> -
> -      e = single_exit (loop);
> -      if (e)
> -	{
> -	  edge other_e;
> -	  FOR_EACH_EDGE (other_e, ei, e->src->succs)
> -	    if (!(other_e->flags & (EDGE_ABNORMAL | EDGE_FAKE))
> -		&& e != other_e)
> -	      break;
> -
> -	  /* Probability of exit must be 1/iterations.  */
> -	  count_delta = e->count ();
> -	  e->probability = profile_probability::always () / iteration_bound;
> -	  other_e->probability = e->probability.invert ();
> -
> -	  /* In code below we only handle the following two updates.  */
> -	  if (other_e->dest != loop->header
> -	      && other_e->dest != loop->latch
> -	      && (dump_file && (dump_flags & TDF_DETAILS)))
> -	    {
> -	      fprintf (dump_file, ";; giving up on update of paths from "
> -		       "exit condition to latch\n");
> -	    }
> -	}
> +      fprintf (dump_file, ";; Scaling loop %i with scale ",
> +	       loop->num);
> +      p.dump (dump_file);
> +      fprintf (dump_file, " to reach upper bound %i\n",
> +	       (int)iteration_bound);
> +    }
> +  /* Finally attempt to fix exit edge probability.  */
> +  auto_vec<edge> exits = get_loop_exit_edges  (loop);
> +  edge exit_edge = single_likely_exit (loop, exits);
> +
> +  /* In a consistent profile unadjusted_exit_count should be same as count_in,
> +     however to preserve as much of the original info, avoid recomputing
> +     it.  */
> +  profile_count unadjusted_exit_count;
> +  if (exit_edge)
> +    unadjusted_exit_count = exit_edge->count ();
> +  scale_loop_frequencies (loop, scale_prob);
> +
> +  if (exit_edge)
> +    {
> +      profile_count old_exit_count = exit_edge->count ();
> +      profile_probability new_probability;
> +      if (iteration_bound > 0)
> +	new_probability
> +	  = unadjusted_exit_count.probability_in (exit_edge->src->count);
>        else
> -        if (dump_file && (dump_flags & TDF_DETAILS))
> -	  fprintf (dump_file, ";; Loop has multiple exit edges; "
> -	      		      "giving up on exit condition update\n");
> -
> -      /* Roughly speaking we want to reduce the loop body profile by the
> -	 difference of loop iterations.  We however can do better if
> -	 we look at the actual profile, if it is available.  */
> -      p = profile_probability::always ();
> -
> -      count_in *= iteration_bound;
> -      p = count_in.probability_in (loop->header->count);
> -      if (!(p > profile_probability::never ()))
> -	p = profile_probability::very_unlikely ();
> -
> -      if (p == profile_probability::always ()
> -	  || !p.initialized_p ())
> -	return;
> -
> -      /* If latch exists, change its count, since we changed
> -	 probability of exit.  Theoretically we should update everything from
> -	 source of exit edge to latch, but for vectorizer this is enough.  */
> -      if (loop->latch && loop->latch != e->src)
> -	loop->latch->count += count_delta;
> -
> -      /* Scale the probabilities.  */
> -      scale_loop_frequencies (loop, p);
> +	new_probability = profile_probability::always ();
> +      set_edge_probability_and_rescale_others (exit_edge, new_probability);
> +      profile_count new_exit_count = exit_edge->count ();
> +
> +      /* Rescale the remaining edge probabilities and see if there is only
> +	 one.  */
> +      edge other_edge = NULL;
> +      bool found = false;
> +      FOR_EACH_EDGE (e, ei, exit_edge->src->succs)
> +	if (!(e->flags & EDGE_FAKE)
> +	    && !(e->probability == profile_probability::never ())
> +	    && !loop_exit_edge_p (loop, e))
> +	  {
> +	    if (found)
> +	      {
> +		other_edge = NULL;
> +		break;
> +	      }
> +	    other_edge = e;
> +	    found = true;
> +	  }
> +      /* If there is only loop latch after other edge,
> +	 update its profile.  */
> +      if (other_edge && other_edge->dest == loop->latch)
> +	loop->latch->count -= new_exit_count - old_exit_count;
> +      else
> +	{
> +	  basic_block *body = get_loop_body (loop);
> +	  profile_count new_count = exit_edge->src->count - new_exit_count;
> +	  profile_count old_count = exit_edge->src->count - old_exit_count;
>  
> -      /* Change latch's count back.  */
> -      if (loop->latch && loop->latch != e->src)
> -	loop->latch->count -= count_delta;
> +	  for (unsigned int i = 0; i < loop->num_nodes; i++)
> +	    if (body[i] != exit_edge->src
> +		&& dominated_by_p (CDI_DOMINATORS, body[i], exit_edge->src))
> +	      body[i]->count.apply_scale (new_count, old_count);
>  
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -	fprintf (dump_file, ";; guessed iterations are now %i\n",
> -		 (int)expected_loop_iterations_unbounded (loop, NULL, true));
> +	  free (body);
> +	}
> +    }
> +  else if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file,
> +	       ";; Loop has mulitple exits;"
> +	       " will leave exit probabilities inconsistent\n");
>      }
>  }
>  
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index 491b57ec0f1..184c08eec75 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -1173,7 +1179,7 @@ try_peel_loop (class loop *loop,
>        }
>    profile_probability p;
>    p = entry_count.probability_in (loop->header->count);
> -  scale_loop_profile (loop, p, 0);
> +  scale_loop_profile (loop, p, -1);
>    bitmap_set_bit (peeled_loops, loop->num);
>    return true;
>  }
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index d66d4a6de69..2361cb328ab 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3191,7 +3191,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        if (prob_vector.initialized_p ())
>  	{
>  	  scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
> -	  scale_loop_profile (loop, prob_vector, 0);
> +	  scale_loop_profile (loop, prob_vector, -1);
>  	}
>      }
>  
> @@ -3236,7 +3236,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	  slpeel_update_phi_nodes_for_guard1 (prolog, loop, guard_e, e);
>  
>  	  scale_bbs_frequencies (&bb_after_prolog, 1, prob_prolog);
> -	  scale_loop_profile (prolog, prob_prolog, bound_prolog);
> +	  scale_loop_profile (prolog, prob_prolog, bound_prolog - 1);
>  	}
>  
>        /* Update init address of DRs.  */
> @@ -3378,7 +3378,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  
>  	      scale_bbs_frequencies (&bb_before_epilog, 1, prob_epilog);
>  	    }
> -	  scale_loop_profile (epilog, prob_epilog, 0);
> +	  scale_loop_profile (epilog, prob_epilog, -1);
>  	}
>        else
>  	slpeel_update_phi_nodes_for_lcssa (epilog);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-07  5:59         ` Richard Biener
@ 2023-07-07 12:20           ` Jan Hubicka
  2023-07-07 12:27             ` Tamar Christina
  2023-07-10  7:07             ` Richard Biener
  0 siblings, 2 replies; 200+ messages in thread
From: Jan Hubicka @ 2023-07-07 12:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

> 
> Looks good, but I wonder what we can do to at least make the
> multiple exit case behave reasonably?  The vectorizer keeps track
> of a "canonical" exit, would it be possible to pass in the main
> exit edge and use that instead of single_exit ()?  Would other
> exits then behave somewhat reasonably, or would we totally screw
> things up here?  That is, the "canonical" exit would be the
> counting exit while the other exits are on data driven conditions
> and thus wouldn't change probability when we reduce the number
> of iterations(?)

I can add a canonical_exit parameter and make the function direct flow
to it if possible.  However, overall I think the fixup depends on what
transformation led to the change.

Assuming that the vectorizer created no prologues and epilogues and we
vectorized with factor N, then I think the update could be done more
specifically as follows.

We know that the header block count dropped by the vectorization factor N.  So
we can start from that, and each time we reach a basic block with an exit edge,
we know the original count of that edge.  This count is unchanged, so one can
rescale the probabilities out of that BB accordingly.  If the loop has no inner
loops, we can just walk the body in RPO, propagate the scales downwards, and we
should arrive at the right result.

I originally added the bound parameter to handle prologues/epilogues,
which get a new artificial bound.  For the prologue I think you are right that
the flow will probably be directed to the condition counting the
iterations.

For the epilogue we add no artificial iteration cap, so maybe it is more
realistic to simply scale up the probability of all exits?

To see what is going on I tried following testcase:

int a[99];
void test ()
{
  for (int i = 0; i < 99; i++)
      a[i]++;
}

What surprises me is that the vectorizer at -O2 does nothing and we end up
unrolling the loop:

L2:
        addl    $1, (%rax)
        addl    $1, 4(%rax)
        addl    $1, 8(%rax)
        addq    $12, %rax
        cmpq    $a+396, %rax

Which seems a silly thing to do.  A vectorized loop with an epilogue doing 2 and
1 additions would be better.

With -O3 we vectorize it:


.L2:
        movdqa  (%rax), %xmm0
        addq    $16, %rax
        paddd   %xmm1, %xmm0
        movaps  %xmm0, -16(%rax)
        cmpq    %rax, %rdx
        jne     .L2
        movq    a+384(%rip), %xmm0
        addl    $1, a+392(%rip)
        movq    .LC1(%rip), %xmm1
        paddd   %xmm1, %xmm0
        movq    %xmm0, a+384(%rip)


and correctly drops the vectorized loop body to 24 iterations.  However, the
epilogue has a loop for vector size 2 predicted to iterate once (it won't):

;;   basic block 7, loop depth 0, count 10737416 (estimated locally), maybe hot 
;;    prev block 5, next block 8, flags: (NEW, VISITED)                         
;;    pred:       3 [4.0% (adjusted)]  count:10737416 (estimated locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:       8 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
                                                                                
;;   basic block 8, loop depth 1, count 21474835 (estimated locally), maybe hot 
;;    prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)              
;;    pred:       9 [always]  count:10737417 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                7 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
  # i_9 = PHI <i_17(9), 96(7)>                                                  
  # ivtmp_13 = PHI <ivtmp_18(9), 3(7)>                                          
  # vectp_a.14_40 = PHI <vectp_a.14_41(9), &MEM <int[99]> [(void *)&a + 384B](7)>
  # vectp_a.18_46 = PHI <vectp_a.18_47(9), &MEM <int[99]> [(void *)&a + 384B](7)>
  # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>                                          
  vect__14.16_42 = MEM <vector(2) int> [(int *)vectp_a.14_40];                  
  _14 = a[i_9];                                                                 
  vect__15.17_44 = vect__14.16_42 + { 1, 1 };                                   
  _15 = _14 + 1;                                                                
  MEM <vector(2) int> [(int *)vectp_a.18_46] = vect__15.17_44;                  
  i_17 = i_9 + 1;                                                               
  ivtmp_18 = ivtmp_13 - 1;                                                      
  vectp_a.14_41 = vectp_a.14_40 + 8;                                            
  vectp_a.18_47 = vectp_a.18_46 + 8;                                            
  ivtmp_50 = ivtmp_49 + 1;                                                      
  if (ivtmp_50 < 1)                                                             
    goto <bb 9>; [50.00%]                                                       
  else                                                                          
    goto <bb 12>; [50.00%]                                                      

and finally the scalar copy

;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe hot
;;    prev block 9, next block 13, flags: (NEW, VISITED)                        
;;    pred:       8 [50.0% (adjusted)]  count:10737418 (estimated locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:       13 [always]  count:10737416 (estimated locally) (FALLTHRU)    
                                                                                
;;   basic block 13, loop depth 1, count 1063004409 (estimated locally), maybe hot
;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)            
;;    pred:       14 [always]  count:1052266996 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                12 [always]  count:10737416 (estimated locally) (FALLTHRU)    
  # i_30 = PHI <i_36(14), 98(12)>                                               
  # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>                                        
  _33 = a[i_30];                                                                
  _34 = _33 + 1;                                                                
  a[i_30] = _34;                                                                
  i_36 = i_30 + 1;                                                              
  ivtmp_37 = ivtmp_32 - 1;                                                      
  if (ivtmp_37 != 0)                                                            
    goto <bb 14>; [98.99%]                                                      
  else                                                                          
    goto <bb 4>; [1.01%]                                                        

The exit also has a small but non-zero iteration probability.  This is
papered over by my patch from yesterday.  But it seems to me that it
would be a lot better if the vectorizer understood that the epilogue
will be loopless and accounted for that in the cost model; that would
probably make it easy to enable it at cheap cost levels too.

Clang 16 at -O2 is much more aggressive by both vectorizing and unrolling:

test:                                   # @test
        .cfi_startproc
# %bb.0:
        movdqa  a(%rip), %xmm1
        pcmpeqd %xmm0, %xmm0
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a(%rip)
        movdqa  a+16(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+16(%rip)
        movdqa  a+32(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+32(%rip)
        movdqa  a+48(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+48(%rip)
        movdqa  a+64(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+64(%rip)
        movdqa  a+80(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+80(%rip)
        movdqa  a+96(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+96(%rip)
        movdqa  a+112(%rip), %xmm1
        psubd   %xmm0, %xmm1
....
        movdqa  %xmm1, a+240(%rip)
        movdqa  a+256(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+256(%rip)
        movdqa  a+272(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+272(%rip)
        movdqa  a+288(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+288(%rip)
        movdqa  a+304(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+304(%rip)
        movdqa  a+320(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+320(%rip)
        movdqa  a+336(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+336(%rip)
        movdqa  a+352(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+352(%rip)
        movdqa  a+368(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+368(%rip)
        addl    $1, a+384(%rip)
        addl    $1, a+388(%rip)
        addl    $1, a+392(%rip)
        retq

Honza

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-07 12:20           ` Jan Hubicka
@ 2023-07-07 12:27             ` Tamar Christina
  2023-07-07 14:10               ` Jan Hubicka
  2023-07-10  7:07             ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-07 12:27 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener; +Cc: gcc-patches, nd, jlaw

Hi Both,

Thanks for all the reviews/patches so far 😊

> >
> > Looks good, but I wonder what we can do to at least make the multiple
> > exit case behave reasonably?  The vectorizer keeps track
> 
> > of a "canonical" exit, would it be possible to pass in the main exit
> > edge and use that instead of single_exit (), would other exits then
> > behave somewhat reasonable or would we totally screw things up here?
> > That is, the "canonical" exit would be the counting exit while the
> > other exits are on data driven conditions and thus wouldn't change
> > probability when we reduce the number of iterations(?)
> 
> I can add canonical_exit parameter and make the function to direct flow to it if
> possible.  However overall I think fixup depends on what transformation led to
> the change.
> 
> > Assuming that the vectorizer did no prologues and epilogues and we vectorized
> with factor N, then I think the update could be done more specifically as
> follows.
> 

If it helps: this patch series addresses multiple exits by forcing a scalar
epilogue, so all non-canonical exits are redirected to that scalar
epilogue and the remaining scalar iteration count will be at most VF.

Regards,
Tamar

> We know that header block count dropped by 4. So we can start from that
> and each time we reach basic block with exit edge, we know the original count
> of the edge.  This count is unchanged, so one can rescale probabilities out of
> that BB accordingly.  If loop has no inner loops, we can just walk the body in
> RPO and propagate scales downwards and we should arrive at the right result
> 
> I originally added the bound parameter to handle prologues/epilogues which
> gets new artificial bound.  In prologue I think you are right that the flow will be
> probably directed to the conditional counting iterations.
> 
> In epilogue we add no artificial iteration cap, so maybe it is more realistic to
> simply scale up probability of all exits?
> 
> To see what is going on I tried following testcase:
> 
> int a[99];
> test()
> {
>   for (int i = 0; i < 99; i++)
>       a[i]++;
> }
> 
> What surprises me is that vectorizer at -O2 does nothing and we end up
> unrolling the loop:
> 
> L2:
>         addl    $1, (%rax)
>         addl    $1, 4(%rax)
>         addl    $1, 8(%rax)
>         addq    $12, %rax
>         cmpq    $a+396, %rax
> 
> Which seems a silly thing to do. A vectorized loop with an epilogue doing 2 and
> 1 addition would be better.
> 
> With -O3 we vectorize it:
> 
> 
> .L2:
>         movdqa  (%rax), %xmm0
>         addq    $16, %rax
>         paddd   %xmm1, %xmm0
>         movaps  %xmm0, -16(%rax)
>         cmpq    %rax, %rdx
>         jne     .L2
>         movq    a+384(%rip), %xmm0
>         addl    $1, a+392(%rip)
>         movq    .LC1(%rip), %xmm1
>         paddd   %xmm1, %xmm0
>         movq    %xmm0, a+384(%rip)
> 
> 
> and correctly drop vectorized loop body to 24 iterations. However the
> epilogue has loop for vector size 2 predicted to iterate once (it won't)
> 
> ;;   basic block 7, loop depth 0, count 10737416 (estimated locally), maybe
> hot
> ;;    prev block 5, next block 8, flags: (NEW, VISITED)
> ;;    pred:       3 [4.0% (adjusted)]  count:10737416 (estimated locally)
> (FALSE_VALUE,EXECUTABLE)
> ;;    succ:       8 [always]  count:10737416 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> 
> ;;   basic block 8, loop depth 1, count 21474835 (estimated locally), maybe
> hot
> ;;    prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       9 [always]  count:10737417 (estimated locally)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;                7 [always]  count:10737416 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>   # i_9 = PHI <i_17(9), 96(7)>
>   # ivtmp_13 = PHI <ivtmp_18(9), 3(7)>
>   # vectp_a.14_40 = PHI <vectp_a.14_41(9), &MEM <int[99]> [(void *)&a +
> 384B](7)>
>   # vectp_a.18_46 = PHI <vectp_a.18_47(9), &MEM <int[99]> [(void *)&a +
> 384B](7)>
>   # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>
>   vect__14.16_42 = MEM <vector(2) int> [(int *)vectp_a.14_40];
>   _14 = a[i_9];
>   vect__15.17_44 = vect__14.16_42 + { 1, 1 };
>   _15 = _14 + 1;
>   MEM <vector(2) int> [(int *)vectp_a.18_46] = vect__15.17_44;
>   i_17 = i_9 + 1;
>   ivtmp_18 = ivtmp_13 - 1;
>   vectp_a.14_41 = vectp_a.14_40 + 8;
>   vectp_a.18_47 = vectp_a.18_46 + 8;
>   ivtmp_50 = ivtmp_49 + 1;
>   if (ivtmp_50 < 1)
>     goto <bb 9>; [50.00%]
>   else
>     goto <bb 12>; [50.00%]
> 
> and finally the scalar copy
> 
> ;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe
> hot
> ;;    prev block 9, next block 13, flags: (NEW, VISITED)
> ;;    pred:       8 [50.0% (adjusted)]  count:10737418 (estimated locally)
> (FALSE_VALUE,EXECUTABLE)
> ;;    succ:       13 [always]  count:10737416 (estimated locally) (FALLTHRU)
> 
> ;;   basic block 13, loop depth 1, count 1063004409 (estimated locally),
> maybe hot
> ;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       14 [always]  count:1052266996 (estimated locally)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;                12 [always]  count:10737416 (estimated locally) (FALLTHRU)
>   # i_30 = PHI <i_36(14), 98(12)>
>   # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>
>   _33 = a[i_30];
>   _34 = _33 + 1;
>   a[i_30] = _34;
>   i_36 = i_30 + 1;
>   ivtmp_37 = ivtmp_32 - 1;
>   if (ivtmp_37 != 0)
>     goto <bb 14>; [98.99%]
>   else
>     goto <bb 4>; [1.01%]
> 
> With also small but non-zero iteration probability.   This is papered
> over by my yesterday patch. But it seems to me that it would be a lot better if
> vectorizer understood that the epilogue will be loopless and accounted it to
> the cost model that would probably make it easy to enable it at cheap costs
> too.
> 
> Clang 16 at -O2 is much more aggressive by both vectorizing and unrolling:
> 
> test:                                   # @test
>         .cfi_startproc
> # %bb.0:
>         movdqa  a(%rip), %xmm1
>         pcmpeqd %xmm0, %xmm0
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a(%rip)
>         movdqa  a+16(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+16(%rip)
>         movdqa  a+32(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+32(%rip)
>         movdqa  a+48(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+48(%rip)
>         movdqa  a+64(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+64(%rip)
>         movdqa  a+80(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+80(%rip)
>         movdqa  a+96(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+96(%rip)
>         movdqa  a+112(%rip), %xmm1
>         psubd   %xmm0, %xmm1
> ....
>         movdqa  %xmm1, a+240(%rip)
>         movdqa  a+256(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+256(%rip)
>         movdqa  a+272(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+272(%rip)
>         movdqa  a+288(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+288(%rip)
>         movdqa  a+304(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+304(%rip)
>         movdqa  a+320(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+320(%rip)
>         movdqa  a+336(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+336(%rip)
>         movdqa  a+352(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+352(%rip)
>         movdqa  a+368(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+368(%rip)
>         addl    $1, a+384(%rip)
>         addl    $1, a+388(%rip)
>         addl    $1, a+392(%rip)
>         retq
> 
> Honza


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-07 12:27             ` Tamar Christina
@ 2023-07-07 14:10               ` Jan Hubicka
  0 siblings, 0 replies; 200+ messages in thread
From: Jan Hubicka @ 2023-07-07 14:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, gcc-patches, nd, jlaw

> Hi Both,
> 
> Thanks for all the reviews/patches so far 😊
> 
> > >
> > > Looks good, but I wonder what we can do to at least make the multiple
> > > exit case behave reasonably?  The vectorizer keeps track
> > 
> > > of a "canonical" exit, would it be possible to pass in the main exit
> > > edge and use that instead of single_exit (), would other exits then
> > > behave somewhat reasonable or would we totally screw things up here?
> > > That is, the "canonical" exit would be the counting exit while the
> > > other exits are on data driven conditions and thus wouldn't change
> > > probability when we reduce the number of iterations(?)
> > 
> > I can add canonical_exit parameter and make the function to direct flow to it if
> > possible.  However overall I think fixup depends on what transformation led to
> > the change.
> > 
> > Assuming that the vectorizer did no prologues and epilogues and we vectorized
> > with factor N, then I think the update could be done more specifically as
> > follows.
> > 
> 
> If it helps, how this patch series addresses multiple exits by forcing a scalar
> epilogue, all non canonical_exits would have been redirected to this scalar
> epilogue, so the remaining scalar iteration count will be at most VF.

It looks like the profile update after vectorization needs quite some TLC.
My student Ondrej Kubanek also implemented loop histogram profiling,
which gives a better idea of how commonly prologues/epilogues are needed,
and it would be nice to handle that as well.
> > ;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe
> > hot
> > ;;    prev block 9, next block 13, flags: (NEW, VISITED)
> > ;;    pred:       8 [50.0% (adjusted)]  count:10737418 (estimated locally)
> > (FALSE_VALUE,EXECUTABLE)
> > ;;    succ:       13 [always]  count:10737416 (estimated locally) (FALLTHRU)
> > 
> > ;;   basic block 13, loop depth 1, count 1063004409 (estimated locally),
> > maybe hot
> > ;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       14 [always]  count:1052266996 (estimated locally)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > ;;                12 [always]  count:10737416 (estimated locally) (FALLTHRU)
> >   # i_30 = PHI <i_36(14), 98(12)>
> >   # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>
> >   _33 = a[i_30];
> >   _34 = _33 + 1;
> >   a[i_30] = _34;
> >   i_36 = i_30 + 1;
> >   ivtmp_37 = ivtmp_32 - 1;
> >   if (ivtmp_37 != 0)
> >     goto <bb 14>; [98.99%]
> >   else
> >     goto <bb 4>; [1.01%]

Actually it seems that the scalar epilogue loop keeps the original
profile (predicted to iterate 99 times), which is quite wrong.
Looking at the statistics for yesterday's patch, on tramp3d we got an 86%
reduction in cumulative profile mismatches after the whole optimization
pipeline.  More interestingly, however, the overall time estimate
dropped by 18%, so it seems that the profile adjustments done by cunroll
are affecting the profile a lot.

I think the fact that the iteration counts of epilogues are not capped
is one of the main problems.

We seem to call scale_loop_profile 3 times:

       scale_loop_profile (loop, prob_vector, -1);

This seems to account for the probability that control flow is
redirected to the prolog/epilog later.  So it only scales the profile
down but is not responsible for setting any iteration bound.

       scale_loop_profile (prolog, prob_prolog, bound_prolog - 1);

This handles the prolog and sets its bound.

       scale_loop_profile (epilog, prob_epilog, -1);

This scales the epilog but does not set a bound at all.
I think the information is available since we update the loop_info
data structures.

Honza


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-07 12:20           ` Jan Hubicka
  2023-07-07 12:27             ` Tamar Christina
@ 2023-07-10  7:07             ` Richard Biener
  2023-07-10  8:33               ` Jan Hubicka
  2023-07-10  9:23               ` Jan Hubicka
  1 sibling, 2 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-10  7:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Fri, 7 Jul 2023, Jan Hubicka wrote:

> > 
> > Looks good, but I wonder what we can do to at least make the
> > multiple exit case behave reasonably?  The vectorizer keeps track
> 
> > of a "canonical" exit, would it be possible to pass in the main
> > exit edge and use that instead of single_exit (), would other
> > exits then behave somewhat reasonable or would we totally screw
> > things up here?  That is, the "canonical" exit would be the
> > counting exit while the other exits are on data driven conditions
> > and thus wouldn't change probability when we reduce the number
> > of iterations(?)
> 
> I can add canonical_exit parameter and make the function to direct flow
> to it if possible.  However overall I think fixup depends on what
> transformation led to the change.

I think the vectorizer knows there's a single counting IV and all
other exits are dependent on data processed, so when scaling, the
vectorizer just changes the counting IV.  So I think it makes
sense to pass that exit to the function in all cases.

> > Assuming that the vectorizer did no prologues and epilogues and we
> vectorized with factor N, then I think the update could be done more
> specifically as follows.
> 
> We know that header block count dropped by 4. So we can start from that
> and each time we reach basic block with exit edge, we know the original
> count of the edge.  This count is unchanged, so one can rescale
> probabilities out of that BB accordingly.  If loop has no inner loops,
> we can just walk the body in RPO and propagate scales downwards and we
> should arrive at the right result

That should work for alternate exits as well, no?

> I originally added the bound parameter to handle prologues/epilogues
> which gets new artificial bound.  In prologue I think you are right that
> the flow will be probably directed to the conditional counting
> iterations.

I suppose we'd need to scale both main and epilogue together since
the epilogue "steals" from the main loop counts.  Likewise if there's
a skip edge around the vector loop.  I think currently we simply
set the edge probability of those skip conds rather than basing
this off the niter values they work on.  Aka if (niter < VF) goto
epilogue; do {} while (niter / VF); epilogue: do {} while (niter);

There's also the cost model which might require niter > VF to enter
the main loop body.

> In epilogue we add no artificial iteration cap, so maybe it is more
> realistic to simply scale up probability of all exits?

Probably.

> To see what is going on I tried following testcase:
> 
> int a[99];
> test()
> {
>   for (int i = 0; i < 99; i++)
>       a[i]++;
> }
> 
> What surprises me is that vectorizer at -O2 does nothing and we end up
> unrolling the loop:
> 
> L2:
>         addl    $1, (%rax)
>         addl    $1, 4(%rax)
>         addl    $1, 8(%rax)
>         addq    $12, %rax
>         cmpq    $a+396, %rax
> 
> Which seems a silly thing to do. A vectorized loop with an epilogue doing 2 and
> 1 addition would be better.
> 
> With -O3 we vectorize it:
> 
> 
> .L2:
>         movdqa  (%rax), %xmm0
>         addq    $16, %rax
>         paddd   %xmm1, %xmm0
>         movaps  %xmm0, -16(%rax)
>         cmpq    %rax, %rdx
>         jne     .L2
>         movq    a+384(%rip), %xmm0
>         addl    $1, a+392(%rip)
>         movq    .LC1(%rip), %xmm1
>         paddd   %xmm1, %xmm0
>         movq    %xmm0, a+384(%rip)

The -O2 cost model doesn't want to do epilogues:

  /* If using the "very cheap" model. reject cases in which we'd keep
     a copy of the scalar code (even if we might be able to vectorize it).  */
  if (loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP
      && (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
          || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
          || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "some scalar iterations would need to be peeled\n");
      return 0;
    }

it's because of the code size increase.

> and correctly drop vectorized loop body to 24 iterations. However the
> epilogue has loop for vector size 2 predicted to iterate once (it won't)
> 
> ;;   basic block 7, loop depth 0, count 10737416 (estimated locally), maybe hot 
> ;;    prev block 5, next block 8, flags: (NEW, VISITED)                         
> ;;    pred:       3 [4.0% (adjusted)]  count:10737416 (estimated locally) (FALSE_VALUE,EXECUTABLE)
> ;;    succ:       8 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
>                                                                                 
> ;;   basic block 8, loop depth 1, count 21474835 (estimated locally), maybe hot 
> ;;    prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)              
> ;;    pred:       9 [always]  count:10737417 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;                7 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
>   # i_9 = PHI <i_17(9), 96(7)>                                                  
>   # ivtmp_13 = PHI <ivtmp_18(9), 3(7)>                                          
>   # vectp_a.14_40 = PHI <vectp_a.14_41(9), &MEM <int[99]> [(void *)&a + 384B](7)>
>   # vectp_a.18_46 = PHI <vectp_a.18_47(9), &MEM <int[99]> [(void *)&a + 384B](7)>
>   # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>                                          
>   vect__14.16_42 = MEM <vector(2) int> [(int *)vectp_a.14_40];                  
>   _14 = a[i_9];                                                                 
>   vect__15.17_44 = vect__14.16_42 + { 1, 1 };                                   
>   _15 = _14 + 1;                                                                
>   MEM <vector(2) int> [(int *)vectp_a.18_46] = vect__15.17_44;                  
>   i_17 = i_9 + 1;                                                               
>   ivtmp_18 = ivtmp_13 - 1;                                                      
>   vectp_a.14_41 = vectp_a.14_40 + 8;                                            
>   vectp_a.18_47 = vectp_a.18_46 + 8;                                            
>   ivtmp_50 = ivtmp_49 + 1;                                                      
>   if (ivtmp_50 < 1)                                                             
>     goto <bb 9>; [50.00%]                                                       
>   else                                                                          
>     goto <bb 12>; [50.00%]                                                      
> 
> and finally the scalar copy
> 
> ;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe hot
> ;;    prev block 9, next block 13, flags: (NEW, VISITED)                        
> ;;    pred:       8 [50.0% (adjusted)]  count:10737418 (estimated locally) (FALSE_VALUE,EXECUTABLE)
> ;;    succ:       13 [always]  count:10737416 (estimated locally) (FALLTHRU)    
>                                                                                 
> ;;   basic block 13, loop depth 1, count 1063004409 (estimated locally), maybe hot
> ;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)            
> ;;    pred:       14 [always]  count:1052266996 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;                12 [always]  count:10737416 (estimated locally) (FALLTHRU)    
>   # i_30 = PHI <i_36(14), 98(12)>                                               
>   # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>                                        
>   _33 = a[i_30];                                                                
>   _34 = _33 + 1;                                                                
>   a[i_30] = _34;                                                                
>   i_36 = i_30 + 1;                                                              
>   ivtmp_37 = ivtmp_32 - 1;                                                      
>   if (ivtmp_37 != 0)                                                            
>     goto <bb 14>; [98.99%]                                                      
>   else                                                                          
>     goto <bb 4>; [1.01%]                                                        
> 
> With also small but non-zero iteration probability.   This is papered
> over by my yesterday patch. But it seems to me that it would be a lot
> better if vectorizer understood that the epilogue will be loopless and
> accounted it to the cost model that would probably make it easy to
> enable it at cheap costs too.

The epilogue will be "unrolled" later, I think, because we can correctly
compute that it won't iterate.

> Clang 16 at -O2 is much more aggressive by both vectorizing and unrolling:
> 
> test:                                   # @test
>         .cfi_startproc
> # %bb.0:
>         movdqa  a(%rip), %xmm1
>         pcmpeqd %xmm0, %xmm0
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a(%rip)
>         movdqa  a+16(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+16(%rip)
>         movdqa  a+32(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+32(%rip)
>         movdqa  a+48(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+48(%rip)
>         movdqa  a+64(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+64(%rip)
>         movdqa  a+80(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+80(%rip)
>         movdqa  a+96(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+96(%rip)
>         movdqa  a+112(%rip), %xmm1
>         psubd   %xmm0, %xmm1
> ....
>         movdqa  %xmm1, a+240(%rip)
>         movdqa  a+256(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+256(%rip)
>         movdqa  a+272(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+272(%rip)
>         movdqa  a+288(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+288(%rip)
>         movdqa  a+304(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+304(%rip)
>         movdqa  a+320(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+320(%rip)
>         movdqa  a+336(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+336(%rip)
>         movdqa  a+352(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+352(%rip)
>         movdqa  a+368(%rip), %xmm1
>         psubd   %xmm0, %xmm1
>         movdqa  %xmm1, a+368(%rip)
>         addl    $1, a+384(%rip)
>         addl    $1, a+388(%rip)
>         addl    $1, a+392(%rip)
>         retq

That's clearly much larger code.  On x86 we're also fighting with
large instruction encodings here, in particular EVEX for AVX512 is
"bad" here.  We hardly get more than two instructions decoded per
cycle due to their size.

Richard.


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-10  7:07             ` Richard Biener
@ 2023-07-10  8:33               ` Jan Hubicka
  2023-07-10  9:24                 ` Richard Biener
  2023-07-10  9:23               ` Jan Hubicka
  1 sibling, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-10  8:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

Hi,
over the weekend I found that the vectorizer is missing scale_loop_profile
for epilogues.  It already adjusts loop_info to set the max iterations, so
adding it was easy.  However, it now predicts the first loop to iterate at
most once (which is too much; I suppose it forgets to divide by the epilogue
unrolling factor) and the second to never iterate.
> 
> The -O2 cost model doesn't want to do epilogues:
> 
>   /* If using the "very cheap" model. reject cases in which we'd keep
>      a copy of the scalar code (even if we might be able to vectorize it).  */
>   if (loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP
>       && (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
>           || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>           || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "some scalar iterations would need to be peeled\n");
>       return 0;
>     }
> 
> it's because of the code size increase.

I know; however, -O2 is not -Os, and here the performance/code size
tradeoff seems a lot better than other code-expanding things we do at
-O2 (such as unrolling 3 times).
I think we set the very cheap cost model very conservatively in order to
get -ftree-vectorize enabled with -O2, and there is some room for finding
the right balance.

I get:

jan@localhost:~> cat t.c
int a[99];
__attribute((noipa, weak))
void
test()
{
        for (int i = 0 ; i < 99; i++)
                a[i]++;
}
void
main()
{
        for (int j = 0; j < 10000000; j++)
                test();
}
jan@localhost:~> gcc -O2 t.c -fno-unroll-loops ; time ./a.out

real    0m0.529s
user    0m0.528s
sys     0m0.000s

jan@localhost:~> gcc -O2 t.c ; time ./a.out

real    0m0.427s
user    0m0.426s
sys     0m0.000s
jan@localhost:~> gcc -O3 t.c ; time ./a.out

real    0m0.136s
user    0m0.135s
sys     0m0.000s
jan@localhost:~> clang -O2 t.c ; time ./a.out
<warnings>

real    0m0.116s
user    0m0.116s
sys     0m0.000s

Code size (of function test):
 gcc -O2 -fno-unroll-loops 17  bytes
 gcc -O2                   29  bytes
 gcc -O3                   50  bytes
 clang -O2                 510 bytes

So unrolling is 70% code size growth for a 23% speedup.
Vectorizing is 294% code size growth for a 388% speedup.
Clang does 3000% code size growth for a 456% speedup.
> 
> That's clearly much larger code.  On x86 we're also fighting with
> large instruction encodings here, in particular EVEX for AVX512 is
> "bad" here.  We hardly get more than two instructions decoded per
> cycle due to their size.

Agreed, I found it surprising that clang does that much complete unrolling
at -O2.  However, vectorizing and not unrolling here seems like it may be
a better default for -O2 than what we do currently...

Honza
> 
> Richard.


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-10  7:07             ` Richard Biener
  2023-07-10  8:33               ` Jan Hubicka
@ 2023-07-10  9:23               ` Jan Hubicka
  2023-07-10  9:29                 ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-10  9:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

> On Fri, 7 Jul 2023, Jan Hubicka wrote:
> 
> > > 
> > > Looks good, but I wonder what we can do to at least make the
> > > multiple exit case behave reasonably?  The vectorizer keeps track
> > 
> > > of a "canonical" exit, would it be possible to pass in the main
> > > exit edge and use that instead of single_exit (), would other
> > > exits then behave somewhat reasonable or would we totally screw
> > > things up here?  That is, the "canonical" exit would be the
> > > counting exit while the other exits are on data driven conditions
> > > and thus wouldn't change probability when we reduce the number
> > > of iterations(?)
> > 
> > I can add canonical_exit parameter and make the function to direct flow
> > to it if possible.  However overall I think fixup depends on what
> > transformation led to the change.
> 
> I think the vectorizer knows there's a single counting IV and all
> other exits are dependent on data processed, so the scaling the
> vectorizer just changes the counting IV.  So I think it makes
> sense to pass that exit to the function in all cases.

It really seems to me that a vectorized loop is like N loops happening
in parallel, so the probabilities of the alternative exits grow as well.
But the canonical exit is the right thing for prologues - here we really
add extra conditions to the iteration-counting exit.
> 
> > Assuming that the vectorizer did no prologues and epilogues and we
> > vectorized with factor N, then I think the update could be done more
> > specifically as follows.
> > 
> > We know that header block count dropped by 4. So we can start from that
> > and each time we reach basic block with exit edge, we know the original
> > count of the edge.  This count is unchanged, so one can rescale
> > probabilities out of that BB accordingly.  If loop has no inner loops,
> > we can just walk the body in RPO and propagate scales downwards and we
> > should arrive at the right result
> 
> That should work for alternate exits as well, no?
Yes, I think it could mostly work for acyclic bodies.  I ended up
implementing a special case of this for loop-ch in order to handle
loop-invariant conditionals correctly.  Will send the patch after some
cleanups.  (There seem to be more loop-invariant conditionals in real
code than I would have thought.)

Tampering only with loop exit probabilities is not always enough.
If you have:
  while (1)
    if (test1)
      {
        if (test2)
	  break;
      }
increasing the count of the exit may require increasing the probability
of the outer conditional.  Do we support this in vectorization at all,
and if so, do we know something here?
For example, since test1 triggers if it is true in any one of the
iterations packed together, its probability also increases with the
vectorization factor.
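
That growth can be made concrete: assuming the per-iteration outcomes
are independent (a simplification), the probability that the condition
fires in a vector iteration of VF packed scalar iterations is
1 - (1 - p)^VF, which for small p is roughly p * VF:

```c
/* Illustrative sketch, assuming independent iterations: probability
   that a condition with per-scalar-iteration probability P is true in
   at least one of VF iterations executed as a single vector iteration.  */
double
vectorized_break_prob (double p, int vf)
{
  double none_taken = 1.0;
  for (int i = 0; i < vf; i++)
    none_taken *= 1.0 - p;      /* probability that no lane fires */
  return 1.0 - none_taken;
}
```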

We run into this in peeling, i.e. when we prove that test1 will trigger
undefined behaviour after one or two iterations but the original
estimated profile believes in a higher iteration count.  I added a
special case for this yesterday to avoid turning if (test2) into 100% in
this case, as that triggers strange codegen in some of the Fortran
testcases.

We also can have
  while (1)
    while (test1)
      {
        if (test2)
	  break;
      }
which is harder because changing the probability of test2 affects the
number of iterations of the inner loop.  So I am giving up on this.
I think currently it happens mostly with unlooping.
> 
> > I originally added the bound parameter to handle prologues/epilogues
> > which gets new artificial bound.  In prologue I think you are right that
> > the flow will be probably directed to the conditional counting
> > iterations.
> 
> I suppose we'd need to scale both main and epilogue together since
> the epilogue "steals" from the main loop counts.  Likewise if there's
> a skip edge around the vector loop.  I think currently we simply
> set the edge probability of those skip conds rather than basing
> this off the niter values they work on.  Aka if (niter < VF) goto
> epilogue; do {} while (niter / VF); epilogue: do {} while (niter);
> 
> There's also the cost model which might require niter > VF to enter
> the main loop body.
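
The structure described in the quote above (skip edge, vector main loop,
scalar epilogue) corresponds to something like the following C skeleton.
This is purely illustrative: the function and the inner lane loop (which
stands in for one vector iteration) are invented for exposition, and the
cost-model threshold is not modelled:

```c
/* Illustrative skeleton of a vectorized loop with an epilogue skip:
   if (niter < VF) goto epilogue; do {} while (niter / VF);
   epilogue: do {} while (niter);  */
void
vect_skeleton (int *a, int niter, int vf)
{
  int i = 0;
  if (niter >= vf)                  /* skip edge around the vector loop */
    for (; i + vf <= niter; i += vf)
      for (int l = 0; l < vf; l++)  /* one "vector" iteration of VF lanes */
        a[i + l]++;
  for (; i < niter; i++)            /* scalar epilogue */
    a[i]++;
}
```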

I think I mostly understand this, since we were playing with it with
Ondra's histograms (which can be used to get some of the unknowns in the
transformation right).  The unknowns (how many times we end up jumping
to the epilogue, for instance) probably can't be reasonably well guessed
if we do not know the loop histogram, which currently we know only if we
prove that the loop has a constant number of iterations.  So I am trying
to get at least this case right first.

The theoretically correct approach would be to first determine the entry
counts of the prologue and epilogue, then produce what we believe to be
the correct profile of those and subtract it from the main loop profile,
also updating probabilities in basic blocks where we made nontrivial
changes while updating prologs/epilogs.  Finally, scale down the main
loop profile and increase the exit probabilities.
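
A toy version of that ordering - derive the epilogue's expected profile
first, subtract it, then scale the main loop - might look like this.
The data structure and the uniform-remainder assumption are invented for
illustration; as noted above, without a histogram the real remainder
distribution is unknown:

```c
struct loop_profile
{
  double entry_count;   /* times the loop is entered */
  double header_count;  /* times the loop header executes */
};

/* Illustrative only.  Split the scalar loop's profile between the
   vector main loop and the scalar epilogue for vectorization factor VF,
   assuming every entry reaches the epilogue and the remainder is
   uniformly distributed, i.e. the epilogue runs (VF - 1) / 2
   iterations per entry on average.  */
void
split_profile (struct loop_profile *main_loop,
               struct loop_profile *epilogue, int vf)
{
  double avg_remainder = (vf - 1) / 2.0;
  /* 1) Entry counts first.  */
  epilogue->entry_count = main_loop->entry_count;
  /* 2) Expected epilogue profile, subtracted from the main loop,
     which is then scaled down by VF.  */
  epilogue->header_count = epilogue->entry_count * avg_remainder;
  main_loop->header_count
    = (main_loop->header_count - epilogue->header_count) / vf;
}
```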

Honza

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-10  8:33               ` Jan Hubicka
@ 2023-07-10  9:24                 ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-10  9:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Mon, 10 Jul 2023, Jan Hubicka wrote:

> Hi,
> over the weekend I found that the vectorizer is missing
> scale_loop_profile for epilogues.  It already adjusts loop_info to set
> max iterations, so adding it was easy.  However, it now predicts the
> first loop to iterate at most once (which is too much; I suppose it
> forgets to divide by the epilogue unrolling factor) and the second never.
> > 
> > The -O2 cost model doesn't want to do epilogues:
> > 
> >   /* If using the "very cheap" model. reject cases in which we'd keep
> >      a copy of the scalar code (even if we might be able to vectorize it).  
> > */
> >   if (loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP
> >       && (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> >           || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> >           || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
> >     {
> >       if (dump_enabled_p ())
> >         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                          "some scalar iterations would need to be 
> > peeled\n");
> >       return 0;
> >     }
> > 
> > it's because of the code size increase.
> 
> I know, however -O2 is not -Os and here the tradeoffs of
> performance/code size seems a lot better than other code expanding
> things we do at -O2 (such as the unrolling 3 times).
> I think we set the very cheap cost model very conservatively in order to
> get -ftree-vectorize enabled with -O2 and there is some room for finding
> right balance.
> 
> I get:
> 
> jan@localhost:~> cat t.c
> int a[99];
> __attribute((noipa, weak))
> void
> test()
> {
>         for (int i = 0 ; i < 99; i++)
>                 a[i]++;
> }
> void
> main()
> {
>         for (int j = 0; j < 10000000; j++)
>                 test();
> }
> jan@localhost:~> gcc -O2 t.c -fno-unroll-loops ; time ./a.out
> 
> real    0m0.529s
> user    0m0.528s
> sys     0m0.000s
> 
> jan@localhost:~> gcc -O2 t.c ; time ./a.out
> 
> real    0m0.427s
> user    0m0.426s
> sys     0m0.000s
> jan@localhost:~> gcc -O3 t.c ; time ./a.out
> 
> real    0m0.136s
> user    0m0.135s
> sys     0m0.000s
> jan@localhost:~> clang -O2 t.c ; time ./a.out
> <warnings>
> 
> real    0m0.116s
> user    0m0.116s
> sys     0m0.000s
> 
> Code size (of function test):
>  gcc -O2 -fno-unroll-loops 17  bytes
>  gcc -O2                   29  bytes
>  gcc -O3                   50  bytes
>  clang -O2                 510 bytes
> 
> So unrolling is 70% code size growth for a 23% speedup.
> Vectorizing is 294% code size growth for a 388% speedup.
> Clang does 3000% code size growth for a 456% speedup.
> > 
> > That's clearly much larger code.  On x86 we're also fighting with
> > large instruction encodings here, in particular EVEX for AVX512 is
> > "bad" here.  We hardly get more than two instructions decoded per
> > cycle due to their size.
> 
> Agreed, I found it surprising that clang does that much complete
> unrolling at -O2.  However, vectorizing and not unrolling here seems
> like it may be a better default for -O2 than what we do currently...

I was also playing with AVX512 fully masked loops here which avoids
the epilogue but due to the instruction encoding size that doesn't
usually win.  I agree that size isn't everything at least for -O2.

Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-10  9:23               ` Jan Hubicka
@ 2023-07-10  9:29                 ` Richard Biener
  2023-07-11  9:28                   ` Jan Hubicka
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-10  9:29 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Mon, 10 Jul 2023, Jan Hubicka wrote:

> > On Fri, 7 Jul 2023, Jan Hubicka wrote:
> > 
> > > > 
> > > > Looks good, but I wonder what we can do to at least make the
> > > > multiple exit case behave reasonably?  The vectorizer keeps track
> > > 
> > > > of a "canonical" exit, would it be possible to pass in the main
> > > > exit edge and use that instead of single_exit (), would other
> > > > exits then behave somewhat reasonable or would we totally screw
> > > > things up here?  That is, the "canonical" exit would be the
> > > > counting exit while the other exits are on data driven conditions
> > > > and thus wouldn't change probability when we reduce the number
> > > > of iterations(?)
> > > 
> > > I can add a canonical_exit parameter and make the function direct
> > > the flow to it if possible.  However, overall I think the fixup
> > > depends on what transformation led to the change.
> > 
> > I think the vectorizer knows there's a single counting IV and all
> > other exits are dependent on the data processed, so when scaling, the
> > vectorizer just changes the counting IV.  So I think it makes
> > sense to pass that exit to the function in all cases.
> 
> It really seems to me that a vectorized loop is like N loops happening
> in parallel, so the probabilities of the alternative exits grow as well.
> But the canonical exit is the right thing to use for prologues - here we
> really add extra conditions to the iteration-counting exit.
> > 
> > > Assuming that the vectorizer did no prologues and epilogues and we
> > > vectorized with factor N, then I think the update could be done more
> > > specifically as follows.
> > > 
> > > We know that the header block count dropped by 4.  So we can start
> > > from that, and each time we reach a basic block with an exit edge,
> > > we know the original count of the edge.  This count is unchanged,
> > > so one can rescale the probabilities out of that BB accordingly.
> > > If the loop has no inner loops, we can just walk the body in RPO and
> > > propagate the scales downwards and we should arrive at the right result
> > 
> > That should work for alternate exits as well, no?
> Yes, I think it could mostly work for acyclic bodies.  I ended up
> implementing a special case of this for loop-ch in order to handle
> loop-invariant conditionals correctly.  Will send the patch after some
> cleanups.  (There seem to be more loop-invariant conditionals in real
> code than I would have thought.)
> 
> Tampering only with loop exit probabilities is not always enough.
> If you have:
>   while (1)
>     if (test1)
>       {
>         if (test2)
> 	  break;
>       }
> increasing the count of the exit may require increasing the probability
> of the outer conditional.  Do we support this in vectorization at all,
> and if so, do we know something here?

Tamar would need to answer this, but without early break vectorization
the if-conversion pass will flatten everything, and I think even early
breaks will in the end be a non-nested sequence of BBs with
exit conds at the end (or a loopback branch).

Note the (scalar) epilogue is copied from the original scalar loop
body so it doesn't see any if-conversion.

> For example, since test1 triggers if it is true in any one of the
> iterations packed together, its probability also increases with the
> vectorization factor.
> 
> We run into this in peeling, i.e. when we prove that test1 will trigger
> undefined behaviour after one or two iterations but the original
> estimated profile believes in a higher iteration count.  I added a
> special case for this yesterday to avoid turning if (test2) into 100%
> in this case, as that triggers strange codegen in some of the Fortran
> testcases.
> 
> We also can have
>   while (1)
>     while (test1)
>       {
>         if (test2)
> 	  break;
>       }
> which is harder because changing the probability of test2 affects the
> number of iterations of the inner loop.  So I am giving up on this.
> I think currently it happens mostly with unlooping.

What I saw most wrecking the profile is when passes turn
if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
which then simply deletes one of the outgoing edges without doing
anything to the (guessed) profile.

> > > I originally added the bound parameter to handle prologues/epilogues
> > > which gets new artificial bound.  In prologue I think you are right that
> > > the flow will be probably directed to the conditional counting
> > > iterations.
> > 
> > I suppose we'd need to scale both main and epilogue together since
> > the epilogue "steals" from the main loop counts.  Likewise if there's
> > a skip edge around the vector loop.  I think currently we simply
> > set the edge probability of those skip conds rather than basing
> > this off the niter values they work on.  Aka if (niter < VF) goto
> > epilogue; do {} while (niter / VF); epilogue: do {} while (niter);
> > 
> > There's also the cost model which might require niter > VF to enter
> > the main loop body.
> 
> I think I mostly understand this, since we were playing with it with
> Ondra's histograms (which can be used to get some of the unknowns in
> the transformation right).  The unknowns (how many times we end up
> jumping to the epilogue, for instance) probably can't be reasonably
> well guessed if we do not know the loop histogram, which currently we
> know only if we prove that the loop has a constant number of
> iterations.  So I am trying to get at least this case right first.
> 
> The theoretically correct approach would be to first determine the
> entry counts of the prologue and epilogue, then produce what we believe
> to be the correct profile of those and subtract it from the main loop
> profile, also updating probabilities in basic blocks where we made
> nontrivial changes while updating prologs/epilogs.  Finally, scale down
> the main loop profile and increase the exit probabilities.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds
  2023-07-04 12:05   ` Richard Biener
@ 2023-07-10 15:32     ` Tamar Christina
  2023-07-11 11:03       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-10 15:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 7276 bytes --]

> > -  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > +  if (cond_cst)
> > +    {
> > +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> > +      pattern_stmt
> > +	= gimple_build_cond (gimple_cond_code (cond_stmt),
> > +			     gimple_get_lhs (pattern_stmt),
> > +			     fold_convert (ret_type, cond_cst),
> > +			     gimple_cond_true_label (cond_stmt),
> > +			     gimple_cond_false_label (cond_stmt));
> > +      *type_out = STMT_VINFO_VECTYPE (stmt_info);
> 
> is there any vectype set for a gcond?

No, because gconds can't be codegen'd yet, atm we must replace the original
gcond when generating code.

However, looking at the diff of this code, I don't think the else is needed here.
Testing an updated patch.

> 
> I must say the flow of the function is a bit convoluted now.  Is it possible to
> factor out a helper so we can fully separate the gassign vs. gcond handling in
> this function?

I am not sure; the only places that change are at the start (e.g. how we determine bf_stmt),
how we determine ret_type, and when determining shift_first for the single-use case.

Now I can't move the ret_type anywhere, as I need to decompose bf_stmt first.  And the shift_first
computation can be simplified by moving it up into the part that determines bf_stmt, but then we
would walk the immediate uses even in cases where we exit early, which seems inefficient.

Then there's the final clause, which just generates an additional gcond if the original statement
was a gcond.  But I'm not sure that would help, since it's just something done *in addition* to the
normal assign.

So there doesn't seem to be enough, or a big enough, divergence to justify a split.  I have however
made an attempt at cleaning it up a bit; is this one better?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
	from original statement.
	(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819af9bd28a81430869417965ba9d82d..b842f7d983405cd04f6760be7d91c1f55b30aac4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
     = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
@@ -2441,6 +2442,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
    bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
    result = (type_out) bf_value;
 
+   or
+
+   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
+
    where type_out is a non-bitfield type, that is to say, it's precision matches
    2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
 
@@ -2450,6 +2455,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
    here it starts with:
    result = (type_out) bf_value;
 
+   or
+
+   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
+
    Output:
 
    * TYPE_OUT: The vector type of the output of this pattern.
@@ -2482,33 +2491,45 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
 
    The shifting is always optional depending on whether bitpos != 0.
 
+   When the original bitfield was inside a gcond then an new gcond is also
+   generated with the newly `result` as the operand to the comparison.
+
 */
 
 static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 				 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
-
-  if (!first_stmt)
-    return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree lhs = NULL_TREE;
+  tree ret_type = NULL_TREE;
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
+    {
+      tree op = gimple_cond_lhs (cond_stmt);
+      if (TREE_CODE (op) != SSA_NAME)
+	return NULL;
+      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+      if (TREE_CODE (gimple_cond_rhs (cond_stmt)) != INTEGER_CST)
+	return NULL;
+    }
+  else if (is_gimple_assign (stmt)
+	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
+	   && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
     {
-      gimple *second_stmt
-	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
       bf_stmt = dyn_cast <gassign *> (second_stmt);
-      if (!bf_stmt
-	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
-	return NULL;
+      lhs = gimple_assign_lhs (stmt);
+      ret_type = TREE_TYPE (lhs);
     }
-  else
+
+  if (!bf_stmt
+      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
     return NULL;
 
   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  ret_type = ret_type ? ret_type : TREE_TYPE (container);
 
   if (!bit_field_offset (bf_ref).is_constant ()
       || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2543,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2579,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
 	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,6 +2639,19 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 			       NOP_EXPR, result);
     }
 
+  if (!lhs)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
+      tree cond_cst = gimple_cond_rhs (cond_stmt);
+      pattern_stmt
+	= gimple_build_cond (gimple_cond_code (cond_stmt),
+			     gimple_get_lhs (pattern_stmt),
+			     fold_convert (ret_type, cond_cst),
+			     gimple_cond_true_label (cond_stmt),
+			     gimple_cond_false_label (cond_stmt));
+    }
+
   *type_out = STMT_VINFO_VECTYPE (stmt_info);
   vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);


[-- Attachment #2: rb17499.patch --]
[-- Type: application/octet-stream, Size: 4978 bytes --]


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-10  9:29                 ` Richard Biener
@ 2023-07-11  9:28                   ` Jan Hubicka
  2023-07-11 10:31                     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-11  9:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

> 
> What I saw most wrecking the profile is when passes turn
> if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> which then simply deletes one of the outgoing edges without doing
> anything to the (guessed) profile.

Yep, I agree that this is disturbing.  At cfg cleanup time one can
hardly do anything useful, since the knowledge of the transform that
caused the profile inconsistency is forgotten.  However, I think it is
not a complete disaster.

With profile feedback the most common case of this happening is a
situation where we duplicated code (by inlining, unrolling etc.) into a
context where it behaves differently than the typical behaviour
represented by the profile.

So if one ends up zapping an edge with large probability, one also
knows that the code being optimized does not exhibit the typical
behaviour from the train run and thus is not very hot.  So the profile
inconsistency should not affect performance that much.

So doing nothing may IMO end up being safer than trying to get the
in/out counts right without really knowing what is going on.

This is mostly about the scenario "constant propagated this conditional
and profile disagrees with me".  There are other cases where the update
is IMO important, e.g. the vectorizer forgetting to cap the number of
iterations of the epilogue may cause issues, since the epilogue loop
looks more frequent than the main vectorized loop and that may cause
IRA to insert spilling into it or so.

When we duplicate we have a chance to figure out the profile updates.
Also we may try to get as much as possible done early.
I think we should again do loop header copying that does not expand code
at early opts.  I have some more plans on cleaning up loop-ch and
then we can give it a try.

With a guessed profile we always have the option to re-do the
propagation.  There is TODO_rebuild_frequencies for that, which we do
after inlining.  This is mostly to handle possible overflows on large
loop nests constructed by the inliner.

We can re-propagate once again after late cleanup passes. Looking at the
queue, we have:

      NEXT_PASS (pass_remove_cgraph_callee_edges);
      /* Initial scalar cleanups before alias computation.
         They ensure memory accesses are not indirect wherever possible.  */
      NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
      NEXT_PASS (pass_ccp, true /* nonzero_p */);
      /* After CCP we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_object_sizes);
      NEXT_PASS (pass_post_ipa_warn);
      /* Must run before loop unrolling.  */
      NEXT_PASS (pass_warn_access, /*early=*/true);
      NEXT_PASS (pass_complete_unrolli);
^^^^ here we care about profile
      NEXT_PASS (pass_backprop);
      NEXT_PASS (pass_phiprop);
      NEXT_PASS (pass_forwprop);
      /* pass_build_alias is a dummy pass that ensures that we
         execute TODO_rebuild_alias at this point.  */
      NEXT_PASS (pass_build_alias);
      NEXT_PASS (pass_return_slot);
      NEXT_PASS (pass_fre, true /* may_iterate */);
      NEXT_PASS (pass_merge_phi);
      NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
^^^^ here

By now we did CCP and FRE, so we have likely optimized out most of the
constant conditionals exposed by inlining.
Honza

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-11  9:28                   ` Jan Hubicka
@ 2023-07-11 10:31                     ` Richard Biener
  2023-07-11 12:40                       ` Jan Hubicka
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-11 10:31 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > 
> > What I saw most wrecking the profile is when passes turn
> > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> > which then simply deletes one of the outgoing edges without doing
> > anything to the (guessed) profile.
> 
> Yep, I agree that this is disturbing.  At cfg cleanup time one can
> hardly do anything useful, since the knowledge of the transform that
> caused the profile inconsistency is forgotten.  However, I think it is
> not a complete disaster.
> 
> With profile feedback the most common case of this happening is a
> situation where we duplicated code (by inlining, unrolling etc.) into a
> context where it behaves differently than the typical behaviour
> represented by the profile.
> 
> So if one ends up zapping an edge with large probability, one also
> knows that the code being optimized does not exhibit the typical
> behaviour from the train run and thus is not very hot.  So the profile
> inconsistency should not affect performance that much.
> 
> So doing nothing may IMO end up being safer than trying to get the
> in/out counts right without really knowing what is going on.
> 
> This is mostly about the scenario "constant propagated this conditional
> and profile disagrees with me".  There are other cases where the update
> is IMO important, e.g. the vectorizer forgetting to cap the number of
> iterations of the epilogue may cause issues, since the epilogue loop
> looks more frequent than the main vectorized loop and that may cause
> IRA to insert spilling into it or so.
> 
> When we duplicate we have a chance to figure out the profile updates.
> Also we may try to get as much as possible done early.
> I think we should again do loop header copying that does not expand code
> at early opts.  I have some more plans on cleaning up loop-ch and
> then we can give it a try.
> 
> With a guessed profile we always have the option to re-do the
> propagation.  There is TODO_rebuild_frequencies for that, which we do
> after inlining.  This is mostly to handle possible overflows on large
> loop nests constructed by the inliner.
> 
> We can re-propagate once again after late cleanup passes. Looking at the
> queue, we have:
> 
>       NEXT_PASS (pass_remove_cgraph_callee_edges);
>       /* Initial scalar cleanups before alias computation.
>          They ensure memory accesses are not indirect wherever possible.  */
>       NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>       NEXT_PASS (pass_ccp, true /* nonzero_p */);
>       /* After CCP we rewrite no longer addressed locals into SSA
>          form if possible.  */
>       NEXT_PASS (pass_object_sizes);
>       NEXT_PASS (pass_post_ipa_warn);
>       /* Must run before loop unrolling.  */
>       NEXT_PASS (pass_warn_access, /*early=*/true);
>       NEXT_PASS (pass_complete_unrolli);
> ^^^^ here we care about profile
>       NEXT_PASS (pass_backprop);
>       NEXT_PASS (pass_phiprop);
>       NEXT_PASS (pass_forwprop);
>       /* pass_build_alias is a dummy pass that ensures that we
>          execute TODO_rebuild_alias at this point.  */
>       NEXT_PASS (pass_build_alias);
>       NEXT_PASS (pass_return_slot);
>       NEXT_PASS (pass_fre, true /* may_iterate */);
>       NEXT_PASS (pass_merge_phi);
>       NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
> ^^^^ here
> 
> By now we did CCP and FRE so we likely optimized out most of the constant
> conditionals exposed by inlining.

So maybe we should simply delay re-propagation of the profile?  I
think cunrolli doesn't so much care about the profile - cunrolli
is (was) about abstraction removal.  Jump threading should be
the first pass to care.

Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds
  2023-07-10 15:32     ` Tamar Christina
@ 2023-07-11 11:03       ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-11 11:03 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 10 Jul 2023, Tamar Christina wrote:

> > > -  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > > +  if (cond_cst)
> > > +    {
> > > +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> > > +      pattern_stmt
> > > +	= gimple_build_cond (gimple_cond_code (cond_stmt),
> > > +			     gimple_get_lhs (pattern_stmt),
> > > +			     fold_convert (ret_type, cond_cst),
> > > +			     gimple_cond_true_label (cond_stmt),
> > > +			     gimple_cond_false_label (cond_stmt));
> > > +      *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > 
> > is there any vectype set for a gcond?
> 
> No, because gconds can't be codegen'd yet, atm we must replace the original
> gcond when generating code.
> 
> However looking at the diff of this code, I don't think the else is needed
> here.  Testing an updated patch.
> 
> > 
> > I must say the flow of the function is a bit convoluted now.  Is it possible to
> > factor out a helper so we can fully separate the gassign vs. gcond handling in
> > this function?
> 
> I am not sure; the only places that change are the start (e.g. how we determine bf_stmt),
> how we determine ret_type, and the determination of shift_first for the single-use case.
> 
> Now I can't move the ret_type anywhere as I need to decompose bf_stmt first.  And the shift_first
> handling can be simplified by moving it up into the part that determines bf_stmt, but then we walk
> the immediate uses even in cases where we early exit, which seems inefficient.
> 
> Then there's the final clause, which just generates an additional gcond if the original statement
> was a gcond.  But I'm not sure splitting that out would help, since it's done *in addition* to the
> normal assign handling.
> 
> So there doesn't seem to be enough divergence, or divergence big enough, to justify a split.  I have
> however made an attempt at cleaning it up a bit; is this one better?

Yeah, it is.
 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
> 	from original statement.
> 	(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
> 
> Co-Authored-By:  Andre Vieira <andre.simoesdiasvieira@arm.com>
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 60bc9be6819af9bd28a81430869417965ba9d82d..b842f7d983405cd04f6760be7d91c1f55b30aac4 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>    STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>      = STMT_VINFO_DEF_TYPE (orig_stmt_info);
> +  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> @@ -2441,6 +2442,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
>     bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
>     result = (type_out) bf_value;
>  
> +   or
> +
> +   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
> +
>     where type_out is a non-bitfield type, that is to say, it's precision matches
>     2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
>  
> @@ -2450,6 +2455,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
>     here it starts with:
>     result = (type_out) bf_value;
>  
> +   or
> +
> +   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
> +
>     Output:
>  
>     * TYPE_OUT: The vector type of the output of this pattern.
> @@ -2482,33 +2491,45 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
>  
>     The shifting is always optional depending on whether bitpos != 0.
>  
> > +   When the original bitfield was inside a gcond then a new gcond is also
> > +   generated with the new `result` as the operand to the comparison.
> +
>  */
>  
>  static gimple *
>  vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  				 tree *type_out)
>  {
> -  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
> -
> -  if (!first_stmt)
> -    return NULL;
> -
> -  gassign *bf_stmt;
> -  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
> -      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
> +  gimple *bf_stmt = NULL;
> +  tree lhs = NULL_TREE;
> +  tree ret_type = NULL_TREE;
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
> +    {
> +      tree op = gimple_cond_lhs (cond_stmt);
> +      if (TREE_CODE (op) != SSA_NAME)
> +	return NULL;
> +      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
> +      if (TREE_CODE (gimple_cond_rhs (cond_stmt)) != INTEGER_CST)
> +	return NULL;
> +    }
> +  else if (is_gimple_assign (stmt)
> +	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
> +	   && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
>      {
> -      gimple *second_stmt
> -	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
> +      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
>        bf_stmt = dyn_cast <gassign *> (second_stmt);
> -      if (!bf_stmt
> -	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
> -	return NULL;
> +      lhs = gimple_assign_lhs (stmt);
> +      ret_type = TREE_TYPE (lhs);
>      }
> -  else
> +
> +  if (!bf_stmt
> +      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
>      return NULL;
>  
>    tree bf_ref = gimple_assign_rhs1 (bf_stmt);
>    tree container = TREE_OPERAND (bf_ref, 0);
> +  ret_type = ret_type ? ret_type : TREE_TYPE (container);
>  
>    if (!bit_field_offset (bf_ref).is_constant ()
>        || !bit_field_size (bf_ref).is_constant ()
> @@ -2522,8 +2543,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>    gimple *use_stmt, *pattern_stmt;
>    use_operand_p use_p;
> -  tree ret = gimple_assign_lhs (first_stmt);
> -  tree ret_type = TREE_TYPE (ret);
>    bool shift_first = true;
>    tree container_type = TREE_TYPE (container);
>    tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
> @@ -2560,7 +2579,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>    /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
>       PLUS_EXPR then do the shift last as some targets can combine the shift and
>       add into a single instruction.  */
> -  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
> +  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
>      {
>        if (gimple_code (use_stmt) == GIMPLE_ASSIGN
>  	  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
> @@ -2620,6 +2639,19 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  			       NOP_EXPR, result);
>      }
>  
> +  if (!lhs)
> +    {
> +      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> +      gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
> +      tree cond_cst = gimple_cond_rhs (cond_stmt);
> +      pattern_stmt
> +	= gimple_build_cond (gimple_cond_code (cond_stmt),
> +			     gimple_get_lhs (pattern_stmt),
> +			     fold_convert (ret_type, cond_cst),
> +			     gimple_cond_true_label (cond_stmt),
> +			     gimple_cond_false_label (cond_stmt));
> +    }
> +
>    *type_out = STMT_VINFO_VECTYPE (stmt_info);
>    vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-11 10:31                     ` Richard Biener
@ 2023-07-11 12:40                       ` Jan Hubicka
  2023-07-11 13:04                         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Jan Hubicka @ 2023-07-11 12:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

> > By now we did CCP and FRE so we likely optimized out most of the constant
> > conditionals exposed by inlining.
> 
> So maybe we should simply delay re-propagation of the profile?  I
> think cunrolli doesn't so much care about the profile - cunrolli
> is (was) about abstraction removal.  Jump threading should be
> the first pass to care.

That is what I was thinking too.  After inlining the profile counts may
be in quite bad shape.  If you inline together loops like in exchange,
which has a large loop nest, we will definitely end up capping counts
to avoid overflow.

cunrolli does:

 ret = tree_unroll_loops_completely (optimize >= 3, false);

which sets may_increase_size to true for -O3 and then

 may_increase_size && optimize_loop_nest_for_speed_p (loop)

which seems a reasonable guard, but it may get random answers on a capped
profile.  It is not a big deal to try propagating before cunrolli and then
again before threading and see how much potential this idea has.
I guess I should also double check that the other passes are indeed
safe, but I think it is quite obvious they should be.

Honza
> 
> Richard.


* Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
  2023-07-11 12:40                       ` Jan Hubicka
@ 2023-07-11 13:04                         ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-11 13:04 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > By now we did CCP and FRE so we likely optimized out most of the constant
> > > conditionals exposed by inlining.
> > 
> > So maybe we should simply delay re-propagation of the profile?  I
> > think cunrolli doesn't so much care about the profile - cunrolli
> > is (was) about abstraction removal.  Jump threading should be
> > the first pass to care.
> 
> That is what I was thinking too.  After inlining the profile counts may
> be in quite bad shape.  If you inline together loops like in exchange,
> which has a large loop nest, we will definitely end up capping counts
> to avoid overflow.
> 
> cunrolli does:
> 
>  ret = tree_unroll_loops_completely (optimize >= 3, false);

Ah, yeah - that used to be false, false ...

> which sets may_increase_size to true for -O3 and then
> 
>  may_increase_size && optimize_loop_nest_for_speed_p (loop)
> 
> which seems a reasonable guard, but it may get random answers on a capped
> profile.  It is not a big deal to try propagating before cunrolli and then
> again before threading and see how much potential this idea has.
> I guess I should also double check that the other passes are indeed
> safe, but I think it is quite obvious they should be.

Yeah.

Richard.


* Re: [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-06-28 13:44 ` [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
@ 2023-07-13 11:32   ` Richard Biener
  2023-07-13 11:54     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-13 11:32 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch splits the vectorizer's understanding of the main loop exit off
> from the normal loop infrastructure.
> 
> Essentially we're relaxing the use of single_exit() in the vectorizer as we will
> no longer have a single exit and need a well-defined split between the main
> and secondary exits of loops for vectorization.

General comments here, more comments inline below.

> These new values were added to the loop class even though they're only used by
> the vectorizer for a couple of reasons:
>   - We need access to them in places where we have no loop_vinfo.

I've been passing down loop_vinfo to more places when cleaning up stuff
so this shouldn't be a limiting factor.  Passing down the relevant edge
if it's the middle-end that needs access is then the other option.

>   - We only have a single loop_vinfo for each loop under consideration, however
>     that same loop can have different copies, e.g. peeled/versioned copies or
>     the scalar variant of the loop.  For each of these we still need to be able
>     to have a coherent exit definition.

I've noticed this as well dealing with how epilogue vectorization is
bolted on ... I think in an ideal world the main loop vectorization
would create loop_vinfo for each of those loops so it can push info
there.

> For these reasons the placement in the loop class was the only way to keep the
> bookkeeping together with the loops and avoid possibly expensive lookups.
> 
> For this version of the patch the `main` exit of a loop is defined as the exit
> that is closest to the loop latch. This is stored in vec_loop_iv.  The remaining
> exits which are relevant for the vectorizer are stored inside
> vec_loop_alt_exits.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* cfgloop.cc (alloc_loop): Initialize vec_loop_iv.
> 	* cfgloop.h (class loop): Add vec_loop_iv and vec_loop_alt_exits.
> 	* doc/loop.texi: Document get_edge_condition.
> 	* tree-loop-distribution.cc (loop_distribution::distribute_loop):
> 	Initialize vec_loop_iv since loop distribution calls loop peeling,
> 	which only understands vec_loop_iv now.
> 	* tree-scalar-evolution.cc (get_edge_condition): New.
> 	(get_loop_exit_condition): Refactor into get_edge_condition.
> 	* tree-scalar-evolution.h (get_edge_condition): New.
> 	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Update use
> 	of single_exit.
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
> 	vect_set_loop_condition_normal, vect_set_loop_condition,
> 	slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_can_duplicate_loop_p,
> 	find_loop_location, vect_update_ivs_after_vectorizer,
> 	vect_gen_vector_loop_niters_mult_vf, find_guard_arg, vect_do_peeling):
> 	Replace usages of single_exit.
> 	(vec_init_exit_info): New.
> 	* tree-vect-loop.cc (vect_analyze_loop_form,
> 	vect_create_epilog_for_reduction, vectorizable_live_operation,
> 	scale_profile_for_vect_loop, vect_transform_loop): Replace usages
> 	of single_exit.
> 	* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_ALT_EXITS,
> 	vec_init_exit_info): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -272,6 +272,14 @@ public:
>       the basic-block from being collected but its index can still be
>       reused.  */
>    basic_block former_header;
> +
> +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> +     controls the natural exits of the loop.  */
> +  edge  GTY ((skip (""))) vec_loop_iv;
> +
> +  /* If the loop has multiple exits this structure contains the alternate
> +     exits of the loop which are relevant for vectorization.  */
> +  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;

That's a quite heavy representation and as you say it's vectorizer
specific.  May I ask you to eliminate at _least_ vec_loop_alt_exits?
Are not all the exits in that vector anyway?  Note there's already
the list of exits, and if you have the canonical counting IV exit
you can match against that to get all the others.

>  };
>  
>  /* Set if the loop is known to be infinite.  */
> diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
> index ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831 100644
> --- a/gcc/cfgloop.cc
> +++ b/gcc/cfgloop.cc
> @@ -355,6 +355,7 @@ alloc_loop (void)
>    loop->nb_iterations_upper_bound = 0;
>    loop->nb_iterations_likely_upper_bound = 0;
>    loop->nb_iterations_estimate = 0;
> +  loop->vec_loop_iv = NULL;
>    return loop;
>  }
>  
> diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> index b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60 100644
> --- a/gcc/doc/loop.texi
> +++ b/gcc/doc/loop.texi
> @@ -212,6 +212,7 @@ relation, and breath-first search order, respectively.
>  @code{NULL} if the loop has more than one exit.  You can only use this
>  function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
>  @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
> +@item @code{get_edge_condition}: Get the condition belonging to an exit edge.
>  @item @code{just_once_each_iteration_p}: Returns true if the basic block
>  is executed exactly once during each iteration of a loop (that is, it
>  does not belong to a sub-loop, and it dominates the latch of the loop).
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop,
>        return 0;
>      }
>  
> +  /* Loop distribution only does prologue peeling but we still need to
> +     initialize loop exit information.  However we only support single exits at
> +     the moment.  As such, should exit information not have been provided and we
> +     have more than one exit, bail out.  */
> +  if (!(loop->vec_loop_iv = single_exit (loop)))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file,
> +		 "Loop %d not distributed: too many exits.\n",
> +		 loop->num);
> +
> +      free_rdg (rdg);
> +      loop_nest.release ();
> +      free_data_refs (datarefs_vec);
> +      delete ddrs_table;
> +      return 0;
> +    }
> +
>    data_reference_p dref;
>    for (i = 0; datarefs_vec.iterate (i, &dref); ++i)
>      dref->aux = (void *) (uintptr_t) i;
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index c58a8a16e81573aada38e912b7c58b3e1b23b66d..2e83836911ec8e968e90cf9b489dc7fe121ff80e 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  
>  extern tree number_of_latch_executions (class loop *);
>  extern gcond *get_loop_exit_condition (const class loop *);
> +extern gcond *get_edge_condition (edge);
>  
>  extern void scev_initialize (void);
>  extern bool scev_initialized_p (void);
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index ba47a684f4b373fb4f2dc16ddb8edb0ef39da6ed..af8be618b0748258132ccbef2d387bfddbe3c16b 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -1293,8 +1293,15 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr,
>  gcond *
>  get_loop_exit_condition (const class loop *loop)
>  {
> +  return get_edge_condition (single_exit (loop));
> +}
> +
> +/* If the statement just before the EXIT_EDGE contains a condition then
> +   return the condition, otherwise NULL. */
> +
> +gcond *
> +get_edge_condition (edge exit_edge){

{ belongs to the next line

Please use an overload here, thus

get_loop_exit_condition (edge exit_edge)
{
...

the name 'get_edge_condition' is too generic.

>    gcond *res = NULL;
> -  edge exit_edge = single_exit (loop);
>  
>    if (dump_file && (dump_flags & TDF_SCEV))
>      fprintf (dump_file, "(get_loop_exit_condition \n  ");
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ebe93832b1e89120eab2fdac0fc30fe35c0356a2..fcc950f528b2d1e044be12424c2df11f692ee8ba 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -2070,7 +2070,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>  
>    /* Check if we can possibly peel the loop.  */
>    if (!vect_can_advance_ivs_p (loop_vinfo)
> -      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> +      || !slpeel_can_duplicate_loop_p (loop_vinfo,
> +				       LOOP_VINFO_IV_EXIT (loop_vinfo))
>        || loop->inner)
>      do_peeling = false;
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 20f570e4a0d64610d7b63fe492eba5254ab5dc2c..299dfb75e3372b6a91637101b4bab0e82eb560ad 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -904,7 +904,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>    add_header_seq (loop, header_seq);
>  
>    /* Get a boolean result that tells us whether to iterate.  */
> -  edge exit_edge = single_exit (loop);
> +  edge exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
>    gcond *cond_stmt;
>    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>        && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
> @@ -935,7 +935,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>    if (final_iv)
>      {
>        gassign *assign = gimple_build_assign (final_iv, orig_niters);
> -      gsi_insert_on_edge_immediate (single_exit (loop), assign);
> +      gsi_insert_on_edge_immediate (exit_edge, assign);
>      }
>  
>    return cond_stmt;
> @@ -1183,7 +1183,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> +				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
>  {
> @@ -1191,13 +1192,13 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
>    gcond *cond_stmt;
>    gcond *orig_cond;
>    edge pe = loop_preheader_edge (loop);
> -  edge exit_edge = single_exit (loop);
> +  edge exit_edge = loop->vec_loop_iv;

Above you used LOOP_VINFO_IV_EXIT, please use it here as well.

>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
>    enum tree_code code;
>    tree niters_type = TREE_TYPE (niters);
>  
> -  orig_cond = get_loop_exit_condition (loop);
> +  orig_cond = get_edge_condition (exit_edge);
>    gcc_assert (orig_cond);
>    loop_cond_gsi = gsi_for_stmt (orig_cond);
>  
> @@ -1305,7 +1306,7 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step,
>    if (final_iv)
>      {
>        gassign *assign;
> -      edge exit = single_exit (loop);
> +      edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
>        gcc_assert (single_pred_p (exit->dest));
>        tree phi_dest
>  	= integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr);
> @@ -1353,7 +1354,7 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
>  			 bool niters_maybe_zero)
>  {
>    gcond *cond_stmt;
> -  gcond *orig_cond = get_loop_exit_condition (loop);
> +  gcond *orig_cond = get_edge_condition (loop->vec_loop_iv);

Likewise.

>    gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
>  
>    if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> @@ -1370,7 +1371,8 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo,
>  							     loop_cond_gsi);
>      }
>    else
> -    cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv,
> +    cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop, niters,
> +						step, final_iv,
>  						niters_maybe_zero,
>  						loop_cond_gsi);
>  
> @@ -1439,6 +1441,69 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
>  		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
>  }
>  
> +/* When copies of the same loop are created the copies won't have any SCEV
> +   information and so we can't determine what their exits are.  However since
> +   they are copies of an original loop the exits should be the same.
> +
> +   I don't really like this, and think we need a different way, but I don't
> +   know what.  So sending this up so Richi can comment.  */
> +
> +void
> +vec_init_exit_info (class loop *loop)
> +{
> +  if (loop->vec_loop_iv)
> +    return;
> +
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  if (exits.is_empty ())
> +    return;
> +
> +  if ((loop->vec_loop_iv = single_exit (loop)))
> +    return;
> +
> +  loop->vec_loop_alt_exits.create (exits.length () - 1);
> +
> +  /* The main IV is to be determined by the block that's the first reachable
> +     block from the latch.  We cannot rely on the order the loop analysis
> +     returns and we don't have any SCEV analysis on the loop.  */
> +  auto_vec <edge> workset;
> +  workset.safe_push (loop_latch_edge (loop));
> +  hash_set <edge> visited;
> +
> +  while (!workset.is_empty ())
> +    {
> +      edge e = workset.pop ();
> +      if (visited.contains (e))
> +	continue;
> +
> +      bool found_p = false;
> +      for (edge ex : e->src->succs)
> +	{
> +	  if (exits.contains (ex))
> +	    {
> +	      found_p = true;
> +	      e = ex;
> +	      break;
> +	    }
> +	}
> +
> +      if (found_p)
> +	{
> +	  loop->vec_loop_iv = e;
> +	  for (edge ex : exits)
> +	    if (e != ex)
> +	      loop->vec_loop_alt_exits.safe_push (ex);
> +	  return;
> +	}
> +      else
> +	{
> +	  for (edge ex : e->src->preds)
> +	    workset.safe_insert (0, ex);
> +	}
> +      visited.add (e);
> +    }
> +  gcc_unreachable ();
> +}
>  
>  /* Given LOOP this function generates a new copy of it and puts it
>     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> @@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>    edge exit, new_exit;
>    bool duplicate_outer_loop = false;
>  
> -  exit = single_exit (loop);
> +  exit = loop->vec_loop_iv;
>    at_exit = (e == exit);
>    if (!at_exit && e != loop_preheader_edge (loop))
>      return NULL;
>  
>    if (scalar_loop == NULL)
>      scalar_loop = loop;
> +  else
> +    vec_init_exit_info (scalar_loop);
>  
>    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>    pbbs = bbs + 1;
> @@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>    bbs[0] = preheader;
>    new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
>  
> -  exit = single_exit (scalar_loop);
> +  exit = scalar_loop->vec_loop_iv;
>    copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
>  	    &exit, 1, &new_exit, NULL,
>  	    at_exit ? loop->latch : e->src, true);
> -  exit = single_exit (loop);
> +  exit = loop->vec_loop_iv;
>    basic_block new_preheader = new_bbs[0];
>  
> +  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
> +     so we must initialize the exit information.  */
> +  vec_init_exit_info (new_loop);
> +

You have a mapping of old to new BB so you should be able to
map old to new exit by mapping e->src/dest and looking up the new edge?

The vec_loop_iv exit is mapped directly (new_exit).

So I don't really understand what's missing there.

>    /* Before installing PHI arguments make sure that the edges
>       into them match that of the scalar loop we analyzed.  This
>       makes sure the SLP tree matches up between the main vectorized
> @@ -1537,7 +1608,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>  	 but LOOP will not.  slpeel_update_phi_nodes_for_guard{1,2} expects
>  	 the LOOP SSA_NAMEs (on the exit edge and edge from latch to
>  	 header) to have current_def set, so copy them over.  */
> -      slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop),
> +      slpeel_duplicate_current_defs_from_edges (scalar_loop->vec_loop_iv,
>  						exit);
>        slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch,
>  							   0),
> @@ -1696,11 +1767,12 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
>   */
>  
>  bool
> -slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
> +slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
>  {

Note slpeel_* is also used by others (loop distribution) so we shouldn't
require loop_vec_info here.  Instead pass in the (important) exit edge.
We're doing similar for gimple_duplicate_sese_region.

> -  edge exit_e = single_exit (loop);
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
>    edge entry_e = loop_preheader_edge (loop);
> -  gcond *orig_cond = get_loop_exit_condition (loop);
> +  gcond *orig_cond = get_edge_condition (exit_e);
>    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
>    unsigned int num_bb = loop->inner? 5 : 2;
>  
> @@ -1709,7 +1781,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
>    if (!loop_outer (loop)
>        || loop->num_nodes != num_bb
>        || !empty_block_p (loop->latch)
> -      || !single_exit (loop)
> +      || !LOOP_VINFO_IV_EXIT (loop_vinfo)
>        /* Verify that new loop exit condition can be trivially modified.  */
>        || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi))
>        || (e != exit_e && e != entry_e))
> @@ -1722,7 +1794,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e)
>    return ret;
>  }
>  
> -/* Function vect_get_loop_location.
> +/* Function find_loop_location.
>  
>     Extract the location of the loop in the source code.
>     If the loop is not well formed for vectorization, an estimated
> @@ -1739,11 +1811,19 @@ find_loop_location (class loop *loop)
>    if (!loop)
>      return dump_user_location_t ();
>  
> -  stmt = get_loop_exit_condition (loop);
> +  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> +    {
> +      /* We only care about the loop location, so use any exit with location
> +	 information.  */
> +      for (edge e : get_loop_exit_edges (loop))
> +	{
> +	  stmt = get_edge_condition (e);
>  
> -  if (stmt
> -      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
> -    return stmt;
> +	  if (stmt
> +	      && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION)
> +	    return stmt;
> +	}
> +    }
>    /* If we got here the loop is probably not "well formed",
>       try to estimate the loop location */
> @@ -1962,7 +2042,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>    gphi_iterator gsi, gsi1;
>    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    basic_block update_bb = update_e->dest;
> -  basic_block exit_bb = single_exit (loop)->dest;
> +
> +  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    /* Make sure there exists a single-predecessor exit bb:  */
>    gcc_assert (single_pred_p (exit_bb));
> @@ -2529,10 +2610,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>  {
>    /* We should be using a step_vector of VF if VF is variable.  */
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    tree type = TREE_TYPE (niters_vector);
>    tree log_vf = build_int_cst (type, exact_log2 (vf));
> -  basic_block exit_bb = single_exit (loop)->dest;
> +  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    gcc_assert (niters_vector_mult_vf_ptr != NULL);
>    tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
> @@ -2559,7 +2639,7 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
>  		gphi *lcssa_phi)
>  {
>    gphi_iterator gsi;
> -  edge e = single_exit (loop);
> +  edge e = loop->vec_loop_iv;

Pass in the edge.

>  
>    gcc_assert (single_pred_p (e->dest));
>    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> @@ -3328,8 +3408,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  
>    if (epilog_peeling)
>      {
> -      e = single_exit (loop);
> -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> +      e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
>  
>        /* Peel epilog and put it on exit edge of loop.  If we are vectorizing
>  	 said epilog then we should use a copy of the main loop as a starting
> @@ -3419,8 +3499,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	{
>  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
>  				    niters, niters_vector_mult_vf);
> -	  guard_bb = single_exit (loop)->dest;
> -	  guard_to = split_edge (single_exit (epilog));
> +	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +	  guard_to = split_edge (epilog->vec_loop_iv);
>  	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
>  					   skip_vector ? anchor : guard_bb,
>  					   prob_epilog.invert (),
> @@ -3428,7 +3508,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	  if (vect_epilogues)
>  	    epilogue_vinfo->skip_this_loop_edge = guard_e;
>  	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> -					      single_exit (epilog));
> +					      epilog->vec_loop_iv);
>  	  /* Only need to handle basic block before epilog loop if it's not
>  	     the guard_bb, which is the case when skip_vector is true.  */
>  	  if (guard_bb != bb_before_epilog)
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 0a03f56aae7b51fb4c5ce0e49d96888bae634ef7..0bca5932d237cf1cfbbb48271db3f4430672b5dc 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1641,6 +1641,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  {
>    DUMP_VECT_SCOPE ("vect_analyze_loop_form");
>  
> +  vec_init_exit_info (loop);

I think this shows the "exit" edge stuff should be in
vect_loop_form_info (I didn't remember we have that).

> +  if (!loop->vec_loop_iv)
> +    return opt_result::failure_at (vect_location,
> +				   "not vectorized:"
> +				   " could not determine main exit from"
> +				   " loop with multiple exits.\n");
> +
>    /* Different restrictions apply when we are considering an inner-most loop,
>       vs. an outer (nested) loop.
>       (FORNOW. May want to relax some of these restrictions in the future).  */
> @@ -3025,9 +3032,8 @@ start_over:
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
>        if (!vect_can_advance_ivs_p (loop_vinfo)
> -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> -					   single_exit (LOOP_VINFO_LOOP
> -							 (loop_vinfo))))
> +	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
> +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
>          {
>  	  ok = opt_result::failure_at (vect_location,
>  				       "not vectorized: can't create required "
> @@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = single_exit (loop)->dest;
> +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
>    for (unsigned i = 0; i < vec_num; i++)
> @@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
>  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info *vinfo,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = single_exit (loop)->dest;
> +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>        gcc_assert (single_pred_p (exit_bb));
>  
>        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
>        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
>  
>        gimple_seq stmts = NULL;
>        tree new_tree;
> @@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf)
>        scale_loop_frequencies (loop, p);
>      }
>  
> -  edge exit_e = single_exit (loop);
> +  edge exit_e = loop->vec_loop_iv;
> +
>    exit_e->probability = profile_probability::always () / (new_est_niter + 1);
>  
>    edge exit_l = single_pred_edge (loop->latch);
> @@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  
>    /* Make sure there exists a single-predecessor exit bb.  Do this before 
>       versioning.   */
> -  edge e = single_exit (loop);
> +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
>    if (! single_pred_p (e->dest))
>      {
>        split_loop_exit_edge (e, true);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036cd14609ebe091799320bf 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -917,6 +917,8 @@ public:
>  
>  /* Access Functions.  */
>  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> +#define LOOP_VINFO_IV_EXIT(L)              (L)->loop->vec_loop_iv
> +#define LOOP_VINFO_ALT_EXITS(L)            (L)->loop->vec_loop_alt_exits
>  #define LOOP_VINFO_BBS(L)                  (L)->bbs
>  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
>  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> @@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
>  extern dump_user_location_t find_loop_location (class loop *);
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> +extern void vec_init_exit_info (class loop *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,

So I didn't really see why we should need to have the info in
struct loop.

Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.
  2023-06-28 13:45 ` [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits Tamar Christina
@ 2023-07-13 11:49   ` Richard Biener
  2023-07-13 12:03     ` Tamar Christina
  2023-07-14  9:09     ` Richard Biener
  0 siblings, 2 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-13 11:49 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> For early break vectorization we have to update niters analysis to record and
> analyze all exits of the loop, and so all conds.
> 
> The niters of the loop are still determined by the main/natural exit of the loop
> as this gives the O(n) bound.  For now we don't do much with the secondary conds,
> but their assumptions can be used to generate versioning checks later.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

I probably confused vec_init_exit_info in the previous patch - that said,
I'm missing a clear function that determines the natural exit of the
original (if-converted) scalar loop.  As vec_init_exit_info seems
to (re-)compute that I'll comment on it here.

+  /* The main IV is to be determined by the block that's the first reachable
+     block from the latch.  We cannot rely on the order the loop analysis
+     returns and we don't have any SCEV analysis on the loop.  */
+  auto_vec <edge> workset;
+  workset.safe_push (loop_latch_edge (loop));
+  hash_set <edge> visited;
+
+  while (!workset.is_empty ())
+    {
+      edge e = workset.pop ();
+      if (visited.contains (e))
+       continue;
+
+      bool found_p = false;
+      for (edge ex : e->src->succs)
+       {
+         if (exits.contains (ex))
+           {
+             found_p = true;
+             e = ex;
+             break;
+           }
+       }
+
+      if (found_p)
+       {
+         loop->vec_loop_iv = e;
+         for (edge ex : exits)
+           if (e != ex)
+             loop->vec_loop_alt_exits.safe_push (ex);
+         return;
+       }
+      else
+       {
+         for (edge ex : e->src->preds)
+           workset.safe_insert (0, ex);
+       }
+      visited.add (e);
+    }

So this greedily follows edges from the latch and takes the first
exit.  Why's that better than simply choosing the first?

I'd have done

 auto_vec<edge> exits = get_loop_exit_edges (loop);
 for (e : exits)
   {
     if (vect_get_loop_niters (...))
       {
         if no assumptions use that edge, if assumptions continue
         searching, maybe there's an edge w/o assumptions
       }
   }
 use (first) exit with assumptions

we probably want to know 'may_be_zero' as well and prefer an edge
without that.  So eventually call number_of_iterations_exit_assumptions
directly and look for the best niter_desc and pass that to
vect_get_loop_niters (or re-do the work).

As said for "copying" the exit to the loop copies use the block mapping.


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
> 	all gconds.
> 	(vect_analyze_loop_form): Update code checking for conds.
> 	(vect_create_loop_vinfo): Handle having multiple conds.
> 	(vect_analyze_loop): Release extra loop conds structures.
> 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> 	LOOP_VINFO_LOOP_IV_COND): New.
> 	(struct vect_loop_form_info): Add conds, loop_iv_cond.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
>     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
>     niter information holds in ASSUMPTIONS.
>  
> -   Return the loop exit condition.  */
> +   Return the loop exit conditions.  */
>  
>  
> -static gcond *
> +static vec<gcond *>
>  vect_get_loop_niters (class loop *loop, tree *assumptions,
>  		      tree *number_of_iterations, tree *number_of_iterationsm1)
>  {
> -  edge exit = single_exit (loop);
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  vec<gcond *> conds;
> +  conds.create (exits.length ());
>    class tree_niter_desc niter_desc;
>    tree niter_assumptions, niter, may_be_zero;
> -  gcond *cond = get_loop_exit_condition (loop);
>  
>    *assumptions = boolean_true_node;
>    *number_of_iterationsm1 = chrec_dont_know;
>    *number_of_iterations = chrec_dont_know;
> +
>    DUMP_VECT_SCOPE ("get_loop_niters");
>  
> -  if (!exit)
> -    return cond;
> +  if (exits.is_empty ())
> +    return conds;
>  
> -  may_be_zero = NULL_TREE;
> -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> -      || chrec_contains_undetermined (niter_desc.niter))
> -    return cond;
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> +		     exits.length ());
>  
> -  niter_assumptions = niter_desc.assumptions;
> -  may_be_zero = niter_desc.may_be_zero;
> -  niter = niter_desc.niter;
> +  edge exit;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (exits, i, exit)
> +    {
> +      gcond *cond = get_edge_condition (exit);
> +      if (cond)
> +	conds.safe_push (cond);
>  
> -  if (may_be_zero && integer_zerop (may_be_zero))
> -    may_be_zero = NULL_TREE;
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
>  
> -  if (may_be_zero)
> -    {
> -      if (COMPARISON_CLASS_P (may_be_zero))
> +      may_be_zero = NULL_TREE;
> +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> +          || chrec_contains_undetermined (niter_desc.niter))
> +	continue;
> +
> +      niter_assumptions = niter_desc.assumptions;
> +      may_be_zero = niter_desc.may_be_zero;
> +      niter = niter_desc.niter;
> +
> +      if (may_be_zero && integer_zerop (may_be_zero))
> +	may_be_zero = NULL_TREE;
> +
> +      if (may_be_zero)
>  	{
> -	  /* Try to combine may_be_zero with assumptions, this can simplify
> -	     computation of niter expression.  */
> -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> -					     niter_assumptions,
> -					     fold_build1 (TRUTH_NOT_EXPR,
> -							  boolean_type_node,
> -							  may_be_zero));
> +	  if (COMPARISON_CLASS_P (may_be_zero))
> +	    {
> +	      /* Try to combine may_be_zero with assumptions, this can simplify
> +		 computation of niter expression.  */
> +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> +						 niter_assumptions,
> +						 fold_build1 (TRUTH_NOT_EXPR,
> +							      boolean_type_node,
> +							      may_be_zero));
> +	      else
> +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> +				     build_int_cst (TREE_TYPE (niter), 0),
> +				     rewrite_to_non_trapping_overflow (niter));
> +
> +	      may_be_zero = NULL_TREE;
> +	    }
> +	  else if (integer_nonzerop (may_be_zero) && exit == loop->vec_loop_iv)
> +	    {
> +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> +	      continue;
> +	    }
>  	  else
> -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> -				 build_int_cst (TREE_TYPE (niter), 0),
> -				 rewrite_to_non_trapping_overflow (niter));
> +	    continue;
> +       }
>  
> -	  may_be_zero = NULL_TREE;
> -	}
> -      else if (integer_nonzerop (may_be_zero))
> +      /* Loop assumptions are based off the normal exit.  */
> +      if (exit == loop->vec_loop_iv)
>  	{
> -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> -	  return cond;
> +	  *assumptions = niter_assumptions;
> +	  *number_of_iterationsm1 = niter;
> +
> +	  /* We want the number of loop header executions which is the number
> +	     of latch executions plus one.
> +	     ???  For UINT_MAX latch executions this number overflows to zero
> +	     for loops like do { n++; } while (n != 0);  */
> +	  if (niter && !chrec_contains_undetermined (niter))
> +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> +				 unshare_expr (niter),
> +				 build_int_cst (TREE_TYPE (niter), 1));
> +	  *number_of_iterations = niter;
>  	}
> -      else
> -	return cond;
>      }
>  
> -  *assumptions = niter_assumptions;
> -  *number_of_iterationsm1 = niter;
> -
> -  /* We want the number of loop header executions which is the number
> -     of latch executions plus one.
> -     ???  For UINT_MAX latch executions this number overflows to zero
> -     for loops like do { n++; } while (n != 0);  */
> -  if (niter && !chrec_contains_undetermined (niter))
> -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> -			  build_int_cst (TREE_TYPE (niter), 1));
> -  *number_of_iterations = niter;
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
>  
> -  return cond;
> +  return conds;
>  }
>  
>  /* Function bb_in_loop_p
> @@ -1768,15 +1794,26 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized:"
>  				   " abnormal loop exit edge.\n");
>  
> -  info->loop_cond
> +  info->conds
>      = vect_get_loop_niters (loop, &info->assumptions,
>  			    &info->number_of_iterations,
>  			    &info->number_of_iterationsm1);
> -  if (!info->loop_cond)
> +
> +  if (info->conds.is_empty ())
>      return opt_result::failure_at
>        (vect_location,
>         "not vectorized: complicated exit condition.\n");
>  
> +  /* Determine what the primary and alternate exit conds are.  */
> +  info->alt_loop_conds.create (info->conds.length () - 1);
> +  for (gcond *cond : info->conds)
> +    {
> +      if (loop->vec_loop_iv->src != gimple_bb (cond))
> +	info->alt_loop_conds.quick_push (cond);
> +      else
> +	info->loop_cond = cond;
> +    }

Do you really need those explicitly?  ->conds and ->alt_loop_conds
look redundant at least.

> +
>    if (integer_zerop (info->assumptions)
>        || !info->number_of_iterations
>        || chrec_contains_undetermined (info->number_of_iterations))
> @@ -1821,8 +1858,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>    if (!integer_onep (info->assumptions) && !main_loop_info)
>      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
>  
> -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +  for (gcond *cond : info->alt_loop_conds)
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> +    }
> +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> +
>    if (info->inner_loop_cond)
>      {
>        stmt_vec_info inner_loop_cond_info
> @@ -3520,6 +3563,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  		     "***** Choosing vector mode %s\n",
>  		     GET_MODE_NAME (first_loop_vinfo->vector_mode));
>  
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +
>    /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
>       enabled, SIMDUID is not set, it is the innermost loop and we have
>       either already found the loop's SIMDLEN or there was no SIMDLEN to
> @@ -3631,6 +3677,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
>      }
>  
> +  loop_form_info.conds.release ();
> +  loop_form_info.alt_loop_conds.release ();
> +
>    return first_loop_vinfo;
>  }
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index bd5eceb5da7a45ef036cd14609ebe091799320bf..1cc003c12e2447eca878f56cb019236f56e96f85 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -876,6 +876,12 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* List of loop additional IV conditionals found in the loop.  */

drop "IV"

> +  auto_vec<gcond *> conds;
> +
> +  /* Main loop IV cond.  */
> +  gcond* loop_iv_cond;
> +

I guess I have to look at the followup patches to see how often we
have to access loop_iv_cond/conds.

>    /* True if there are no loop carried data dependencies in the loop.
>       If loop->safelen <= 1, then this is always true, either the loop
>       didn't have any loop carried data dependencies, or the loop is being
> @@ -966,6 +972,8 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
>  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
>  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> @@ -2353,7 +2361,9 @@ struct vect_loop_form_info
>    tree number_of_iterations;
>    tree number_of_iterationsm1;
>    tree assumptions;
> +  vec<gcond *> conds;
>    gcond *loop_cond;
> +  vec<gcond *> alt_loop_conds;
>    gcond *inner_loop_cond;
>  };
>  extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-07-13 11:32   ` Richard Biener
@ 2023-07-13 11:54     ` Tamar Christina
  2023-07-13 12:10       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-13 11:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644
> > --- a/gcc/cfgloop.h
> > +++ b/gcc/cfgloop.h
> > @@ -272,6 +272,14 @@ public:
> >       the basic-block from being collected but its index can still be
> >       reused.  */
> >    basic_block former_header;
> > +
> > +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> > +     controls the natural exits of the loop.  */  edge  GTY ((skip
> > + (""))) vec_loop_iv;
> > +
> > +  /* If the loop has multiple exits this structure contains the alternate
> > +     exits of the loop which are relevant for vectorization.  */
> > + vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
> 
> That's a quite heavy representation and as you say it's vectorizer specific.  May
> I ask you to eliminate at _least_ vec_loop_alt_exits?
> Aren't all exits in that vector?  Note there's already the list of exits and if
> you have the canonical counting IV exit you can match against that to get all
> the others?
> 

Sure, though that means some filtering whenever one iterates over the alt exits;
not a problem, though.

> >  /* Given LOOP this function generates a new copy of it and puts it
> >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > @@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> >    edge exit, new_exit;
> >    bool duplicate_outer_loop = false;
> >
> > -  exit = single_exit (loop);
> > +  exit = loop->vec_loop_iv;
> >    at_exit = (e == exit);
> >    if (!at_exit && e != loop_preheader_edge (loop))
> >      return NULL;
> >
> >    if (scalar_loop == NULL)
> >      scalar_loop = loop;
> > +  else
> > +    vec_init_exit_info (scalar_loop);
> >
> >    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> >    pbbs = bbs + 1;
> > @@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> >    bbs[0] = preheader;
> >    new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> >
> > -  exit = single_exit (scalar_loop);
> > +  exit = scalar_loop->vec_loop_iv;
> >    copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> >  	    &exit, 1, &new_exit, NULL,
> >  	    at_exit ? loop->latch : e->src, true);
> > -  exit = single_exit (loop);
> > +  exit = loop->vec_loop_iv;
> >    basic_block new_preheader = new_bbs[0];
> >
> > +  /* Record the new loop exit information.  new_loop doesn't have SCEV data and
> > +     so we must initialize the exit information.  */
> > +  vec_init_exit_info (new_loop);
> > +
> 
> You have a mapping of old to new BB so you should be able to
> map old to new exit by mapping e->src/dest and looking up the new edge?
> 
> The vec_loop_iv exit is mapped directly (new_exit).
> 
> So I don't really understand what's missing there.

But I don't have the mapping when the loop was versioned, e.g. by ifcvt.  So in
the cases where scalar_loop != loop I still need them to match up.

vect_loop_form_info is destroyed after analysis though and is not available during
peeling. That's why we copy relevant information out in vect_create_loop_vinfo.

But in general we only have 1 per loop as well, so it would be the same as using loop_vinfo.

I could move it into loop_vinfo and then require you to pass the edges to the peeling
function as you mentioned.  That would solve the question of where to place them, but
I'm still not sure what to do about versioned loops.  We'd need to get their main edge
from somewhere; would another field in loop_vinfo be OK?

Cheers,
Tamar

> > +  if (!loop->vec_loop_iv)
> > +    return opt_result::failure_at (vect_location,
> > +				   "not vectorized:"
> > +				   " could not determine main exit from"
> > +				   " loop with multiple exits.\n");
> > +
> >    /* Different restrictions apply when we are considering an inner-most loop,
> >       vs. an outer (nested) loop.
> >       (FORNOW. May want to relax some of these restrictions in the future).  */
> > @@ -3025,9 +3032,8 @@ start_over:
> >        if (dump_enabled_p ())
> >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> >        if (!vect_can_advance_ivs_p (loop_vinfo)
> > -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > -					   single_exit (LOOP_VINFO_LOOP
> > -							 (loop_vinfo))))
> > +	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
> > +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
> >          {
> >  	  ok = opt_result::failure_at (vect_location,
> >  				       "not vectorized: can't create required "
> > @@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >           Store them in NEW_PHIS.  */
> >    if (double_reduc)
> >      loop = outer_loop;
> > -  exit_bb = single_exit (loop)->dest;
> > +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >    exit_gsi = gsi_after_labels (exit_bb);
> >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> >    for (unsigned i = 0; i < vec_num; i++)
> > @@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >  	  phi = create_phi_node (new_def, exit_bb);
> >  	  if (j)
> >  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> > -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> > +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> >  	  reduc_inputs.quick_push (new_def);
> >  	}
> > @@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info *vinfo,
> >  	   lhs' = new_tree;  */
> >
> >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -      basic_block exit_bb = single_exit (loop)->dest;
> > +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >        gcc_assert (single_pred_p (exit_bb));
> >
> >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> >
> >        gimple_seq stmts = NULL;
> >        tree new_tree;
> > @@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf)
> >        scale_loop_frequencies (loop, p);
> >      }
> >
> > -  edge exit_e = single_exit (loop);
> > +  edge exit_e = loop->vec_loop_iv;
> > +
> >    exit_e->probability = profile_probability::always () / (new_est_niter + 1);
> >
> >    edge exit_l = single_pred_edge (loop->latch);
> > @@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >
> >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> >       versioning.   */
> > -  edge e = single_exit (loop);
> > +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> >    if (! single_pred_p (e->dest))
> >      {
> >        split_loop_exit_edge (e, true);
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036cd14609ebe091799320bf 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -917,6 +917,8 @@ public:
> >
> >  /* Access Functions.  */
> >  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> > +#define LOOP_VINFO_IV_EXIT(L)              (L)->loop->vec_loop_iv
> > +#define LOOP_VINFO_ALT_EXITS(L)            (L)->loop->vec_loop_alt_exits
> >  #define LOOP_VINFO_BBS(L)                  (L)->bbs
> >  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
> >  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> > @@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info);
> >  extern dump_user_location_t find_loop_location (class loop *);
> >  extern bool vect_can_advance_ivs_p (loop_vec_info);
> >  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> > +extern void vec_init_exit_info (class loop *);
> >
> >  /* In tree-vect-stmts.cc.  */
> >  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> 
> So I didn't really see why we should need to have the info in
> struct loop.
> 
> Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.
  2023-07-13 11:49   ` Richard Biener
@ 2023-07-13 12:03     ` Tamar Christina
  2023-07-14  9:09     ` Richard Biener
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-07-13 12:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, July 13, 2023 12:49 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 8/19]middle-end: updated niters analysis to handle
> multiple exits.
> 
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > For early break vectorization we have to update niters analysis to
> > record and analyze all exits of the loop, and so all conds.
> >
> > The niters of the loop is still determined by the main/natural exit of
> > the loop as this is the O(n) bounds.  For now we don't do much with
> > the secondary conds, but their assumptions can be used to generate
> versioning checks later.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> I probably confused vec_init_exit_info in the previous patch - that said, I'm
> missing a clear function that determines the natural exit of the original (if-
> converted) scalar loop.  As vec_init_exit_info seems to (re-)compute that I'll
> comment on it here.

Ah was wondering if you'd seen it 😊

> 
> +  /* The main IV is to be determined by the block that's the first
> reachable
> +     block from the latch.  We cannot rely on the order the loop analysis
> +     returns and we don't have any SCEV analysis on the loop.  */
> + auto_vec <edge> workset;  workset.safe_push (loop_latch_edge (loop));
> + hash_set <edge> visited;
> +
> +  while (!workset.is_empty ())
> +    {
> +      edge e = workset.pop ();
> +      if (visited.contains (e))
> +       continue;
> +
> +      bool found_p = false;
> +      for (edge ex : e->src->succs)
> +       {
> +         if (exits.contains (ex))
> +           {
> +             found_p = true;
> +             e = ex;
> +             break;
> +           }
> +       }
> +
> +      if (found_p)
> +       {
> +         loop->vec_loop_iv = e;
> +         for (edge ex : exits)
> +           if (e != ex)
> +             loop->vec_loop_alt_exits.safe_push (ex);
> +         return;
> +       }
> +      else
> +       {
> +         for (edge ex : e->src->preds)
> +           workset.safe_insert (0, ex);
> +       }
> +      visited.add (e);
> +    }
> 
> So this greedily follows edges from the latch and takes the first exit.  Why's
> that better than simply choosing the first?
> 
> I'd have done
> 
>  auto_vec<edge> exits = get_loop_exit_edges (loop);  for (e : exits)
>    {
>      if (vect_get_loop_niters (...))
>        {
>          if no assumptions use that edge, if assumptions continue
>          searching, maybe ther's an edge w/o assumptions
>        }
>    }
>  use (first) exit with assumptions
> 
> we probably want to know 'may_be_zero' as well and prefer an edge without
> that.  So eventually call number_of_iterations_exit_assumptions
> directly and look for the best niter_desc and pass that to vect_get_loop_niters
> (or re-do the work).
> 
> As said for "copying" the exit to the loop copies use the block mapping.
> 

The issue is with the scalar loops, where we have no SCEV data and also no
SSA mapping data (from what I can tell, the map was cleared in ifcvt itself).

So for this to work with SCEV, we'd have to start analyzing the loop coming out of
LOOP_VINFO_SCALAR_LOOP as well, unless I'm missing something?

Regards,
Tamar

* RE: [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables
  2023-07-13 11:54     ` Tamar Christina
@ 2023-07-13 12:10       ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-13 12:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 13 Jul 2023, Tamar Christina wrote:

> > e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2
> > 621f7f4
> > > 888e7bf3c295 100644
> > > --- a/gcc/cfgloop.h
> > > +++ b/gcc/cfgloop.h
> > > @@ -272,6 +272,14 @@ public:
> > >       the basic-block from being collected but its index can still be
> > >       reused.  */
> > >    basic_block former_header;
> > > +
> > > +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> > > +     controls the natural exits of the loop.  */  edge  GTY ((skip
> > > + (""))) vec_loop_iv;
> > > +
> > > +  /* If the loop has multiple exits this structure contains the alternate
> > > +     exits of the loop which are relevant for vectorization.  */
> > > + vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
> > 
> > That's a quite heavy representation and as you say it's vectorizer specific.  May
> > I ask you to eliminate at _least_ vec_loop_alt_exits?
> > Are there not all exits in that vector?  Note there's already the list of exits and if
> > you have the canonical counting IV exit you can match against that to get all
> > the others?
> > 
> 
> Sure, that means some filtering whenever one iterates over the alt exits,
> but that's not a problem.
> 
> > >  /* Given LOOP this function generates a new copy of it and puts it
> > >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > > @@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > >    edge exit, new_exit;
> > >    bool duplicate_outer_loop = false;
> > >
> > > -  exit = single_exit (loop);
> > > +  exit = loop->vec_loop_iv;
> > >    at_exit = (e == exit);
> > >    if (!at_exit && e != loop_preheader_edge (loop))
> > >      return NULL;
> > >
> > >    if (scalar_loop == NULL)
> > >      scalar_loop = loop;
> > > +  else
> > > +    vec_init_exit_info (scalar_loop);
> > >
> > >    bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> > >    pbbs = bbs + 1;
> > > @@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > >    bbs[0] = preheader;
> > >    new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> > >
> > > -  exit = single_exit (scalar_loop);
> > > +  exit = scalar_loop->vec_loop_iv;
> > >    copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> > >  	    &exit, 1, &new_exit, NULL,
> > >  	    at_exit ? loop->latch : e->src, true);
> > > -  exit = single_exit (loop);
> > > +  exit = loop->vec_loop_iv;
> > >    basic_block new_preheader = new_bbs[0];
> > >
> > > +  /* Record the new loop exit information.  new_loop doesn't have SCEV
> > data and
> > > +     so we must initialize the exit information.  */
> > > +  vec_init_exit_info (new_loop);
> > > +
> > 
> > You have a mapping of old to new BB so you should be able to
> > map old to new exit by mapping e->src/dest and looking up the new edge?
> > 
> > The vec_loop_iv exit is mapped directly (new_exit).
> > 
> > So I don't really understand what's missing there.
> 
> But I don't have the mapping when the loop was versioned, e.g. by ifcvt.  So in
> the cases where scalar_loop != loop I still need them to match up.
> 
> vect_loop_form_info is destroyed after analysis though and is not available during
> peeling. That's why we copy relevant information out in vect_create_loop_vinfo.
> 
> But in general we only have 1 per loop as well, so it would be the same as using loop_vinfo.
> 
> I could move it into loop_vinfo and then require you to pass the edges to the peeling function
> as you mentioned.  This would solve the location we place them in, but I'm still
> not sure what to do about versioned loops.  I'd need to get their main edge from
> "somewhere"; would another field in loop_vinfo be ok?

I suppose since we already have ->scalar_loop, adding ->scalar_loop_iv_exit
is straightforward indeed.  As for matching them up, I don't see how
you do that reliably right now?  It might even be that the if-converted
loop has one of the exits removed as unreachable (since we run VN
on its body) ...

What I could see working (but ick) is to extend the contract between
if-conversion and vectorization and for example record corresponding exit
numbers on the exit edges.  We have conveniently (*cough*) unused edge->aux
for this.  If you assign numbers to all edges of the original
loop the loop copies should inherit those (if I traced things
correctly - duplicate_block copies edge->aux but not bb->aux).

So in the vectorizer you could then match them up.
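
A minimal self-contained sketch of that aux contract (toy types, made-up
names, just to show the copy/match logic I have in mind):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

/* Toy model of the proposed contract: if-conversion numbers each exit
   edge of the original loop via edge->aux, duplicate_block copies aux
   along with the edge, and the vectorizer later matches original and
   copied exits by that number.  Types and names are illustrative,
   not GCC's.  */
struct toy_edge { void *aux = nullptr; };

/* Number all exit edges; 0 (the default) means "not numbered".  */
void
number_exits (std::vector<toy_edge> &exits)
{
  for (size_t i = 0; i < exits.size (); i++)
    exits[i].aux = (void *) (i + 1);
}

/* Find the copy of ORIG among COPIES by its inherited aux number.  */
toy_edge *
match_exit (const toy_edge &orig, std::vector<toy_edge> &copies)
{
  for (toy_edge &c : copies)
    if (c.aux == orig.aux)
      return &c;
  return nullptr;
}
```

The point being that the match works regardless of the order the copied
exits come back in.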

Richard.


> Cheers,
> Tamar
> 
> > > +  if (!loop->vec_loop_iv)
> > > +    return opt_result::failure_at (vect_location,
> > > +				   "not vectorized:"
> > > +				   " could not determine main exit from"
> > > +				   " loop with multiple exits.\n");
> > > +
> > >    /* Different restrictions apply when we are considering an inner-most loop,
> > >       vs. an outer (nested) loop.
> > >       (FORNOW. May want to relax some of these restrictions in the future).  */
> > > @@ -3025,9 +3032,8 @@ start_over:
> > >        if (dump_enabled_p ())
> > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > >        if (!vect_can_advance_ivs_p (loop_vinfo)
> > > -	  || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > > -					   single_exit (LOOP_VINFO_LOOP
> > > -							 (loop_vinfo))))
> > > +	  || !slpeel_can_duplicate_loop_p (loop_vinfo,
> > > +					   LOOP_VINFO_IV_EXIT (loop_vinfo)))
> > >          {
> > >  	  ok = opt_result::failure_at (vect_location,
> > >  				       "not vectorized: can't create required "
> > > @@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> > >           Store them in NEW_PHIS.  */
> > >    if (double_reduc)
> > >      loop = outer_loop;
> > > -  exit_bb = single_exit (loop)->dest;
> > > +  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >    exit_gsi = gsi_after_labels (exit_bb);
> > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > >    for (unsigned i = 0; i < vec_num; i++)
> > > @@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> > >  	  phi = create_phi_node (new_def, exit_bb);
> > >  	  if (j)
> > >  	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > > -	  SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def);
> > > +	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)-
> > >dest_idx, def);
> > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > >  	  reduc_inputs.quick_push (new_def);
> > >  	}
> > > @@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info
> > *vinfo,
> > >  	   lhs' = new_tree;  */
> > >
> > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -      basic_block exit_bb = single_exit (loop)->dest;
> > > +      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >        gcc_assert (single_pred_p (exit_bb));
> > >
> > >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > -      SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs);
> > > +      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx,
> > vec_lhs);
> > >
> > >        gimple_seq stmts = NULL;
> > >        tree new_tree;
> > > @@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop,
> > unsigned vf)
> > >        scale_loop_frequencies (loop, p);
> > >      }
> > >
> > > -  edge exit_e = single_exit (loop);
> > > +  edge exit_e = loop->vec_loop_iv;
> > > +
> > >    exit_e->probability = profile_probability::always () / (new_est_niter + 1);
> > >
> > >    edge exit_l = single_pred_edge (loop->latch);
> > > @@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > >
> > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > >       versioning.   */
> > > -  edge e = single_exit (loop);
> > > +  edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > >    if (! single_pred_p (e->dest))
> > >      {
> > >        split_loop_exit_edge (e, true);
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index
> > a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036c
> > d14609ebe091799320bf 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -917,6 +917,8 @@ public:
> > >
> > >  /* Access Functions.  */
> > >  #define LOOP_VINFO_LOOP(L)                 (L)->loop
> > > +#define LOOP_VINFO_IV_EXIT(L)              (L)->loop->vec_loop_iv
> > > +#define LOOP_VINFO_ALT_EXITS(L)            (L)->loop->vec_loop_alt_exits
> > >  #define LOOP_VINFO_BBS(L)                  (L)->bbs
> > >  #define LOOP_VINFO_NITERSM1(L)             (L)->num_itersm1
> > >  #define LOOP_VINFO_NITERS(L)               (L)->num_iters
> > > @@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels
> > (loop_vec_info);
> > >  extern dump_user_location_t find_loop_location (class loop *);
> > >  extern bool vect_can_advance_ivs_p (loop_vec_info);
> > >  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
> > > +extern void vec_init_exit_info (class loop *);
> > >
> > >  /* In tree-vect-stmts.cc.  */
> > >  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> > 
> > So I didn't really see why we should need to have the info in
> > struct loop.
> > 
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

* RE: [PATCH 9/19] middle-end: refactor vectorizable_comparison to make the main body re-usable.
  2023-06-28 13:55   ` [PATCH 9/19] " Tamar Christina
@ 2023-07-13 16:23     ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-13 16:23 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Adding proper maintainers.
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Wednesday, June 28, 2023 2:46 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> > <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison
> > to make the main body re-usable.
> > 
> > Hi All,
> > 
> > Vectorization of a gcond starts off essentially the same as vectorizing a
> > comparison, with the only difference being how the operands are extracted.
> > 
> > This refactors vectorizable_comparison such that we now have a generic
> > function that can be used from vectorizable_early_break.  The refactoring
> > splits the gassign checks and actual validation/codegen off into a helper
> > function.
> > 
> > No change in functionality expected.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting
> > body
> > 	to ...
> > 	(vectorizable_comparison_1): ...This.
> > 
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca8
> > 1acd197693fc3457c31 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
> > 
> >  /* vectorizable_comparison.
> > 
> > -   Check if STMT_INFO is comparison expression that can be vectorized.
> > +/* Helper of vectorizable_comparison.
> > +
> > +   Check if STMT_INFO is comparison expression CODE that can be vectorized.
> >     If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> >     comparison, put it in VEC_STMT, and insert it at GSI.
> > 
> >     Return true if STMT_INFO is vectorizable in this way.  */
> > 
> >  static bool
> > -vectorizable_comparison (vec_info *vinfo,
> > -			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > -			 gimple **vec_stmt,
> > -			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > +			   stmt_vec_info stmt_info, tree_code code,
> > +			   gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			   slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> >  {
> >    tree lhs, rhs1, rhs2;
> >    tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> > -  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >    tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
> >    tree new_temp;
> >    loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo); @@ -11354,7
> > +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
> >    int ndts = 2;
> >    poly_uint64 nunits;
> >    int ncopies;
> > -  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
> > +  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
> >    int i;
> >    bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> >    vec<tree> vec_oprnds0 = vNULL;
> > @@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
> >      ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > 
> >    gcc_assert (ncopies >= 1);
> > -  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> > -    return false;
> > -
> > -  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> > -  if (!stmt)
> > -    return false;
> > -
> > -  code = gimple_assign_rhs_code (stmt);
> > 
> >    if (TREE_CODE_CLASS (code) != tcc_comparison)
> >      return false;
> > @@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
> >  	  return false;
> >  	}
> > 
> > -      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
> >        vect_model_simple_cost (vinfo, stmt_info,
> >  			      ncopies * (1 + (bitop2 != NOP_EXPR)),
> >  			      dts, ndts, slp_node, cost_vec); @@ -11565,6
> > +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> > 
> > +/* vectorizable_comparison.
> > +
> > +   Check if STMT_INFO is comparison expression that can be vectorized.
> > +   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> > +   comparison, put it in VEC_STMT, and insert it at GSI.
> > +
> > +   Return true if STMT_INFO is vectorizable in this way.  */
> > +
> > +static bool
> > +vectorizable_comparison (vec_info *vinfo,
> > +			 stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > +			 gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec) {

{ to the next line

> > +  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> > +    return false;
> > +
> > +  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);  if (!stmt)
> > +    return false;

new line before 'if'

otherwise OK.

Thanks,
Richard.

> > +  enum tree_code code = gimple_assign_rhs_code (stmt);
> > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    return false;
> > +
> > +  if (!vec_stmt)
> > +    STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > 
> > 
> > 
> > 
> > --
> 


* Re: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-06-28 13:47 ` [PATCH 12/19]middle-end: implement loop peeling and IV updates " Tamar Christina
@ 2023-07-13 17:31   ` Richard Biener
  2023-07-13 19:05     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-07-13 17:31 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch updates the peeling code to maintain LCSSA during peeling.
> The rewrite also naturally takes into account multiple exits and so it didn't
> make sense to split them off.
> 
> For the purposes of peeling the only change for multiple exits is that the
> secondary exits are all wired to the start of the new loop preheader when doing
> epilogue peeling.
> 
> When doing prologue peeling the CFG is kept intact.
> 
> For both epilogue and prologue peeling, if flow_loops is specified we wire any
> PHI nodes that escape the first loop through into the second loop.  The reason
> for this conditionality is that slpeel_tree_duplicate_loop_to_edge_cfg is used
> in the compiler in 3 ways:
>   - prologue peeling
>   - epilogue peeling
>   - loop distribution
> 
> For the last case the loops should remain independent, and so not be connected.
> Because only the used phi nodes are propagated, get_current_def can be used
> to easily find the previous definitions.  However, live statements that are
> not used inside the loop itself are not propagated (since if unused, the moment
> we add the guard between the two loops the value across the bypass edge can
> be wrong if the loop has been peeled.)
> 
> This is dealt with easily enough in find_guard_arg.
> 
> For multiple exits, while we are in LCSSA form and have a correct DOM tree, the
> moment we add the guard block we will change the dominators again.  To deal with
> this, slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks
> whose dominators need updating, so we don't have to recompute the list again.
> 
> With multiple exits and epilogue peeling we will also temporarily have an
> incorrect VUSE chain for the secondary exits, as it anticipates the final result
> after the VDEFs have been moved.  This will thus be corrected once the code
> motion is applied.
> 
> Lastly by doing things this way we can remove the helper functions that
> previously did lock step iterations to update things as it went along.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Not sure if I'll get through all of this in one go - so be prepared that
the rest of the review follows another day.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
> 	* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> 	assert.
> 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> 	exits.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
> 	(slpeel_can_duplicate_loop_p): Likewise.
> 	(vect_update_ivs_after_vectorizer): Don't enter this...
> 	(vect_update_ivs_after_early_break): ...but instead enter here.
> 	(find_guard_arg): Update for new peeling code.
> 	(slpeel_update_phi_nodes_for_loops): Remove.
> 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	non_break_control_flow and early_breaks.
> 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> 	multiple exits and VLA.
> 	(vect_analyze_loop_form): Support inner loop multiple exits.
> 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> 	(vect_create_epilog_for_reduction):  Update live phi nodes.
> 	(vectorizable_live_operation): Ignore live operations in vector loop
> 	when multiple exits.
> 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
> 	analyze gcond params.
> 	(vect_analyze_stmt): Support gcond.
> 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> 	in RPO pass.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
> 	(loop_vec_info_for_loop): Change to const and static.
> 	(is_loop_header_bb_p): Drop assert.
> 	(slpeel_can_duplicate_loop_p): Update prototype.
> 	(class loop): Add early_breaks and non_break_control_flow.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
>    edge preheader = loop_preheader_edge (loop);
>  
>    initialize_original_copy_tables ();
> -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
>    gcc_assert (res != NULL);
>  
>    /* When a not last partition is supposed to keep the LC PHIs computed
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
>    gimple_stmt_iterator bsi;
>    unsigned i;
>  
> -  if (exit != single_exit (loop))
> +  /* We need to check for alternative exits since exit can be NULL.  */

You mean we pass in exit == NULL in some cases?  I'm not sure what
the desired behavior in that case is - can you point out the
callers you are fixing here?

I think we should add gcc_assert (exit != nullptr)

>    for (i = 0; i < loop->num_nodes; i++)
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
>  {
>    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
>  
> +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> +	      || orig_def != new_def);
> +
>    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
>  
>    if (MAY_HAVE_DEBUG_BIND_STMTS)
> @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
>    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
>  
>    /* Record the number of latch iterations.  */
> -  if (limit == niters)
> +  if (limit == niters
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      /* Case A: the loop iterates NITERS times.  Subtract one to get the
>         latch count.  */
>      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
>      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
>  				       limit, step);
>  
> -  if (final_iv)
> +  /* For multiple exits we've already maintained LCSSA form and handled
> +     the scalar iteration update in the code that deals with the merge
> +     block and its updated guard.  I could move that code here instead
> +     of in vect_update_ivs_after_early_break but I have to still deal
> +     with the updates to the counter `i`.  So for now I'll keep them
> +     together.  */
> +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        gassign *assign;
>        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
>     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
>     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
>     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> -   entry or exit of LOOP.  */
> +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
> +   continuation.  This is correct for cases where one loop continues from the
> +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> +   where the loop is duplicated and then modified.
> +

but for loop distribution the flow also continues?  I'm not sure what you
are referring to here.  Do you by chance have a branch with the patches
installed?

> +   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks whose
> +   dominators were updated during the peeling.  */
>  
>  class loop *
>  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> -					class loop *scalar_loop, edge e)
> +					class loop *scalar_loop, edge e,
> +					bool flow_loops,
> +					vec<basic_block> *updated_doms)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
>      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
>  
> +  /* Rename the exit uses.  */
> +  for (edge exit : get_loop_exit_edges (new_loop))
> +    for (auto gsi = gsi_start_phis (exit->dest);
> +	 !gsi_end_p (gsi); gsi_next (&gsi))
> +      {
> +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> +      }
> +
> +  /* This condition happens when the loop has been versioned. e.g. due to ifcvt
> +     versioning the loop.  */
>    if (scalar_loop != loop)
>      {
>        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>  						EDGE_SUCC (loop->latch, 0));
>      }
>  
> +  vec<edge> alt_exits = loop->vec_loop_alt_exits;

So 'e' is not one of alt_exits, right?  I wonder if we can simply
compute the vector from all exits of 'loop' and removing 'e'?
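
I.e. something like this (toy sketch, plain ints standing in for 'edge';
the real code would filter the result of get_loop_exit_edges):

```cpp
#include <cassert>
#include <vector>

/* Toy stand-in: derive the alternate exits as "all exits minus the
   main IV exit" rather than carrying a separate vec_loop_alt_exits.  */
std::vector<int>
alt_exits_of (const std::vector<int> &all_exits, int main_exit)
{
  std::vector<int> alts;
  for (int e : all_exits)
    if (e != main_exit)
      alts.push_back (e);
  return alts;
}
```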

> +  bool multiple_exits_p = !alt_exits.is_empty ();
> +  auto_vec<basic_block> doms;
> +  class loop *update_loop = NULL;
> +
>    if (at_exit) /* Add the loop copy at exit.  */
>      {
> -      if (scalar_loop != loop)
> +      if (scalar_loop != loop && new_exit->dest != exit_dest)
>  	{
> -	  gphi_iterator gsi;
>  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> +	  flush_pending_stmts (new_exit);
> +	}
>  
> -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> -	       gsi_next (&gsi))
> +      auto loop_exits = get_loop_exit_edges (loop);
> +      for (edge exit : loop_exits)
> +	redirect_edge_and_branch (exit, new_preheader);
> +
> +

one line vertical space too much

> +      /* Copy the current loop LC PHI nodes between the original loop exit
> +	 block and the new loop header.  This allows us to later split the
> +	 preheader block and still find the right LC nodes.  */
> +      edge latch_new = single_succ_edge (new_preheader);
> +      edge latch_old = loop_latch_edge (loop);
> +      hash_set <tree> lcssa_vars;
> +      for (auto gsi_from = gsi_start_phis (latch_old->dest),

so that's loop->header (and makes it more clear which PHI nodes you are 
looking at)

> +	   gsi_to = gsi_start_phis (latch_new->dest);

likewise new_loop->header

> +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> +	{
> +	  gimple *from_phi = gsi_stmt (gsi_from);
> +	  gimple *to_phi = gsi_stmt (gsi_to);
> +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> +	  /* In all cases, even in early break situations we're only
> +	     interested in the number of fully executed loop iters.  As such
> +	     we discard any partially done iteration.  So we simply propagate
> +	     the phi nodes from the latch to the merge block.  */
> +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> +
> +	  lcssa_vars.add (new_arg);
> +
> +	  /* Main loop exit should use the final iter value.  */
> +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv, UNKNOWN_LOCATION);

above you are creating the PHI node at e->dest but here add the PHI arg to
loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
to follow.  I _think_ this code doesn't need to know about the "special"
edge.

> +
> +	  /* All other exits use the previous iters.  */
> +	  for (edge e : alt_exits)
> +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> +			 UNKNOWN_LOCATION);
> +
> +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> +	}
> +
> +      /* Copy over any live SSA vars that may not have been materialized in the
> +	 loops themselves but would be in the exit block.  However when the live
> +	 value is not used inside the loop then we don't need to do this,  if we do
> +	 then when we split the guard block the branch edge can end up containing the
> +	 wrong reference,  particularly if it shares an edge with something that has
> +	 bypassed the loop.  This is not something peeling can check so we need to
> +	 anticipate the usage of the live variable here.  */
> +      auto exit_map = redirect_edge_var_map_vector (exit);

Hmm, did I use that in my attempt to refactor things? ...

> +      if (exit_map)
> +        for (auto vm : exit_map)
> +	{
> +	  if (lcssa_vars.contains (vm.def)
> +	      || TREE_CODE (vm.def) != SSA_NAME)

the latter check is cheaper, so it should come first
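
I.e. something like the following (untested, just to illustrate the
ordering):

```c
	  /* TREE_CODE is a cheap flag test while the hash-set lookup is
	     the expensive part, so do the cheap test first.  */
	  if (TREE_CODE (vm.def) != SSA_NAME
	      || lcssa_vars.contains (vm.def))
	    continue;
```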

> +	    continue;
> +
> +	  imm_use_iterator imm_iter;
> +	  use_operand_p use_p;
> +	  bool use_in_loop = false;
> +
> +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
>  	    {
> -	      gphi *phi = gsi.phi ();
> -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> -	      location_t orig_locus
> -		= gimple_phi_arg_location_from_edge (phi, e);
> +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> +	      if (flow_bb_inside_loop_p (loop, bb)
> +		  && !gimple_vuse (USE_STMT (use_p)))
> +		{
> +		  use_in_loop = true;
> +		  break;
> +		}
> +	    }
>  
> -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> +	  if (!use_in_loop)
> +	    {
> +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> +		  mirrors the relevancy analysis's used_outside_scope.  */
> +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> +		continue;
>  	    }
> +
> +	  tree new_res = copy_ssa_name (vm.result);
> +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> +	  for (edge exit : loop_exits)
> +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);

I'm not sure what you are doing above - I guess I'll have to play with it
in a debug session.

>  	}
> -      redirect_edge_and_branch_force (e, new_preheader);
> -      flush_pending_stmts (e);
> +
>        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> -      if (was_imm_dom || duplicate_outer_loop)
> +
> +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
>  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
>  
>        /* And remove the non-necessary forwarder again.  Keep the other
> @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>        delete_basic_block (preheader);
>        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
>  			       loop_preheader_edge (scalar_loop)->src);
> +
> +      /* Finally after wiring the new epilogue we need to update its main exit
> +	 to the original function exit we recorded.  Other exits are already
> +	 correct.  */
> +      if (multiple_exits_p)
> +	{
> +	  for (edge e : get_loop_exit_edges (loop))
> +	    doms.safe_push (e->dest);
> +	  update_loop = new_loop;
> +	  doms.safe_push (exit_dest);
> +
> +	  /* Likely a fall-through edge, so update if needed.  */
> +	  if (single_succ_p (exit_dest))
> +	    doms.safe_push (single_succ (exit_dest));
> +	}
>      }
>    else /* Add the copy at entry.  */
>      {
> +      /* Copy the current loop LC PHI nodes between the original loop exit
> +	 block and the new loop header.  This allows us to later split the
> +	 preheader block and still find the right LC nodes.  */
> +      edge old_latch_loop = loop_latch_edge (loop);
> +      edge old_latch_init = loop_preheader_edge (loop);
> +      edge new_latch_loop = loop_latch_edge (new_loop);
> +      edge new_latch_init = loop_preheader_edge (new_loop);
> +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),

see above

> +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> +	{
> +	  gimple *from_phi = gsi_stmt (gsi_from);
> +	  gimple *to_phi = gsi_stmt (gsi_to);
> +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop);
> +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> +	}
> +
>        if (scalar_loop != loop)
>  	{
>  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
>        delete_basic_block (new_preheader);
>        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
>  			       loop_preheader_edge (new_loop)->src);
> +
> +      if (multiple_exits_p)
> +	update_loop = loop;
>      }
>  
> -  if (scalar_loop != loop)
> +  if (multiple_exits_p)
>      {
> -      /* Update new_loop->header PHIs, so that on the preheader
> -	 edge they are the ones from loop rather than scalar_loop.  */
> -      gphi_iterator gsi_orig, gsi_new;
> -      edge orig_e = loop_preheader_edge (loop);
> -      edge new_e = loop_preheader_edge (new_loop);
> -
> -      for (gsi_orig = gsi_start_phis (loop->header),
> -	   gsi_new = gsi_start_phis (new_loop->header);
> -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> +      for (edge e : get_loop_exit_edges (update_loop))
>  	{
> -	  gphi *orig_phi = gsi_orig.phi ();
> -	  gphi *new_phi = gsi_new.phi ();
> -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> -	  location_t orig_locus
> -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> -
> -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> +	  edge ex;
> +	  edge_iterator ei;
> +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> +	    {
> +	      /* Find the first non-fallthrough block as fall-throughs can't
> +		 dominate other blocks.  */
> +	      while ((ex->flags & EDGE_FALLTHRU)

I don't think EDGE_FALLTHRU is set correctly here - what's wrong with
just using single_succ_p?  Also, a fallthru edge's src dominates the
fallthru edge's dest, so the comment above doesn't make sense.

> +		     && single_succ_p (ex->dest))
> +		{
> +		  doms.safe_push (ex->dest);
> +		  ex = single_succ_edge (ex->dest);
> +		}
> +	      doms.safe_push (ex->dest);
> +	    }
> +	  doms.safe_push (e->dest);
>  	}
> -    }
>  
> +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> +      if (updated_doms)
> +	updated_doms->safe_splice (doms);
> +    }
>    free (new_bbs);
>    free (bbs);
>  
> @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
>    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
>    unsigned int num_bb = loop->inner? 5 : 2;
>  
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> +

I think checking the number of BBs is odd; I don't remember anything
in slpeel being specifically tied to that.  I think we can simply drop
this - or do you remember anything that would depend on ->num_nodes
being exactly 5 or 2?

>    /* All loops have an outer scope; the only case loop->outer is NULL is for
>       the function itself.  */
>    if (!loop_outer (loop)
> @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    basic_block update_bb = update_e->dest;
>  
> +  /* For early exits we'll update the IVs in
> +     vect_update_ivs_after_early_break.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return;
> +
>    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    /* Make sure there exists a single-predecessor exit bb:  */
> @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        /* Fix phi expressions in the successor bb.  */
>        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
>      }
> +  return;

we don't usually place a return at the end of void functions

> +}
> +
> +/*   Function vect_update_ivs_after_early_break.
> +
> +     "Advance" the induction variables of LOOP to the value they should take
> +     after the execution of LOOP.  This is currently necessary because the
> +     vectorizer does not handle induction variables that are used after the
> +     loop.  Such a situation occurs when the last iterations of LOOP are
> +     peeled, because of the early exit.  With an early exit we always peel the
> +     loop.
> +
> +     Input:
> +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> +		    vectorized. The last few iterations of LOOP were peeled.
> +     - LOOP - a loop that is going to be vectorized. The last few iterations
> +	      of LOOP were peeled.
> +     - VF - The loop vectorization factor.
> +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> +		     vectorized). i.e, the number of times the ivs should be
> +		     bumped.
> +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> +		  coming out from LOOP on which there are uses of the LOOP ivs
> +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> +
> +		  The new definitions of the ivs are placed in LOOP->exit.
> +		  The phi args associated with the edge UPDATE_E in the bb
> +		  UPDATE_E->dest are updated accordingly.
> +
> +     Output:
> +       - If available, the LCSSA phi node for the loop IV temp.
> +
> +     Assumption 1: Like the rest of the vectorizer, this function assumes
> +     a single loop exit that has a single predecessor.
> +
> +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> +     organized in the same order.
> +
> +     Assumption 3: The access function of the ivs is simple enough (see
> +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> +
> +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> +     that leads to the epilog loop; other paths skip the epilog loop).  This
> +     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
> +     needs to have its phis updated.
> + */
> +
> +static tree
> +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop * epilog,
> +				   poly_int64 vf, tree niters_orig,
> +				   tree niters_vector, edge update_e)
> +{
> +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return NULL;
> +
> +  gphi_iterator gsi, gsi1;
> +  tree ni_name, ivtmp = NULL;
> +  basic_block update_bb = update_e->dest;
> +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  basic_block exit_bb = loop_iv->dest;
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> +
> +  gcc_assert (cond);
> +
> +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> +       gsi_next (&gsi), gsi_next (&gsi1))
> +    {
> +      tree init_expr, final_expr, step_expr;
> +      tree type;
> +      tree var, ni, off;
> +      gimple_stmt_iterator last_gsi;
> +
> +      gphi *phi = gsi1.phi ();
> +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));

I'm confused about the setup.  update_bb looks like the block with the
loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
loop_preheader_edge (epilog) come into play here?  That would feed into
epilog->header PHIs?!

It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
better way?  Is update_bb really epilog->header?!

We're missing a check in PHI_ARG_DEF_FROM_EDGE, namely that
E->dest == gimple_bb (PHI) - we're just using E->dest_idx there,
which "works" even for totally unrelated edges.
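
Something along these lines would catch such misuse (a hypothetical
sketch only - the real PHI_ARG_DEF_FROM_EDGE is a plain macro without
any verification):

```c
/* Hypothetical checked variant of PHI_ARG_DEF_FROM_EDGE: verify the
   edge actually enters the block holding the PHI before using its
   destination index.  */
static inline tree
phi_arg_def_from_edge_checked (gphi *phi, edge e)
{
  gcc_checking_assert (gimple_bb (phi) == e->dest);
  return PHI_ARG_DEF (phi, e->dest_idx);
}
```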

> +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> +      if (!phi1)

shouldn't that be an assert?

> +	continue;
> +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "vect_update_ivs_after_early_break: phi: %G",
> +			 (gimple *)phi);
> +
> +      /* Skip reduction and virtual phis.  */
> +      if (!iv_phi_p (phi_info))
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_NOTE, vect_location,
> +			     "reduc or virtual phi. skip.\n");
> +	  continue;
> +	}
> +
> +      /* For multiple exits where we handle early exits we need to carry on
> +	 with the previous IV as loop iteration was not done because we exited
> +	 early.  As such just grab the original IV.  */
> +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));

but this should be taken care of by LC SSA?

OK, have to continue tomorrow from here.

Richard.

> +      if (gimple_cond_lhs (cond) != phi_ssa
> +	  && gimple_cond_rhs (cond) != phi_ssa)
> +	{
> +	  type = TREE_TYPE (gimple_phi_result (phi));
> +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> +	  step_expr = unshare_expr (step_expr);
> +
> +	  /* We previously generated the new merged phi in the same BB as the
> +	     guard.  So use that to perform the scaling on rather than the
> +	     normal loop phi which don't take the early breaks into account.  */
> +	  final_expr = gimple_phi_result (phi1);
> +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop));
> +
> +	  tree stype = TREE_TYPE (step_expr);
> +	  /* For early break the final loop IV is:
> +	     init + (final - init) * vf which takes into account peeling
> +	     values and non-single steps.  */
> +	  off = fold_build2 (MINUS_EXPR, stype,
> +			     fold_convert (stype, final_expr),
> +			     fold_convert (stype, init_expr));
> +	  /* Now adjust for VF to get the final iteration value.  */
> +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> +
> +	  /* Adjust the value with the offset.  */
> +	  if (POINTER_TYPE_P (type))
> +	    ni = fold_build_pointer_plus (init_expr, off);
> +	  else
> +	    ni = fold_convert (type,
> +			       fold_build2 (PLUS_EXPR, stype,
> +					    fold_convert (stype, init_expr),
> +					    off));
> +	  var = create_tmp_var (type, "tmp");
> +
> +	  last_gsi = gsi_last_bb (exit_bb);
> +	  gimple_seq new_stmts = NULL;
> +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> +	  /* Exit_bb shouldn't be empty.  */
> +	  if (!gsi_end_p (last_gsi))
> +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	  else
> +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> +
> +	  /* Fix phi expressions in the successor bb.  */
> +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> +	}
> +      else
> +	{
> +	  type = TREE_TYPE (gimple_phi_result (phi));
> +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> +	  step_expr = unshare_expr (step_expr);
> +
> +	  /* We previously generated the new merged phi in the same BB as the
> +	     guard.  So use that to perform the scaling on rather than the
> +	     normal loop phi which don't take the early breaks into account.  */
> +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> +	  tree stype = TREE_TYPE (step_expr);
> +
> +	  if (vf.is_constant ())
> +	    {
> +	      ni = fold_build2 (MULT_EXPR, stype,
> +				fold_convert (stype,
> +					      niters_vector),
> +				build_int_cst (stype, vf));
> +
> +	      ni = fold_build2 (MINUS_EXPR, stype,
> +				fold_convert (stype,
> +					      niters_orig),
> +				fold_convert (stype, ni));
> +	    }
> +	  else
> +	    /* If the loop's VF isn't constant then the loop must have been
> +	       masked, so at the end of the loop we know we have finished
> +	       the entire loop and found nothing.  */
> +	    ni = build_zero_cst (stype);
> +
> +	  ni = fold_convert (type, ni);
> +	  /* We don't support variable n in this version yet.  */
> +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> +
> +	  var = create_tmp_var (type, "tmp");
> +
> +	  last_gsi = gsi_last_bb (exit_bb);
> +	  gimple_seq new_stmts = NULL;
> +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> +	  /* Exit_bb shouldn't be empty.  */
> +	  if (!gsi_end_p (last_gsi))
> +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> +	  else
> +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> +
> +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> +
> +	  for (edge exit : alt_exits)
> +	    adjust_phi_and_debug_stmts (phi1, exit,
> +					build_int_cst (TREE_TYPE (step_expr),
> +						       vf));
> +	  ivtmp = gimple_phi_result (phi1);
> +	}
> +    }
> +
> +  return ivtmp;
>  }
>  
>  /* Return a gimple value containing the misalignment (measured in vector
> @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>  
>  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
>     this function searches for the corresponding lcssa phi node in exit
> -   bb of LOOP.  If it is found, return the phi result; otherwise return
> -   NULL.  */
> +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> +   return the phi result; otherwise return NULL.  */
>  
>  static tree
>  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> -		gphi *lcssa_phi)
> +		gphi *lcssa_phi, int lcssa_edge = 0)
>  {
>    gphi_iterator gsi;
>    edge e = loop->vec_loop_iv;
>  
> -  gcc_assert (single_pred_p (e->dest));
>    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
>      {
>        gphi *phi = gsi.phi ();
> -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> -	return PHI_RESULT (phi);
> -    }
> -  return NULL_TREE;
> -}
> -
> -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> -   edge, the two loops are arranged as below:
> -
> -       preheader_a:
> -     first_loop:
> -       header_a:
> -	 i_1 = PHI<i_0, i_2>;
> -	 ...
> -	 i_2 = i_1 + 1;
> -	 if (cond_a)
> -	   goto latch_a;
> -	 else
> -	   goto between_bb;
> -       latch_a:
> -	 goto header_a;
> -
> -       between_bb:
> -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> -
> -     second_loop:
> -       header_b:
> -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> -				 or with i_2 if no LCSSA phi is created
> -				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
> -	 ...
> -	 i_4 = i_3 + 1;
> -	 if (cond_b)
> -	   goto latch_b;
> -	 else
> -	   goto exit_bb;
> -       latch_b:
> -	 goto header_b;
> -
> -       exit_bb:
> -
> -   This function creates loop closed SSA for the first loop; update the
> -   second loop's PHI nodes by replacing argument on incoming edge with the
> -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> -   is false, Loop closed ssa phis will only be created for non-iv phis for
> -   the first loop.
> -
> -   This function assumes exit bb of the first loop is preheader bb of the
> -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> -   the second loop will execute rest iterations of the first.  */
> -
> -static void
> -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> -				   class loop *first, class loop *second,
> -				   bool create_lcssa_for_iv_phis)
> -{
> -  gphi_iterator gsi_update, gsi_orig;
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -
> -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> -  edge second_preheader_e = loop_preheader_edge (second);
> -  basic_block between_bb = single_exit (first)->dest;
> -
> -  gcc_assert (between_bb == second_preheader_e->src);
> -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> -  /* Either the first loop or the second is the loop to be vectorized.  */
> -  gcc_assert (loop == first || loop == second);
> -
> -  for (gsi_orig = gsi_start_phis (first->header),
> -       gsi_update = gsi_start_phis (second->header);
> -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> -    {
> -      gphi *orig_phi = gsi_orig.phi ();
> -      gphi *update_phi = gsi_update.phi ();
> -
> -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> -      /* Generate lcssa PHI node for the first loop.  */
> -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> +      /* Nested loops with multiple exits can have different no# phi node
> +	 arguments between the main loop and epilog as epilog falls to the
> +	 second loop.  */
> +      if (gimple_phi_num_args (phi) > e->dest_idx)
>  	{
> -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> -	  arg = new_res;
> -	}
> -
> -      /* Update PHI node in the second loop by replacing arg on the loop's
> -	 incoming edge.  */
> -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> -    }
> -
> -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> -     for correct vectorization of live stmts.  */
> -  if (loop == first)
> -    {
> -      basic_block orig_exit = single_exit (second)->dest;
> -      for (gsi_orig = gsi_start_phis (orig_exit);
> -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> -	{
> -	  gphi *orig_phi = gsi_orig.phi ();
> -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p  (orig_arg))
> -	    continue;
> -
> -	  /* Already created in the above loop.   */
> -	  if (find_guard_arg (first, second, orig_phi))
> +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> +	  if (TREE_CODE (var) != SSA_NAME)
>  	    continue;
>  
> -	  tree new_res = copy_ssa_name (orig_arg);
> -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> +	  if (operand_equal_p (get_current_def (var),
> +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> +	    return PHI_RESULT (phi);
>  	}
>      }
> +  return NULL_TREE;
>  }
>  
>  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>    gcc_assert (single_succ_p (merge_bb));
>    edge e = single_succ_edge (merge_bb);
>    basic_block exit_bb = e->dest;
> -  gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
>  
>    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
>      {
>        gphi *update_phi = gsi.phi ();
> -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
>  
>        tree merge_arg = NULL_TREE;
>  
> @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>        if (!merge_arg)
>  	merge_arg = old_arg;
>  
> -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
>        /* If the var is live after loop but not a reduction, we simply
>  	 use the old arg.  */
>        if (!guard_arg)
> @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
>      }
>  }
>  
> -/* EPILOG loop is duplicated from the original loop for vectorizing,
> -   the arg of its loop closed ssa PHI needs to be updated.  */
> -
> -static void
> -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> -{
> -  gphi_iterator gsi;
> -  basic_block exit_bb = single_exit (epilog)->dest;
> -
> -  gcc_assert (single_pred_p (exit_bb));
> -  edge e = EDGE_PRED (exit_bb, 0);
> -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> -}
> -
>  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
>     iterate exactly CONST_NITERS times.  Make a final decision about
>     whether the epilogue loop should be used, returning true if so.  */
> @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>      bound_epilog += vf - 1;
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>      bound_epilog += 1;
> +  /* For early breaks the scalar loop needs to execute at most VF times
> +     to find the element that caused the break.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      bound_epilog = vf;
> +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> +      vect_epilogues = false;
> +    }
>    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
>    poly_uint64 bound_scalar = bound_epilog;
>  
> @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  				  bound_prolog + bound_epilog)
>  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>  			 || vect_epilogues));
> +
> +  /* We only support early break vectorization on known bounds at this time.
> +     This means that if the vector loop can't be entered then we won't generate
> +     it at all.  So for now force skip_vector off because the additional control
> +     flow messes with the BB exits and we've already analyzed them.  */
> + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> +
>    /* Epilog loop must be executed if the number of iterations for epilog
>       loop is known at compile time, otherwise we need to add a check at
>       the end of vector loop and skip to the end of epilog loop.  */
>    bool skip_epilog = (prolog_peeling < 0
>  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>  		      || !vf.is_constant ());
> -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> +     loop must be executed.  */
> +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      skip_epilog = false;
> -
>    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
>    auto_vec<profile_count> original_counts;
>    basic_block *original_bbs = NULL;
> @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>    if (prolog_peeling)
>      {
>        e = loop_preheader_edge (loop);
> -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> -
> +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
>        /* Peel prolog and put it on preheader edge of loop.  */
> -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> +						       true);
>        gcc_assert (prolog);
>        prolog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> +
>        first_loop = prolog;
>        reset_original_copy_tables ();
>  
> @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	 as the transformations mentioned above make less or no sense when not
>  	 vectorizing.  */
>        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> +      auto_vec<basic_block> doms;
> +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> +						       &doms);
>        gcc_assert (epilog);
>  
>        epilog->force_vectorize = false;
> -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
>  
>        /* Scalar version loop may be preferred.  In this case, add guard
>  	 and skip to epilog.  Note this only happens when the number of
> @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
>  					update_e);
>  
> +      /* For early breaks we must create a guard to check how many iterations
> +	 of the scalar loop are yet to be performed.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  tree ivtmp =
> +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> +					       *niters_vector, update_e);
> +
> +	  gcc_assert (ivtmp);
> +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> +					 fold_convert (TREE_TYPE (niters),
> +						       ivtmp),
> +					 build_zero_cst (TREE_TYPE (niters)));
> +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +
> +	  /* If we had a fallthrough edge, the guard will the threaded through
> +	     and so we may need to find the actual final edge.  */
> +	  edge final_edge = epilog->vec_loop_iv;
> +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> +	     between the guard and the exit edge.  It only adds new nodes and
> +	     doesn't update existing one in the current scheme.  */
> +	  basic_block guard_to = split_edge (final_edge);
> +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> +						guard_bb, prob_epilog.invert (),
> +						irred_flag);
> +	  doms.safe_push (guard_bb);
> +
> +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> +
> +	  /* We must update all the edges from the new guard_bb.  */
> +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> +					      final_edge);
> +
> +	  /* If the loop was versioned we'll have an intermediate BB between
> +	     the guard and the exit.  This intermediate block is required
> +	     because in the current scheme of things the guard block phi
> +	     updating can only maintain LCSSA by creating new blocks.  In this
> +	     case we just need to update the uses in this block as well.  */
> +	  if (loop != scalar_loop)
> +	    {
> +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> +		   !gsi_end_p (gsi); gsi_next (&gsi))
> +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> +	    }
> +
> +	  flush_pending_stmts (guard_e);
> +	}
> +
>        if (skip_epilog)
>  	{
>  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	    }
>  	  scale_loop_profile (epilog, prob_epilog, 0);
>  	}
> -      else
> -	slpeel_update_phi_nodes_for_lcssa (epilog);
>  
>        unsigned HOST_WIDE_INT bound;
>        if (bound_scalar.is_constant (&bound))
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
> +    non_break_control_flow (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
>      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
>  					  (loop_vinfo));
>  
> +  /* When we have multiple exits and VF is unknown, we must require partial
> +     vectors because the loop bounds is not a minimum but a maximum.  That is to
> +     say we cannot unpredicate the main loop unless we peel or use partial
> +     vectors in the epilogue.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> +    return true;
> +
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
>      {
> @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
>    loop_vinfo->scalar_costs->finish_cost (nullptr);
>  }
>  
> -
>  /* Function vect_analyze_loop_form.
>  
>     Verify that certain CFG restrictions hold, including:
>     - the loop has a pre-header
> -   - the loop has a single entry and exit
> +   - the loop has a single entry
> +   - nested loops can have only a single exit.
>     - the loop exit condition is simple enough
>     - the number of iterations can be analyzed, i.e, a countable loop.  The
>       niter could be analyzed under some assumptions.  */
> @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>                             |
>                          (exit-bb)  */
>  
> -      if (loop->num_nodes != 2)
> -	return opt_result::failure_at (vect_location,
> -				       "not vectorized:"
> -				       " control flow in loop.\n");
> -
>        if (empty_block_p (loop->header))
>  	return opt_result::failure_at (vect_location,
>  				       "not vectorized: empty loop.\n");
> @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>          dump_printf_loc (MSG_NOTE, vect_location,
>  			 "Considering outer-loop vectorization.\n");
>        info->inner_loop_cond = inner.loop_cond;
> +
> +      if (!single_exit (loop))
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized: multiple exits.\n");
> +
>      }
>  
> -  if (!single_exit (loop))
> -    return opt_result::failure_at (vect_location,
> -				   "not vectorized: multiple exits.\n");
>    if (EDGE_COUNT (loop->header->preds) != 2)
>      return opt_result::failure_at (vect_location,
>  				   "not vectorized:"
> @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized: latch block not empty.\n");
>  
>    /* Make sure the exit is not abnormal.  */
> -  edge e = single_exit (loop);
> -  if (e->flags & EDGE_ABNORMAL)
> -    return opt_result::failure_at (vect_location,
> -				   "not vectorized:"
> -				   " abnormal loop exit edge.\n");
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  edge nexit = loop->vec_loop_iv;
> +  for (edge e : exits)
> +    {
> +      if (e->flags & EDGE_ABNORMAL)
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized:"
> +				       " abnormal loop exit edge.\n");
> +      /* Early break BB must be after the main exit BB.  In theory we should
> +	 be able to vectorize the inverse order, but the current flow in the
> +	 vectorizer always assumes you update successor PHI nodes, not
> +	 preds.  */
> +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized:"
> +				       " abnormal loop exit edge order.\n");
> +    }
> +
> +  /* We currently only support early exit loops with known bounds.   */
> +  if (exits.length () > 1)
> +    {
> +      class tree_niter_desc niter;
> +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> +	  || chrec_contains_undetermined (niter.niter)
> +	  || !evolution_function_is_constant_p (niter.niter))
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized:"
> +				       " early breaks only supported on loops"
> +				       " with known iteration bounds.\n");
> +    }
>  
>    info->conds
>      = vect_get_loop_niters (loop, &info->assumptions,
> @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
>    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
>  
> +  /* Check to see if we're vectorizing multiple exits.  */
> +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> +
>    if (info->inner_loop_cond)
>      {
>        stmt_vec_info inner_loop_cond_info
> @@ -3070,7 +3106,8 @@ start_over:
>  
>    /* If an epilogue loop is required make sure we can create one.  */
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>    basic_block exit_bb;
>    tree scalar_dest;
>    tree scalar_type;
> -  gimple *new_phi = NULL, *phi;
> +  gimple *new_phi = NULL, *phi = NULL;
>    gimple_stmt_iterator exit_gsi;
>    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
>    gimple *epilog_stmt = NULL;
> @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> +
> +	/* Update the other exits.  */
> +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	  {
> +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> +	    gphi_iterator gsi, gsi1;
> +	    for (edge exit : alt_exits)
> +	      {
> +		/* Find the phi node to propagate into the exit block for each
> +		   exit edge.  */
> +		for (gsi = gsi_start_phis (exit_bb),
> +		     gsi1 = gsi_start_phis (exit->src);
> +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> +		     gsi_next (&gsi), gsi_next (&gsi1))
> +		  {
> +		    /* There really should be a function to just get the number
> +		       of phis inside a bb.  */
> +		    if (phi && phi == gsi.phi ())
> +		      {
> +			gphi *phi1 = gsi1.phi ();
> +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> +					 PHI_RESULT (phi1));
> +			break;
> +		      }
> +		  }
> +	      }
> +	  }
>        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>      }
>  
> @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
>  	   new_tree = lane_extract <vec_lhs', ...>;
>  	   lhs' = new_tree;  */
>  
> +      /* When vectorizing an early break, any live statements that are used
> +	 outside of the loop are dead.  The loop will never get to them.
> +	 We could change the liveness value during analysis instead but since
> +	 the below code is invalid anyway just ignore it during codegen.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	return true;
> +
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>        gcc_assert (single_pred_p (exit_bb));
> @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* Make sure there exists a single-predecessor exit bb.  Do this before 
>       versioning.   */
>    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> -  if (! single_pred_p (e->dest))
> +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        split_loop_exit_edge (e, true);
>        if (dump_enabled_p ())
> @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
>      {
>        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> -      if (! single_pred_p (e->dest))
> +      if (e && ! single_pred_p (e->dest))
>  	{
>  	  split_loop_exit_edge (e, true);
>  	  if (dump_enabled_p ())
> @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  
>    /* Loops vectorized with a variable factor won't benefit from
>       unrolling/peeling.  */
> -  if (!vf.is_constant ())
> +  if (!vf.is_constant ()
> +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        loop->unroll = 1;
>        if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
>  
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> -    *relevant = vect_used_in_scope;
> +  if (is_ctrl_stmt (stmt_info->stmt))
> +    {
> +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> +	 it looks like loop_manip doesn't do that.  So we have to do it
> +	 the hard way.  */
> +      basic_block bb = gimple_bb (stmt_info->stmt);
> +      bool exit_bb = false, early_exit = false;
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        if (!flow_bb_inside_loop_p (loop, e->dest))
> +	  {
> +	    exit_bb = true;
> +	    early_exit = loop->vec_loop_iv->src != bb;
> +	    break;
> +	  }
> +
> +      /* We should have processed any exit edge, so an edge that is not an
> +	 early break must be a loop IV edge.  We need to distinguish between the
> +	 two as we don't want to generate code for the main loop IV.  */
> +      if (exit_bb)
> +	{
> +	  if (early_exit)
> +	    *relevant = vect_used_in_scope;
> +	}
> +      else if (bb->loop_father == loop)
> +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> +    }
>  
>    /* changing memory.  */
>    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	*relevant = vect_used_in_scope;
>        }
>  
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  auto_bitmap exit_bbs;
> +  for (edge exit : exits)
> +    bitmap_set_bit (exit_bbs, exit->dest->index);
> +
>    /* uses outside the loop.  */
>    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
>      {
> @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	      /* We expect all such uses to be in the loop exit phis
>  		 (because of loop closed form)   */
>  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> -	      gcc_assert (bb == single_exit (loop)->dest);
> +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
>  
>                *live_p = true;
>  	    }
> @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
>  	}
>      }
>  
> +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
> +     the conds yet at that point and there's no quick way to retrieve them.  */
> +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> +    return opt_result::failure_at (vect_location,
> +				   "not vectorized:"
> +				   " unsupported control flow in loop.\n");
> +
>    /* 2. Process_worklist */
>    while (worklist.length () > 0)
>      {
> @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
>  			return res;
>  		    }
>                   }
> +	    }
> +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> +	    {
> +	      enum tree_code rhs_code = gimple_cond_code (cond);
> +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> +	      opt_result res
> +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> +			       loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
> +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> +				loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
>              }
>  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
>  	    {
> @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
>  			     node_instance, cost_vec);
>        if (!res)
>  	return res;
> -   }
> +    }
> +
> +  if (is_ctrl_stmt (stmt_info->stmt))
> +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
>  
>    switch (STMT_VINFO_DEF_TYPE (stmt_info))
>      {
>        case vect_internal_def:
> +      case vect_early_exit_def:
>          break;
>  
>        case vect_reduction_def:
> @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
>      {
>        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
>        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
>  		  || (call && gimple_call_lhs (call) == NULL_TREE));
>        *need_to_vectorize = true;
>      }
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -63,6 +63,7 @@ enum vect_def_type {
>    vect_internal_def,
>    vect_induction_def,
>    vect_reduction_def,
> +  vect_early_exit_def,
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> @@ -876,6 +877,13 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break-finding loop.  */
> +  bool early_breaks;
> +
> +  /* When the loop contains non-early-break control flow.  */
> +  bool non_break_control_flow;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -985,9 +993,11 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
>  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
>  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
>  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> @@ -1038,8 +1048,8 @@ public:
>     stack.  */
>  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
>  
> -inline loop_vec_info
> -loop_vec_info_for_loop (class loop *loop)
> +static inline loop_vec_info
> +loop_vec_info_for_loop (const class loop *loop)
>  {
>    return (loop_vec_info) loop->aux;
>  }
> @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
>  {
>    if (bb == (bb->loop_father)->header)
>      return true;
> -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> +
>    return false;
>  }
>  
> @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
>     in tree-vect-loop-manip.cc.  */
>  extern void vect_set_loop_condition (class loop *, loop_vec_info,
>  				     tree, tree, tree, bool);
> -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> -						     class loop *, edge);
> +						    class loop *, edge, bool,
> +						    vec<basic_block> * = NULL);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>  				    tree *, tree *, tree *, int, bool, bool,
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
>  	 predicates that need to be shared for optimal predicate usage.
>  	 However reassoc will re-order them and prevent CSE from working
>  	 as it should.  CSE only the loop body, not the entry.  */
> -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> +      for (edge exit : exits)
> +	bitmap_set_bit (exit_bbs, exit->dest->index);
>  
>        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
>        do_rpo_vn (fun, entry, exit_bbs);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-13 17:31   ` Richard Biener
@ 2023-07-13 19:05     ` Tamar Christina
  2023-07-14 13:34       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-13 19:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, July 13, 2023 6:31 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch updates the peeling code to maintain LCSSA during peeling.
> > The rewrite also naturally takes into account multiple exits and so it
> > didn't make sense to split them off.
> >
> > For the purposes of peeling the only change for multiple exits is that the
> > secondary exits are all wired to the start of the new loop preheader when
> > doing epilogue peeling.
> >
> > When doing prologue peeling the CFG is kept intact.
> >
> > For both epilogue and prologue peeling we wire through between the two
> > loops any PHI nodes that escape the first loop into the second loop if
> > flow_loops is specified.  The reason for this conditionality is that
> > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> >   - prologue peeling
> >   - epilogue peeling
> >   - loop distribution
> >
> > For the last case the loops should remain independent, and so not be
> > connected.  Because of this propagation of only the used phi nodes,
> > get_current_def can be used to easily find the previous definitions.
> > However, live statements that are not used inside the loop itself are not
> > propagated (since if unused, the moment we add the guard in between the
> > two loops the value across the bypass edge can be wrong if the loop has
> > been peeled.)
> >
> > This is dealt with easily enough in find_guard_arg.
> >
> > For multiple exits, while we are in LCSSA form and have a correct DOM
> > tree, the moment we add the guard block we will change the dominators
> > again.  To deal with this, slpeel_tree_duplicate_loop_to_edge_cfg can
> > optionally return the blocks to update without having to recompute the
> > list of blocks to update again.
> >
> > With multiple exits, when doing epilogue peeling we will also temporarily
> > have an incorrect VUSE chain for the secondary exits as it anticipates
> > the final result after the VDEFs have been moved.  This will thus be
> > corrected once the code motion is applied.
> >
> > Lastly, by doing things this way we can remove the helper functions that
> > previously did lock-step iteration to update things as it went along.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> 
> Not sure if I get through all of this in one go - so be prepared that
> the rest of the review follows another day.

No worries, I appreciate the reviews!
Just giving some quick replies for when you continue.

> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
> > 	* tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit == NULL.
> > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> > 	assert.
> > 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > 	exits.
> > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
> > 	(slpeel_can_duplicate_loop_p): Likewise.
> > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > 	(find_guard_arg): Update for new peeling code.
> > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > 	non_break_control_flow and early_breaks.
> > 	(vect_need_peeling_or_partial_vectors_p): Force partial vectors if
> > 	multiple exits and VLA.
> > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > 	(vect_create_epilog_for_reduction): Update live phi nodes.
> > 	(vectorizable_live_operation): Ignore live operations in vector loop
> > 	when multiple exits.
> > 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow
> > 	and analyze gcond params.
> > 	(vect_analyze_stmt): Support gcond.
> > 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > 	in RPO pass.
> > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
> > 	(loop_vec_info_for_loop): Change to const and static.
> > 	(is_loop_header_bb_p): Drop assert.
> > 	(slpeel_can_duplicate_loop_p): Update prototype.
> > 	(class loop): Add early_breaks and non_break_control_flow.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> > index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
> > --- a/gcc/tree-loop-distribution.cc
> > +++ b/gcc/tree-loop-distribution.cc
> > @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
> >    edge preheader = loop_preheader_edge (loop);
> >
> >    initialize_original_copy_tables ();
> > -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> > +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader,
> false);
> >    gcc_assert (res != NULL);
> >
> >    /* When a not last partition is supposed to keep the LC PHIs computed
> > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
> > --- a/gcc/tree-ssa-loop-niter.cc
> > +++ b/gcc/tree-ssa-loop-niter.cc
> > @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
> >    gimple_stmt_iterator bsi;
> >    unsigned i;
> >
> > -  if (exit != single_exit (loop))
> > +  /* We need to check for alternative exits since exit can be NULL.  */
> 
> You mean we pass in exit == NULL in some cases?  I'm not sure what
> the desired behavior in that case is - can you point out the
> callers you are fixing here?
> 
> I think we should add gcc_assert (exit != nullptr)
> 
> >    for (i = 0; i < loop->num_nodes; i++)
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
> >  {
> >    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
> >
> > +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> > +	      || orig_def != new_def);
> > +
> >    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
> >
> >    if (MAY_HAVE_DEBUG_BIND_STMTS)
> > @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> >
> >    /* Record the number of latch iterations.  */
> > -  if (limit == niters)
> > +  if (limit == niters
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> >         latch count.  */
> >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> >  				       limit, step);
> >
> > -  if (final_iv)
> > +  /* For multiple exits we've already maintained LCSSA form and handled
> > +     the scalar iteration update in the code that deals with the merge
> > +     block and its updated guard.  I could move that code here instead
> > +     of in vect_update_ivs_after_early_break but I have to still deal
> > +     with the updates to the counter `i`.  So for now I'll keep them
> > +     together.  */
> > +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        gassign *assign;
> >        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
> >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> >     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> >     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> > -   entry or exit of LOOP.  */
> > +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to
> SCALAR_LOOP as a
> > +   continuation.  This is correct for cases where one loop continues from the
> > +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> > +   where the loop is duplicated and then modified.
> > +
> 
> but for loop distribution the flow also continues?  I'm not sure what you
> are referring to here.  Do you by chance have a branch with the patches
> installed?

Yup, they're at refs/users/tnfchris/heads/gcc-14-early-break in the repo.

> 
> > +   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks
> > +   whose dominators were updated during the peeling.  */
> >
> >  class loop *
> >  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > -					class loop *scalar_loop, edge e)
> > +					class loop *scalar_loop, edge e,
> > +					bool flow_loops,
> > +					vec<basic_block> *updated_doms)
> >  {
> >    class loop *new_loop;
> >    basic_block *new_bbs, *bbs, *pbbs;
> > @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> >    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
> >      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
> >
> > +  /* Rename the exit uses.  */
> > +  for (edge exit : get_loop_exit_edges (new_loop))
> > +    for (auto gsi = gsi_start_phis (exit->dest);
> > +	 !gsi_end_p (gsi); gsi_next (&gsi))
> > +      {
> > +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> > +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> > +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> > +      }
> > +
> > +  /* This condition happens when the loop has been versioned, e.g. due to
> > +     ifcvt versioning the loop.  */
> >    if (scalar_loop != loop)
> >      {
> >        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> > @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> >  						EDGE_SUCC (loop->latch, 0));
> >      }
> >
> > +  vec<edge> alt_exits = loop->vec_loop_alt_exits;
> 
> So 'e' is not one of alt_exits, right?  I wonder if we can simply
> compute the vector from all exits of 'loop' and removing 'e'?
> 
> > +  bool multiple_exits_p = !alt_exits.is_empty ();
> > +  auto_vec<basic_block> doms;
> > +  class loop *update_loop = NULL;
> > +
> >    if (at_exit) /* Add the loop copy at exit.  */
> >      {
> > -      if (scalar_loop != loop)
> > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> >  	{
> > -	  gphi_iterator gsi;
> >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > +	  flush_pending_stmts (new_exit);
> > +	}
> >
> > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > -	       gsi_next (&gsi))
> > +      auto loop_exits = get_loop_exit_edges (loop);
> > +      for (edge exit : loop_exits)
> > +	redirect_edge_and_branch (exit, new_preheader);
> > +
> > +
> 
> one line vertical space too much
> 
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +	 block and the new loop header.  This allows us to later split the
> > +	 preheader block and still find the right LC nodes.  */
> > +      edge latch_new = single_succ_edge (new_preheader);
> > +      edge latch_old = loop_latch_edge (loop);
> > +      hash_set <tree> lcssa_vars;
> > +      for (auto gsi_from = gsi_start_phis (latch_old->dest),
> 
> so that's loop->header (and makes it more clear which PHI nodes you are
> looking at)
> 
> > +	   gsi_to = gsi_start_phis (latch_new->dest);
> 
> likewise new_loop->header
> 
> > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +	{
> > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> > +	  /* In all cases, even in early break situations we're only
> > +	     interested in the number of fully executed loop iters.  As such
> > +	     we discard any partially done iteration.  So we simply propagate
> > +	     the phi nodes from the latch to the merge block.  */
> > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > +
> > +	  lcssa_vars.add (new_arg);
> > +
> > +	  /* Main loop exit should use the final iter value.  */
> > +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv,
> > +		       UNKNOWN_LOCATION);
> 
> above you are creating the PHI node at e->dest but here add the PHI arg to
> loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
> to follow.  I _think_ this code doesn't need to know about the "special"
> edge.
> 
> > +
> > +	  /* All other exits use the previous iters.  */
> > +	  for (edge e : alt_exits)
> > +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> > +			 UNKNOWN_LOCATION);
> > +
> > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > +	}
> > +
> > +      /* Copy over any live SSA vars that may not have been materialized in
> > +	 the loops themselves but would be in the exit block.  However when the
> > +	 live value is not used inside the loop then we don't need to do this; if
> > +	 we do then when we split the guard block the branch edge can end up
> > +	 containing the wrong reference, particularly if it shares an edge with
> > +	 something that has bypassed the loop.  This is not something peeling can
> > +	 check so we need to anticipate the usage of the live variable here.  */
> > +      auto exit_map = redirect_edge_var_map_vector (exit);
> 
> Hmm, did I use that in my attempt to refactor things? ...

Indeed, I didn't always use it, but found it was the best way to deal with the
variables being live in various BBs after the loop.

> 
> > +      if (exit_map)
> > +        for (auto vm : exit_map)
> > +	{
> > +	  if (lcssa_vars.contains (vm.def)
> > +	      || TREE_CODE (vm.def) != SSA_NAME)
> 
> the latter check is cheaper so it should come first
> 
> > +	    continue;
> > +
> > +	  imm_use_iterator imm_iter;
> > +	  use_operand_p use_p;
> > +	  bool use_in_loop = false;
> > +
> > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
> >  	    {
> > -	      gphi *phi = gsi.phi ();
> > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > -	      location_t orig_locus
> > -		= gimple_phi_arg_location_from_edge (phi, e);
> > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > +	      if (flow_bb_inside_loop_p (loop, bb)
> > +		  && !gimple_vuse (USE_STMT (use_p)))
> > +		{
> > +		  use_in_loop = true;
> > +		  break;
> > +		}
> > +	    }
> >
> > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > +	  if (!use_in_loop)
> > +	    {
> > +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> > +		  mirrors the relevancy analysis's used_outside_scope.  */
> > +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> > +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > +		continue;
> >  	    }
> > +
> > +	  tree new_res = copy_ssa_name (vm.result);
> > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > +	  for (edge exit : loop_exits)
> > +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
> 
> not sure what you are doing above - I guess I have to play with it
> in a debug session.

Yeah if you comment it out one of the testcases should fail.

> 
> >  	}
> > -      redirect_edge_and_branch_force (e, new_preheader);
> > -      flush_pending_stmts (e);
> > +
> >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > -      if (was_imm_dom || duplicate_outer_loop)
> > +
> > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> >
> >        /* And remove the non-necessary forwarder again.  Keep the other
> > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> >        delete_basic_block (preheader);
> >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> >  			       loop_preheader_edge (scalar_loop)->src);
> > +
> > +      /* Finally after wiring the new epilogue we need to update its main exit
> > +	 to the original function exit we recorded.  Other exits are already
> > +	 correct.  */
> > +      if (multiple_exits_p)
> > +	{
> > +	  for (edge e : get_loop_exit_edges (loop))
> > +	    doms.safe_push (e->dest);
> > +	  update_loop = new_loop;
> > +	  doms.safe_push (exit_dest);
> > +
> > +	  /* Likely a fall-through edge, so update if needed.  */
> > +	  if (single_succ_p (exit_dest))
> > +	    doms.safe_push (single_succ (exit_dest));
> > +	}
> >      }
> >    else /* Add the copy at entry.  */
> >      {
> > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > +	 block and the new loop header.  This allows us to later split the
> > +	 preheader block and still find the right LC nodes.  */
> > +      edge old_latch_loop = loop_latch_edge (loop);
> > +      edge old_latch_init = loop_preheader_edge (loop);
> > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> 
> see above
> 
> > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > +	{
> > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop);
> > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > +	}
> > +
> >        if (scalar_loop != loop)
> >  	{
> >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop,
> >        delete_basic_block (new_preheader);
> >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> >  			       loop_preheader_edge (new_loop)->src);
> > +
> > +      if (multiple_exits_p)
> > +	update_loop = loop;
> >      }
> >
> > -  if (scalar_loop != loop)
> > +  if (multiple_exits_p)
> >      {
> > -      /* Update new_loop->header PHIs, so that on the preheader
> > -	 edge they are the ones from loop rather than scalar_loop.  */
> > -      gphi_iterator gsi_orig, gsi_new;
> > -      edge orig_e = loop_preheader_edge (loop);
> > -      edge new_e = loop_preheader_edge (new_loop);
> > -
> > -      for (gsi_orig = gsi_start_phis (loop->header),
> > -	   gsi_new = gsi_start_phis (new_loop->header);
> > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > +      for (edge e : get_loop_exit_edges (update_loop))
> >  	{
> > -	  gphi *orig_phi = gsi_orig.phi ();
> > -	  gphi *new_phi = gsi_new.phi ();
> > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > -	  location_t orig_locus
> > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > -
> > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > +	  edge ex;
> > +	  edge_iterator ei;
> > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > +	    {
> > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > +		 dominate other blocks.  */
> > +	      while ((ex->flags & EDGE_FALLTHRU)
> 
> I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> just using single_succ_p here?  A fallthru edge src dominates the
> fallthru edge dest, so the sentence above doesn't make sense.

I wanted to say that the immediate dominator of a block is never a
fall-through block.  At least that's what I understood from how the
dominators are calculated in the code, though I may have missed something.
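
For what it's worth, a quick toy computation (plain iterative dominators,
nothing GCC-specific; block 0 is the entry and blocks are assumed numbered
in reverse postorder) suggests a pure forwarder block can end up as an
immediate dominator when it is its successor's only predecessor:

```cpp
#include <vector>

// Classic iterative immediate-dominator computation over a toy CFG.
// preds[b] lists the predecessors of block b; block 0 is the entry.
// Blocks are assumed numbered in reverse postorder so the intersect
// walk terminates.
std::vector<int>
compute_idoms (const std::vector<std::vector<int>> &preds)
{
  int n = preds.size ();
  std::vector<int> idom (n, -1);
  idom[0] = 0;
  auto intersect = [&] (int a, int b)
    {
      // Walk both nodes up the dominator tree until they meet.
      while (a != b)
	{
	  while (a > b) a = idom[a];
	  while (b > a) b = idom[b];
	}
      return a;
    };
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (int b = 1; b < n; b++)
	{
	  int new_idom = -1;
	  for (int p : preds[b])
	    if (idom[p] != -1)
	      new_idom = new_idom == -1 ? p : intersect (p, new_idom);
	  if (new_idom != idom[b])
	    {
	      idom[b] = new_idom;
	      changed = true;
	    }
	}
    }
  return idom;
}
```

On the chain 0 -> 1 -> 2 -> 3, block 2 is a pure forwarder (single pred,
single succ) yet is idom of 3; on a diamond 0 -> {1,2} -> 3 the idom of 3
is 0 as expected.
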

> 
> > +		     && single_succ_p (ex->dest))
> > +		{
> > +		  doms.safe_push (ex->dest);
> > +		  ex = single_succ_edge (ex->dest);
> > +		}
> > +	      doms.safe_push (ex->dest);
> > +	    }
> > +	  doms.safe_push (e->dest);
> >  	}
> > -    }
> >
> > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > +      if (updated_doms)
> > +	updated_doms->safe_splice (doms);
> > +    }
> >    free (new_bbs);
> >    free (bbs);
> >
> > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
> >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> >    unsigned int num_bb = loop->inner? 5 : 2;
> >
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > +
> 
> I think checking the number of BBs is odd, I don't remember anything
> in slpeel that is specifically tied to that?  I think we can simply drop
> this - or do you remember anything that would depend on ->num_nodes
> being only exactly 5 or 2?

Never actually seemed to require it, but they're used as a sanity check to
see if there is unexpected control flow in the loop.

i.e. this would say no if you have an if statement in the loop that wasn't
converted.  The other part of this and the accompanying explanation is in
vect_analyze_loop_form.  In the patch series I had to remove the hard
num_nodes == 2 check from there because the number of nodes restricted
things too much.  If you have an empty fall-through block, which seems to
happen often between the main exit and the latch block, then we'd not
vectorize.

So instead I now reject loops after analyzing the gcond.  So I think this
check can go or needs to be different.

> 
> >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> >       the function itself.  */
> >    if (!loop_outer (loop)
> > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >    basic_block update_bb = update_e->dest;
> >
> > +  /* For early exits we'll update the IVs in
> > +     vect_update_ivs_after_early_break.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    return;
> > +
> >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >
> >    /* Make sure there exists a single-predecessor exit bb:  */
> > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> >        /* Fix phi expressions in the successor bb.  */
> >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> >      }
> > +  return;
> 
> we don't usually place a return at the end of void functions
> 
> > +}
> > +
> > +/*   Function vect_update_ivs_after_early_break.
> > +
> > +     "Advance" the induction variables of LOOP to the value they should take
> > +     after the execution of LOOP.  This is currently necessary because the
> > +     vectorizer does not handle induction variables that are used after the
> > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > +     peeled, because of the early exit.  With an early exit we always peel the
> > +     loop.
> > +
> > +     Input:
> > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > +		    vectorized. The last few iterations of LOOP were peeled.
> > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > +	      of LOOP were peeled.
> > +     - VF - The loop vectorization factor.
> > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > +		     vectorized). i.e, the number of times the ivs should be
> > +		     bumped.
> > +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > +		  coming out from LOOP on which there are uses of the LOOP ivs
> > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > +
> > +		  The new definitions of the ivs are placed in LOOP->exit.
> > +		  The phi args associated with the edge UPDATE_E in the bb
> > +		  UPDATE_E->dest are updated accordingly.
> > +
> > +     Output:
> > +       - If available, the LCSSA phi node for the loop IV temp.
> > +
> > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > +     a single loop exit that has a single predecessor.
> > +
> > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > +     organized in the same order.
> > +
> > +     Assumption 3: The access function of the ivs is simple enough (see
> > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > +
> > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > +     path starts with the edge UPDATE_E, and its destination (denoted
> > +     update_bb) needs to have its phis updated.
> > + */
> > +
> > +static tree
> > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *epilog,
> > +				   poly_int64 vf, tree niters_orig,
> > +				   tree niters_vector, edge update_e)
> > +{
> > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    return NULL;
> > +
> > +  gphi_iterator gsi, gsi1;
> > +  tree ni_name, ivtmp = NULL;
> > +  basic_block update_bb = update_e->dest;
> > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +  basic_block exit_bb = loop_iv->dest;
> > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > +
> > +  gcc_assert (cond);
> > +
> > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > +       gsi_next (&gsi), gsi_next (&gsi1))
> > +    {
> > +      tree init_expr, final_expr, step_expr;
> > +      tree type;
> > +      tree var, ni, off;
> > +      gimple_stmt_iterator last_gsi;
> > +
> > +      gphi *phi = gsi1.phi ();
> > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
> 
> I'm confused about the setup.  update_bb looks like the block with the
> loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> loop_preheader_edge (epilog) come into play here?  That would feed into
> epilog->header PHIs?!

We can't query the type of the phis in the block with the LC PHI nodes, so
the typical pattern seems to be that we iterate over a block that's part of
the loop and that would have the PHIs in the same order, just so we can get
to the stmt_vec_info.

> 
> It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> better way?  Is update_bb really epilog->header?!
> 
> We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> which "works" even for totally unrelated edges.
> 
> > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > +      if (!phi1)
> 
> shouldn't that be an assert?
> 
> > +	continue;
> > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "vect_update_ivs_after_early_break: phi: %G",
> > +			 (gimple *)phi);
> > +
> > +      /* Skip reduction and virtual phis.  */
> > +      if (!iv_phi_p (phi_info))
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > +			     "reduc or virtual phi. skip.\n");
> > +	  continue;
> > +	}
> > +
> > +      /* For multiple exits where we handle early exits we need to carry on
> > +	 with the previous IV as loop iteration was not done because we exited
> > +	 early.  As such just grab the original IV.  */
> > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));
> 
> but this should be taken care of by LC SSA?

It is; the comment is probably missing details.  This part just scales the
counter from VF to scalar counts.  It's just a reminder that this scaling is
done differently from normal single-exit vectorization.
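
Concretely, for a compile-time constant VF the scaling in the hunk further
below amounts to the following (a minimal sketch with made-up numbers, not
GCC code — the masked/non-constant-VF case instead yields zero because the
whole loop finished without a hit):

```cpp
// The vector loop completed NITERS_VECTOR full vector iterations, i.e.
// NITERS_VECTOR * VF scalar iterations, so the scalar epilogue still has
// to perform the remainder to find the element that caused the break.
long
remaining_scalar_iters (long niters_orig, long niters_vector, long vf)
{
  return niters_orig - niters_vector * vf;
}
```

e.g. with 103 scalar iterations, VF 4 and 25 completed vector iterations,
3 scalar iterations remain — always at most VF.
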

> 
> OK, have to continue tomorrow from here.

Cheers, Thank you!

Tamar

> 
> Richard.
> 
> > +      if (gimple_cond_lhs (cond) != phi_ssa
> > +	  && gimple_cond_rhs (cond) != phi_ssa)
> > +	{
> > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > +	  step_expr = unshare_expr (step_expr);
> > +
> > +	  /* We previously generated the new merged phi in the same BB as the
> > +	     guard.  So use that to perform the scaling on rather than the
> > +	     normal loop phi which don't take the early breaks into account.  */
> > +	  final_expr = gimple_phi_result (phi1);
> > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > +					     loop_preheader_edge (loop));
> > +
> > +	  tree stype = TREE_TYPE (step_expr);
> > +	  /* For early break the final loop IV is:
> > +	     init + (final - init) * vf which takes into account peeling
> > +	     values and non-single steps.  */
> > +	  off = fold_build2 (MINUS_EXPR, stype,
> > +			     fold_convert (stype, final_expr),
> > +			     fold_convert (stype, init_expr));
> > +	  /* Now adjust for VF to get the final iteration value.  */
> > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > +
> > +	  /* Adjust the value with the offset.  */
> > +	  if (POINTER_TYPE_P (type))
> > +	    ni = fold_build_pointer_plus (init_expr, off);
> > +	  else
> > +	    ni = fold_convert (type,
> > +			       fold_build2 (PLUS_EXPR, stype,
> > +					    fold_convert (stype, init_expr),
> > +					    off));
> > +	  var = create_tmp_var (type, "tmp");
> > +
> > +	  last_gsi = gsi_last_bb (exit_bb);
> > +	  gimple_seq new_stmts = NULL;
> > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > +	  /* Exit_bb shouldn't be empty.  */
> > +	  if (!gsi_end_p (last_gsi))
> > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > +	  else
> > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > +
> > +	  /* Fix phi expressions in the successor bb.  */
> > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > +	}
> > +      else
> > +	{
> > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > +	  step_expr = unshare_expr (step_expr);
> > +
> > +	  /* We previously generated the new merged phi in the same BB as the
> > +	     guard.  So use that to perform the scaling on rather than the
> > +	     normal loop phi which don't take the early breaks into account.  */
> > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> > +	  tree stype = TREE_TYPE (step_expr);
> > +
> > +	  if (vf.is_constant ())
> > +	    {
> > +	      ni = fold_build2 (MULT_EXPR, stype,
> > +				fold_convert (stype,
> > +					      niters_vector),
> > +				build_int_cst (stype, vf));
> > +
> > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > +				fold_convert (stype,
> > +					      niters_orig),
> > +				fold_convert (stype, ni));
> > +	    }
> > +	  else
> > +	    /* If the loop's VF isn't constant then the loop must have been
> > +	       masked, so at the end of the loop we know we have finished
> > +	       the entire loop and found nothing.  */
> > +	    ni = build_zero_cst (stype);
> > +
> > +	  ni = fold_convert (type, ni);
> > +	  /* We don't support variable n in this version yet.  */
> > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > +
> > +	  var = create_tmp_var (type, "tmp");
> > +
> > +	  last_gsi = gsi_last_bb (exit_bb);
> > +	  gimple_seq new_stmts = NULL;
> > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > +	  /* Exit_bb shouldn't be empty.  */
> > +	  if (!gsi_end_p (last_gsi))
> > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > +	  else
> > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > +
> > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > +
> > +	  for (edge exit : alt_exits)
> > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > +					build_int_cst (TREE_TYPE (step_expr),
> > +						       vf));
> > +	  ivtmp = gimple_phi_result (phi1);
> > +	}
> > +    }
> > +
> > +  return ivtmp;
> >  }
> >
> >  /* Return a gimple value containing the misalignment (measured in vector
> > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> >
> >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> >     this function searches for the corresponding lcssa phi node in exit
> > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > -   NULL.  */
> > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > +   return the phi result; otherwise return NULL.  */
> >
> >  static tree
> >  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > -		gphi *lcssa_phi)
> > +		gphi *lcssa_phi, int lcssa_edge = 0)
> >  {
> >    gphi_iterator gsi;
> >    edge e = loop->vec_loop_iv;
> >
> > -  gcc_assert (single_pred_p (e->dest));
> >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *phi = gsi.phi ();
> > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > -	return PHI_RESULT (phi);
> > -    }
> > -  return NULL_TREE;
> > -}
> > -
> > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > -   edge, the two loops are arranged as below:
> > -
> > -       preheader_a:
> > -     first_loop:
> > -       header_a:
> > -	 i_1 = PHI<i_0, i_2>;
> > -	 ...
> > -	 i_2 = i_1 + 1;
> > -	 if (cond_a)
> > -	   goto latch_a;
> > -	 else
> > -	   goto between_bb;
> > -       latch_a:
> > -	 goto header_a;
> > -
> > -       between_bb:
> > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > -
> > -     second_loop:
> > -       header_b:
> > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > -				 or with i_2 if no LCSSA phi is created
> > -				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
> > -	 ...
> > -	 i_4 = i_3 + 1;
> > -	 if (cond_b)
> > -	   goto latch_b;
> > -	 else
> > -	   goto exit_bb;
> > -       latch_b:
> > -	 goto header_b;
> > -
> > -       exit_bb:
> > -
> > -   This function creates loop closed SSA for the first loop; update the
> > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > -   the first loop.
> > -
> > -   This function assumes exit bb of the first loop is preheader bb of the
> > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > -   the second loop will execute rest iterations of the first.  */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > -				   class loop *first, class loop *second,
> > -				   bool create_lcssa_for_iv_phis)
> > -{
> > -  gphi_iterator gsi_update, gsi_orig;
> > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -
> > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > -  edge second_preheader_e = loop_preheader_edge (second);
> > -  basic_block between_bb = single_exit (first)->dest;
> > -
> > -  gcc_assert (between_bb == second_preheader_e->src);
> > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > -  gcc_assert (loop == first || loop == second);
> > -
> > -  for (gsi_orig = gsi_start_phis (first->header),
> > -       gsi_update = gsi_start_phis (second->header);
> > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > -    {
> > -      gphi *orig_phi = gsi_orig.phi ();
> > -      gphi *update_phi = gsi_update.phi ();
> > -
> > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > -      /* Generate lcssa PHI node for the first loop.  */
> > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > +      /* Nested loops with multiple exits can have a different number of
> > +	 phi node arguments between the main loop and epilog as the epilog
> > +	 falls through to the second loop.  */
> > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> >  	{
> > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> > -	  arg = new_res;
> > -	}
> > -
> > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > -	 incoming edge.  */
> > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > -    }
> > -
> > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > -     for correct vectorization of live stmts.  */
> > -  if (loop == first)
> > -    {
> > -      basic_block orig_exit = single_exit (second)->dest;
> > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > -	{
> > -	  gphi *orig_phi = gsi_orig.phi ();
> > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > -	    continue;
> > -
> > -	  /* Already created in the above loop.   */
> > -	  if (find_guard_arg (first, second, orig_phi))
> > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > +	  if (TREE_CODE (var) != SSA_NAME)
> >  	    continue;
> >
> > -	  tree new_res = copy_ssa_name (orig_arg);
> > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > +	  if (operand_equal_p (get_current_def (var),
> > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > +	    return PHI_RESULT (phi);
> >  	}
> >      }
> > +  return NULL_TREE;
> >  }
> >
> >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >    gcc_assert (single_succ_p (merge_bb));
> >    edge e = single_succ_edge (merge_bb);
> >    basic_block exit_bb = e->dest;
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> >
> >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >      {
> >        gphi *update_phi = gsi.phi ();
> > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> >
> >        tree merge_arg = NULL_TREE;
> >
> > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >        if (!merge_arg)
> >  	merge_arg = old_arg;
> >
> > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> >        /* If the var is live after loop but not a reduction, we simply
> >  	 use the old arg.  */
> >        if (!guard_arg)
> > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> >      }
> >  }
> >
> > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > -
> > -static void
> > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > -{
> > -  gphi_iterator gsi;
> > -  basic_block exit_bb = single_exit (epilog)->dest;
> > -
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  edge e = EDGE_PRED (exit_bb, 0);
> > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > -}
> > -
> >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> >     iterate exactly CONST_NITERS times.  Make a final decision about
> >     whether the epilogue loop should be used, returning true if so.  */
> > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >      bound_epilog += vf - 1;
> >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> >      bound_epilog += 1;
> > +  /* For early breaks the scalar loop needs to execute at most VF times
> > +     to find the element that caused the break.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    {
> > +      bound_epilog = vf;
> > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > +      vect_epilogues = false;
> > +    }
> >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> >    poly_uint64 bound_scalar = bound_epilog;
> >
> > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >  				  bound_prolog + bound_epilog)
> >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> >  			 || vect_epilogues));
> > +
> > +  /* We only support early break vectorization on known bounds at this time.
> > +     This means that if the vector loop can't be entered then we won't generate
> > +     it at all.  So for now force skip_vector off because the additional control
> > +     flow messes with the BB exits and we've already analyzed them.  */
> > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > +
> >    /* Epilog loop must be executed if the number of iterations for epilog
> >       loop is known at compile time, otherwise we need to add a check at
> >       the end of vector loop and skip to the end of epilog loop.  */
> >    bool skip_epilog = (prolog_peeling < 0
> >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >  		      || !vf.is_constant ());
> > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> > +     loop must be executed.  */
> > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      skip_epilog = false;
> > -
> >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> >    auto_vec<profile_count> original_counts;
> >    basic_block *original_bbs = NULL;
> > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >    if (prolog_peeling)
> >      {
> >        e = loop_preheader_edge (loop);
> > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > -
> > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> >        /* Peel prolog and put it on preheader edge of loop.  */
> > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > +						       true);
> >        gcc_assert (prolog);
> >        prolog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > +
> >        first_loop = prolog;
> >        reset_original_copy_tables ();
> >
> > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >  	 as the transformations mentioned above make less or no sense when not
> >  	 vectorizing.  */
> >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > +      auto_vec<basic_block> doms;
> > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > +						       &doms);
> >        gcc_assert (epilog);
> >
> >        epilog->force_vectorize = false;
> > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> >
> >        /* Scalar version loop may be preferred.  In this case, add guard
> >  	 and skip to epilog.  Note this only happens when the number of
> > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> >  					update_e);
> >
> > +      /* For early breaks we must create a guard to check how many iterations
> > +	 of the scalar loop are yet to be performed.  */
> > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	{
> > +	  tree ivtmp =
> > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > +					       *niters_vector, update_e);
> > +
> > +	  gcc_assert (ivtmp);
> > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > +					 fold_convert (TREE_TYPE (niters),
> > +						       ivtmp),
> > +					 build_zero_cst (TREE_TYPE (niters)));
> > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +
> > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > +	     and so we may need to find the actual final edge.  */
> > +	  edge final_edge = epilog->vec_loop_iv;
> > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > +	     between the guard and the exit edge.  It only adds new nodes and
> > +	     doesn't update existing one in the current scheme.  */
> > +	  basic_block guard_to = split_edge (final_edge);
> > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > +						guard_bb, prob_epilog.invert (),
> > +						irred_flag);
> > +	  doms.safe_push (guard_bb);
> > +
> > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > +
> > +	  /* We must update all the edges from the new guard_bb.  */
> > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > +					      final_edge);
> > +
> > +	  /* If the loop was versioned we'll have an intermediate BB between
> > +	     the guard and the exit.  This intermediate block is required
> > +	     because in the current scheme of things the guard block phi
> > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > +	     case we just need to update the uses in this block as well.  */
> > +	  if (loop != scalar_loop)
> > +	    {
> > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > +	    }
> > +
> > +	  flush_pending_stmts (guard_e);
> > +	}
> > +
> >        if (skip_epilog)
> >  	{
> >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> >  	    }
> >  	  scale_loop_profile (epilog, prob_epilog, 0);
> >  	}
> > -      else
> > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> >
> >        unsigned HOST_WIDE_INT bound;
> >        if (bound_scalar.is_constant (&bound))
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> >      partial_load_store_bias (0),
> >      peeling_for_gaps (false),
> >      peeling_for_niter (false),
> > +    early_breaks (false),
> > +    non_break_control_flow (false),
> >      no_data_dependencies (false),
> >      has_mask_store (false),
> >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> >  					  (loop_vinfo));
> >
> > +  /* When we have multiple exits and VF is unknown, we must require partial
> > +     vectors because the loop bound is not a minimum but a maximum.  That is
> > +     to say we cannot unpredicate the main loop unless we peel or use partial
> > +     vectors in the epilogue.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > +    return true;
> > +
> >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> >      {
> > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> >  }
> >
> > -
> >  /* Function vect_analyze_loop_form.
> >
> >     Verify that certain CFG restrictions hold, including:
> >     - the loop has a pre-header
> > -   - the loop has a single entry and exit
> > +   - the loop has a single entry
> > +   - nested loops can have only a single exit
> >     - the loop exit condition is simple enough
> >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> >       niter could be analyzed under some assumptions.  */
> > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> >                             |
> >                          (exit-bb)  */
> >
> > -      if (loop->num_nodes != 2)
> > -	return opt_result::failure_at (vect_location,
> > -				       "not vectorized:"
> > -				       " control flow in loop.\n");
> > -
> >        if (empty_block_p (loop->header))
> >  	return opt_result::failure_at (vect_location,
> >  				       "not vectorized: empty loop.\n");
> > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> >          dump_printf_loc (MSG_NOTE, vect_location,
> >  			 "Considering outer-loop vectorization.\n");
> >        info->inner_loop_cond = inner.loop_cond;
> > +
> > +      if (!single_exit (loop))
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized: multiple exits.\n");
> > +
> >      }
> >
> > -  if (!single_exit (loop))
> > -    return opt_result::failure_at (vect_location,
> > -				   "not vectorized: multiple exits.\n");
> >    if (EDGE_COUNT (loop->header->preds) != 2)
> >      return opt_result::failure_at (vect_location,
> >  				   "not vectorized:"
> > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> >  				   "not vectorized: latch block not empty.\n");
> >
> >    /* Make sure the exit is not abnormal.  */
> > -  edge e = single_exit (loop);
> > -  if (e->flags & EDGE_ABNORMAL)
> > -    return opt_result::failure_at (vect_location,
> > -				   "not vectorized:"
> > -				   " abnormal loop exit edge.\n");
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  edge nexit = loop->vec_loop_iv;
> > +  for (edge e : exits)
> > +    {
> > +      if (e->flags & EDGE_ABNORMAL)
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized:"
> > +				       " abnormal loop exit edge.\n");
> > +      /* Early break BB must be after the main exit BB.  In theory we should
> > +	 be able to vectorize the inverse order, but the current flow in the
> > +	 vectorizer always assumes you update successor PHI nodes, not
> > +	 preds.  */
> > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized:"
> > +				       " abnormal loop exit edge order.\n");
> > +    }
> > +
> > +  /* We currently only support early exit loops with known bounds.   */
> > +  if (exits.length () > 1)
> > +    {
> > +      class tree_niter_desc niter;
> > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > +	  || chrec_contains_undetermined (niter.niter)
> > +	  || !evolution_function_is_constant_p (niter.niter))
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized:"
> > +				       " early breaks only supported on loops"
> > +				       " with known iteration bounds.\n");
> > +    }
> >
> >    info->conds
> >      = vect_get_loop_niters (loop, &info->assumptions,
> > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> >
> > +  /* Check to see if we're vectorizing multiple exits.  */
> > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +
> >    if (info->inner_loop_cond)
> >      {
> >        stmt_vec_info inner_loop_cond_info
> > @@ -3070,7 +3106,8 @@ start_over:
> >
> >    /* If an epilogue loop is required make sure we can create one.  */
> >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        if (dump_enabled_p ())
> >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >    basic_block exit_bb;
> >    tree scalar_dest;
> >    tree scalar_type;
> > -  gimple *new_phi = NULL, *phi;
> > +  gimple *new_phi = NULL, *phi = NULL;
> >    gimple_stmt_iterator exit_gsi;
> >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> >    gimple *epilog_stmt = NULL;
> > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> >  	  reduc_inputs.quick_push (new_def);
> >  	}
> > +
> > +	/* Update the other exits.  */
> > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	  {
> > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > +	    gphi_iterator gsi, gsi1;
> > +	    for (edge exit : alt_exits)
> > +	      {
> > +		/* Find the phi node to propagate into the exit block for each
> > +		   exit edge.  */
> > +		for (gsi = gsi_start_phis (exit_bb),
> > +		     gsi1 = gsi_start_phis (exit->src);
> > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > +		  {
> > +		    /* There really should be a function to just get the number
> > +		       of phis inside a bb.  */
> > +		    if (phi && phi == gsi.phi ())
> > +		      {
> > +			gphi *phi1 = gsi1.phi ();
> > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > +					 PHI_RESULT (phi1));
> > +			break;
> > +		      }
> > +		  }
> > +	      }
> > +	  }
> >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> >      }
> >
> > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> >  	   new_tree = lane_extract <vec_lhs', ...>;
> >  	   lhs' = new_tree;  */
> >
> > +      /* When vectorizing an early break, any live statements that are used
> > +	 outside of the loop are dead.  The loop will never get to them.
> > +	 We could change the liveness value during analysis instead but since
> > +	 the below code is invalid anyway just ignore it during codegen.  */
> > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	return true;
> > +
> >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> >        gcc_assert (single_pred_p (exit_bb));
> > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> >       versioning.   */
> >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -  if (! single_pred_p (e->dest))
> > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        split_loop_exit_edge (e, true);
> >        if (dump_enabled_p ())
> > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> >      {
> >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > -      if (! single_pred_p (e->dest))
> > +      if (e && ! single_pred_p (e->dest))
> >  	{
> >  	  split_loop_exit_edge (e, true);
> >  	  if (dump_enabled_p ())
> > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >
> >    /* Loops vectorized with a variable factor won't benefit from
> >       unrolling/peeling.  */
> > -  if (!vf.is_constant ())
> > +  if (!vf.is_constant ()
> > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        loop->unroll = 1;
> >        if (dump_enabled_p ())
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >    *live_p = false;
> >
> >    /* cond stmt other than loop exit cond.  */
> > -  if (is_ctrl_stmt (stmt_info->stmt)
> > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > -    *relevant = vect_used_in_scope;
> > +  if (is_ctrl_stmt (stmt_info->stmt))
> > +    {
> > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > +	 the hard way.  */
> > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > +      bool exit_bb = false, early_exit = false;
> > +      edge_iterator ei;
> > +      edge e;
> > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > +	  {
> > +	    exit_bb = true;
> > +	    early_exit = loop->vec_loop_iv->src != bb;
> > +	    break;
> > +	  }
> > +
> > +      /* We should have processed any exit edge, so an exit edge that is not
> > +	 an early break must be the loop IV edge.  We need to distinguish
> > +	 between the two as we don't want to generate code for the main loop
> > +	 IV.  */
> > +      if (exit_bb)
> > +	{
> > +	  if (early_exit)
> > +	    *relevant = vect_used_in_scope;
> > +	}
> > +      else if (bb->loop_father == loop)
> > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> > +    }
> >
> >    /* changing memory.  */
> >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >  	*relevant = vect_used_in_scope;
> >        }
> >
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  auto_bitmap exit_bbs;
> > +  for (edge exit : exits)
> > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > +
> >    /* uses outside the loop.  */
> >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> >      {
> > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >  	      /* We expect all such uses to be in the loop exit phis
> >  		 (because of loop closed form)   */
> >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > -	      gcc_assert (bb == single_exit (loop)->dest);
> > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> >
> >                *live_p = true;
> >  	    }
> > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> >  	}
> >      }
> >
> > +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
> > +     the conds yet at that point and there's no quick way to retrieve them.  */
> > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > +    return opt_result::failure_at (vect_location,
> > +				   "not vectorized:"
> > +				   " unsupported control flow in loop.\n");
> > +
> >    /* 2. Process_worklist */
> >    while (worklist.length () > 0)
> >      {
> > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> >  			return res;
> >  		    }
> >                   }
> > +	    }
> > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > +	    {
> > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > +	      opt_result res
> > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > +			       loop_vinfo, relevant, &worklist, false);
> > +	      if (!res)
> > +		return res;
> > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > +				loop_vinfo, relevant, &worklist, false);
> > +	      if (!res)
> > +		return res;
> >              }
> >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> >  	    {
> > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> >  			     node_instance, cost_vec);
> >        if (!res)
> >  	return res;
> > -   }
> > +    }
> > +
> > +  if (is_ctrl_stmt (stmt_info->stmt))
> > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> >
> >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> >      {
> >        case vect_internal_def:
> > +      case vect_early_exit_def:
> >          break;
> >
> >        case vect_reduction_def:
> > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> >      {
> >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> >        *need_to_vectorize = true;
> >      }
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -63,6 +63,7 @@ enum vect_def_type {
> >    vect_internal_def,
> >    vect_induction_def,
> >    vect_reduction_def,
> > +  vect_early_exit_def,
> >    vect_double_reduction_def,
> >    vect_nested_cycle,
> >    vect_first_order_recurrence,
> > @@ -876,6 +877,13 @@ public:
> >       we need to peel off iterations at the end to form an epilogue loop.  */
> >    bool peeling_for_niter;
> >
> > +  /* When the loop has early breaks that we can vectorize we need to peel
> > +     the loop for the break-finding loop.  */
> > +  bool early_breaks;
> > +
> > +  /* True when the loop contains control flow other than early breaks.  */
> > +  bool non_break_control_flow;
> > +
> >    /* List of loop additional IV conditionals found in the loop.  */
> >    auto_vec<gcond *> conds;
> >
> > @@ -985,9 +993,11 @@ public:
> >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > @@ -1038,8 +1048,8 @@ public:
> >     stack.  */
> >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> >
> > -inline loop_vec_info
> > -loop_vec_info_for_loop (class loop *loop)
> > +static inline loop_vec_info
> > +loop_vec_info_for_loop (const class loop *loop)
> >  {
> >    return (loop_vec_info) loop->aux;
> >  }
> > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> >  {
> >    if (bb == (bb->loop_father)->header)
> >      return true;
> > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > +
> >    return false;
> >  }
> >
> > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> >     in tree-vect-loop-manip.cc.  */
> >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> >  				     tree, tree, tree, bool);
> > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > -						     class loop *, edge);
> > +						    class loop *, edge, bool,
> > +						    vec<basic_block> * = NULL);
> >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> >  				    tree *, tree *, tree *, int, bool, bool,
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> >  	 predicates that need to be shared for optimal predicate usage.
> >  	 However reassoc will re-order them and prevent CSE from working
> >  	 as it should.  CSE only the loop body, not the entry.  */
> > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +      for (edge exit : exits)
> > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> >
> >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> >        do_rpo_vn (fun, entry, exit_bbs);
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.
  2023-07-13 11:49   ` Richard Biener
  2023-07-13 12:03     ` Tamar Christina
@ 2023-07-14  9:09     ` Richard Biener
  1 sibling, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-14  9:09 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 13 Jul 2023, Richard Biener wrote:

> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> > 
> > For early break vectorization we have to update niters analysis to record and
> > analyze all exits of the loop, and so all conds.
> > 
> > The niters of the loop is still determined by the main/natural exit of the loop
> > as this gives the O(n) bound.  For now we don't do much with the secondary conds,
> > but their assumptions can be used to generate versioning checks later.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> 
> I probably confused vec_init_exit_info in the previous patch - that said,
> I'm missing a clear function that determines the natural exit of the
> original (if-converted) scalar loop.  As vec_init_exit_info seems
> to (re-)compute that I'll comment on it here.
> 
> +  /* The main IV is to be determined by the block that's the first reachable
> +     block from the latch.  We cannot rely on the order the loop analysis
> +     returns and we don't have any SCEV analysis on the loop.  */
> +  auto_vec <edge> workset;
> +  workset.safe_push (loop_latch_edge (loop));
> +  hash_set <edge> visited;
> +
> +  while (!workset.is_empty ())
> +    {
> +      edge e = workset.pop ();
> +      if (visited.contains (e))
> +       continue;
> +
> +      bool found_p = false;
> +      for (edge ex : e->src->succs)
> +       {
> +         if (exits.contains (ex))
> +           {
> +             found_p = true;
> +             e = ex;
> +             break;
> +           }
> +       }
> +
> +      if (found_p)
> +       {
> +         loop->vec_loop_iv = e;
> +         for (edge ex : exits)
> +           if (e != ex)
> +             loop->vec_loop_alt_exits.safe_push (ex);
> +         return;
> +       }
> +      else
> +       {
> +         for (edge ex : e->src->preds)
> +           workset.safe_insert (0, ex);
> +       }
> +      visited.add (e);
> +    }
> 
> So this greedily follows edges from the latch and takes the first
> exit.  Why's that better than simply choosing the first?
> 
> I'd have done
> 
>  auto_vec<edge> exits = get_loop_exit_edges (loop);
>  for (e : exits)
>    {
>      if (vect_get_loop_niters (...))
>        {
>          if no assumptions use that edge, if assumptions continue
>          searching, maybe there's an edge w/o assumptions
>        }
>    }
>  use (first) exit with assumptions
> 
> we probably want to know 'may_be_zero' as well and prefer an edge
> without that.  So eventually call number_of_iterations_exit_assumptions
> directly and look for the best niter_desc and pass that to
> vect_get_loop_niters (or re-do the work).
> 
> As said, for "copying" the exit to the loop copies, use the block mapping.

In case you only support treating the last exit as IV exit you can
also simply walk dominators from the latch block until you reach
the header and pick the first [niter analyzable] exit block you reach.

But I'm not yet sure this restriction exists.

Richard.

> 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
> > 	all gconds.
> > 	(vect_analyze_loop_form): Update code checking for conds.
> > 	(vect_create_loop_vinfo): Handle having multiple conds.
> > 	(vect_analyze_loop): Release extra loop conds structures.
> > 	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> > 	LOOP_VINFO_LOOP_IV_COND): New.
> > 	(struct vect_loop_form_info): Add conds, loop_iv_cond.
> > 
> > --- inline copy of patch -- 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
> >     in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
> >     niter information holds in ASSUMPTIONS.
> >  
> > -   Return the loop exit condition.  */
> > +   Return the loop exit conditions.  */
> >  
> >  
> > -static gcond *
> > +static vec<gcond *>
> >  vect_get_loop_niters (class loop *loop, tree *assumptions,
> >  		      tree *number_of_iterations, tree *number_of_iterationsm1)
> >  {
> > -  edge exit = single_exit (loop);
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  vec<gcond *> conds;
> > +  conds.create (exits.length ());
> >    class tree_niter_desc niter_desc;
> >    tree niter_assumptions, niter, may_be_zero;
> > -  gcond *cond = get_loop_exit_condition (loop);
> >  
> >    *assumptions = boolean_true_node;
> >    *number_of_iterationsm1 = chrec_dont_know;
> >    *number_of_iterations = chrec_dont_know;
> > +
> >    DUMP_VECT_SCOPE ("get_loop_niters");
> >  
> > -  if (!exit)
> > -    return cond;
> > +  if (exits.is_empty ())
> > +    return conds;
> >  
> > -  may_be_zero = NULL_TREE;
> > -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> > -      || chrec_contains_undetermined (niter_desc.niter))
> > -    return cond;
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
> > +		     exits.length ());
> >  
> > -  niter_assumptions = niter_desc.assumptions;
> > -  may_be_zero = niter_desc.may_be_zero;
> > -  niter = niter_desc.niter;
> > +  edge exit;
> > +  unsigned int i;
> > +  FOR_EACH_VEC_ELT (exits, i, exit)
> > +    {
> > +      gcond *cond = get_edge_condition (exit);
> > +      if (cond)
> > +	conds.safe_push (cond);
> >  
> > -  if (may_be_zero && integer_zerop (may_be_zero))
> > -    may_be_zero = NULL_TREE;
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
> >  
> > -  if (may_be_zero)
> > -    {
> > -      if (COMPARISON_CLASS_P (may_be_zero))
> > +      may_be_zero = NULL_TREE;
> > +      if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
> > +          || chrec_contains_undetermined (niter_desc.niter))
> > +	continue;
> > +
> > +      niter_assumptions = niter_desc.assumptions;
> > +      may_be_zero = niter_desc.may_be_zero;
> > +      niter = niter_desc.niter;
> > +
> > +      if (may_be_zero && integer_zerop (may_be_zero))
> > +	may_be_zero = NULL_TREE;
> > +
> > +      if (may_be_zero)
> >  	{
> > -	  /* Try to combine may_be_zero with assumptions, this can simplify
> > -	     computation of niter expression.  */
> > -	  if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > -	    niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> > -					     niter_assumptions,
> > -					     fold_build1 (TRUTH_NOT_EXPR,
> > -							  boolean_type_node,
> > -							  may_be_zero));
> > +	  if (COMPARISON_CLASS_P (may_be_zero))
> > +	    {
> > +	      /* Try to combine may_be_zero with assumptions, this can simplify
> > +		 computation of niter expression.  */
> > +	      if (niter_assumptions && !integer_nonzerop (niter_assumptions))
> > +		niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
> > +						 niter_assumptions,
> > +						 fold_build1 (TRUTH_NOT_EXPR,
> > +							      boolean_type_node,
> > +							      may_be_zero));
> > +	      else
> > +		niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> > +				     build_int_cst (TREE_TYPE (niter), 0),
> > +				     rewrite_to_non_trapping_overflow (niter));
> > +
> > +	      may_be_zero = NULL_TREE;
> > +	    }
> > +	  else if (integer_nonzerop (may_be_zero) && exit == loop->vec_loop_iv)
> > +	    {
> > +	      *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > +	      *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > +	      continue;
> > +	    }
> >  	  else
> > -	    niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
> > -				 build_int_cst (TREE_TYPE (niter), 0),
> > -				 rewrite_to_non_trapping_overflow (niter));
> > +	    continue;
> > +       }
> >  
> > -	  may_be_zero = NULL_TREE;
> > -	}
> > -      else if (integer_nonzerop (may_be_zero))
> > +      /* Loop assumptions are based off the normal exit.  */
> > +      if (exit == loop->vec_loop_iv)
> >  	{
> > -	  *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0);
> > -	  *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1);
> > -	  return cond;
> > +	  *assumptions = niter_assumptions;
> > +	  *number_of_iterationsm1 = niter;
> > +
> > +	  /* We want the number of loop header executions which is the number
> > +	     of latch executions plus one.
> > +	     ???  For UINT_MAX latch executions this number overflows to zero
> > +	     for loops like do { n++; } while (n != 0);  */
> > +	  if (niter && !chrec_contains_undetermined (niter))
> > +	    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter),
> > +				 unshare_expr (niter),
> > +				 build_int_cst (TREE_TYPE (niter), 1));
> > +	  *number_of_iterations = niter;
> >  	}
> > -      else
> > -	return cond;
> >      }
> >  
> > -  *assumptions = niter_assumptions;
> > -  *number_of_iterationsm1 = niter;
> > -
> > -  /* We want the number of loop header executions which is the number
> > -     of latch executions plus one.
> > -     ???  For UINT_MAX latch executions this number overflows to zero
> > -     for loops like do { n++; } while (n != 0);  */
> > -  if (niter && !chrec_contains_undetermined (niter))
> > -    niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter),
> > -			  build_int_cst (TREE_TYPE (niter), 1));
> > -  *number_of_iterations = niter;
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n");
> >  
> > -  return cond;
> > +  return conds;
> >  }
> >  
> >  /* Function bb_in_loop_p
> > @@ -1768,15 +1794,26 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> >  				   "not vectorized:"
> >  				   " abnormal loop exit edge.\n");
> >  
> > -  info->loop_cond
> > +  info->conds
> >      = vect_get_loop_niters (loop, &info->assumptions,
> >  			    &info->number_of_iterations,
> >  			    &info->number_of_iterationsm1);
> > -  if (!info->loop_cond)
> > +
> > +  if (info->conds.is_empty ())
> >      return opt_result::failure_at
> >        (vect_location,
> >         "not vectorized: complicated exit condition.\n");
> >  
> > +  /* Determine what the primary and alternate exit conds are.  */
> > +  info->alt_loop_conds.create (info->conds.length () - 1);
> > +  for (gcond *cond : info->conds)
> > +    {
> > +      if (loop->vec_loop_iv->src != gimple_bb (cond))
> > +	info->alt_loop_conds.quick_push (cond);
> > +      else
> > +	info->loop_cond = cond;
> > +    }
> 
> Do you really need those explicitely?  ->conds and ->alt_loop_conds
> looks redundant at least.
> 
> > +
> >    if (integer_zerop (info->assumptions)
> >        || !info->number_of_iterations
> >        || chrec_contains_undetermined (info->number_of_iterations))
> > @@ -1821,8 +1858,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> >    if (!integer_onep (info->assumptions) && !main_loop_info)
> >      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
> >  
> > -  stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
> > -  STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +  for (gcond *cond : info->alt_loop_conds)
> > +    {
> > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> > +      STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
> > +    }
> > +  LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > +  LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > +
> >    if (info->inner_loop_cond)
> >      {
> >        stmt_vec_info inner_loop_cond_info
> > @@ -3520,6 +3563,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >  		     "***** Choosing vector mode %s\n",
> >  		     GET_MODE_NAME (first_loop_vinfo->vector_mode));
> >  
> > +  loop_form_info.conds.release ();
> > +  loop_form_info.alt_loop_conds.release ();
> > +
> >    /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> >       enabled, SIMDUID is not set, it is the innermost loop and we have
> >       either already found the loop's SIMDLEN or there was no SIMDLEN to
> > @@ -3631,6 +3677,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
> >  			   (first_loop_vinfo->epilogue_vinfos[0]->vector_mode));
> >      }
> >  
> > +  loop_form_info.conds.release ();
> > +  loop_form_info.alt_loop_conds.release ();
> > +
> >    return first_loop_vinfo;
> >  }
> >  
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index bd5eceb5da7a45ef036cd14609ebe091799320bf..1cc003c12e2447eca878f56cb019236f56e96f85 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -876,6 +876,12 @@ public:
> >       we need to peel off iterations at the end to form an epilogue loop.  */
> >    bool peeling_for_niter;
> >  
> > +  /* List of loop additional IV conditionals found in the loop.  */
> 
> drop "IV"
> 
> > +  auto_vec<gcond *> conds;
> > +
> > +  /* Main loop IV cond.  */
> > +  gcond* loop_iv_cond;
> > +
> 
> I guess I have to look at the followup patches to see how often we
> have to access loop_iv_cond/conds.
> 
> >    /* True if there are no loop carried data dependencies in the loop.
> >       If loop->safelen <= 1, then this is always true, either the loop
> >       didn't have any loop carried data dependencies, or the loop is being
> > @@ -966,6 +972,8 @@ public:
> >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > +#define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > +#define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> >  #define LOOP_VINFO_SCALAR_LOOP(L)	   (L)->scalar_loop
> >  #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
> > @@ -2353,7 +2361,9 @@ struct vect_loop_form_info
> >    tree number_of_iterations;
> >    tree number_of_iterationsm1;
> >    tree assumptions;
> > +  vec<gcond *> conds;
> >    gcond *loop_cond;
> > +  vec<gcond *> alt_loop_conds;
> >    gcond *inner_loop_cond;
> >  };
> >  extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);
> > 
> > 
> > 
> > 
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-13 19:05     ` Tamar Christina
@ 2023-07-14 13:34       ` Richard Biener
  2023-07-17 10:56         ` Tamar Christina
                           ` (2 more replies)
  0 siblings, 3 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-14 13:34 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 13 Jul 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, July 13, 2023 6:31 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > The rewrite also naturally takes into account multiple exits and so it didn't
> > > make sense to split them off.
> > >
> > > For the purposes of peeling the only change for multiple exits is that the
> > > secondary exits are all wired to the start of the new loop preheader
> > > when doing epilogue peeling.
> > >
> > > When doing prologue peeling the CFG is kept intact.
> > >
> > > For both epilogue and prologue peeling we wire through between the two
> > loops any
> > > PHI nodes that escape the first loop into the second loop if flow_loops is
> > > specified.  The reason for this conditionality is because
> > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > >   - prologue peeling
> > >   - epilogue peeling
> > >   - loop distribution
> > >
> > > for the last case the loops should remain independent, and so not be
> > connected.
> > > Because of this propagation of only the used PHI nodes, get_current_def
> > > can be used to easily find the previous definitions.  However live
> > > statements that are
> > > not used inside the loop itself are not propagated (since if unused, the
> > moment
> > > we add the guard in between the two loops the value across the bypass edge
> > can
> > > be wrong if the loop has been peeled.)
> > >
> > > This is dealt with easily enough in find_guard_arg.
> > >
> > > For multiple exits, while we are in LCSSA form, and have a correct DOM tree,
> > the
> > > moment we add the guard block we will change the dominators again.  To
> > deal with
> > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks
> > to
> > > update without having to recompute the list of blocks to update again.
> > >
> > > When multiple exits and doing epilogue peeling we will also temporarily have
> > an
> > > incorrect VUSES chain for the secondary exits as it anticipates the final result
> > > after the VDEFs have been moved.  This will thus be corrected once the code
> > > motion is applied.
> > >
> > > Lastly by doing things this way we can remove the helper functions that
> > > previously did lock step iterations to update things as it went along.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Not sure if I get through all of this in one go - so be prepared that
> > the rest of the review follows another day.
> 
> No worries, I appreciate the reviews!
> Just giving some quick replies for when you continue.

Continuing.

> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> > false.
> > > 	* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> > > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > additional
> > > 	assert.
> > > 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > > 	exits.
> > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > peeling.
> > > 	(slpeel_can_duplicate_loop_p): Likewise.
> > > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > > 	(find_guard_arg): Update for new peeling code.
> > > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > checks.
> > > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > 	non_break_control_flow and early_breaks.
> > > 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > > 	multiple exits and VLA.
> > > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > 	(vect_create_epilog_for_reduction):  Update live phi nodes.
> > > 	(vectorizable_live_operation): Ignore live operations in vector loop
> > > 	when multiple exits.
> > > 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > > 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow
> > and
> > > 	analyze gcond params.
> > > 	(vect_analyze_stmt): Support gcond.
> > > 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > > 	in RPO pass.
> > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW):
> > New.
> > > 	(loop_vec_info_for_loop): Change to const and static.
> > > 	(is_loop_header_bb_p): Drop assert.
> > > 	(slpeel_can_duplicate_loop_p): Update prototype.
> > > 	(class loop): Add early_breaks and non_break_control_flow.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> > > index
> > 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40
> > d833a968314a4442b9e 100644
> > > --- a/gcc/tree-loop-distribution.cc
> > > +++ b/gcc/tree-loop-distribution.cc
> > > @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool
> > redirect_lc_phi_defs)
> > >    edge preheader = loop_preheader_edge (loop);
> > >
> > >    initialize_original_copy_tables ();
> > > -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> > > +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader,
> > false);
> > >    gcc_assert (res != NULL);
> > >
> > >    /* When a not last partition is supposed to keep the LC PHIs computed
> > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > > index
> > 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba
> > 377300430c04b7aeefe6c 100644
> > > --- a/gcc/tree-ssa-loop-niter.cc
> > > +++ b/gcc/tree-ssa-loop-niter.cc
> > > @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop,
> > basic_block *body, const_edge exit)
> > >    gimple_stmt_iterator bsi;
> > >    unsigned i;
> > >
> > > -  if (exit != single_exit (loop))
> > > +  /* We need to check for alternative exits since exit can be NULL.  */
> > 
> > You mean we pass in exit == NULL in some cases?  I'm not sure what
> > the desired behavior in that case is - can you point out the
> > callers you are fixing here?
> > 
> > I think we should add gcc_assert (exit != nullptr)
> > 
> > >    for (i = 0; i < loop->num_nodes; i++)
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a92
> > 7cb150edd7c2aa9999 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi,
> > edge e, tree new_def)
> > >  {
> > >    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
> > >
> > > +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> > > +	      || orig_def != new_def);
> > > +
> > >    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
> > >
> > >    if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info
> > loop_vinfo,
> > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > >
> > >    /* Record the number of latch iterations.  */
> > > -  if (limit == niters)
> > > +  if (limit == niters
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > >         latch count.  */
> > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal
> > (loop_vec_info loop_vinfo,
> > >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > >  				       limit, step);
> > >
> > > -  if (final_iv)
> > > +  /* For multiple exits we've already maintained LCSSA form and handled
> > > +     the scalar iteration update in the code that deals with the merge
> > > +     block and its updated guard.  I could move that code here instead
> > > +     of in vect_update_ivs_after_early_break but I have to still deal
> > > +     with the updates to the counter `i`.  So for now I'll keep them
> > > +     together.  */
> > > +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      {
> > >        gassign *assign;
> > >        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
> > >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > >     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> > >     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> > > -   entry or exit of LOOP.  */
> > > +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to
> > SCALAR_LOOP as a
> > > +   continuation.  This is correct for cases where one loop continues from the
> > > +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> > > +   where the loop is duplicated and then modified.
> > > +
> > 
> > but for loop distribution the flow also continues?  I'm not sure what you
> > are refering to here.  Do you by chance have a branch with the patches
> > installed?
> 
> Yup, they're at refs/users/tnfchris/heads/gcc-14-early-break in the repo.
> 
> > 
> > > +   If UPDATED_DOMS is not NULL it is updated with the list of basic
> > > +   blocks whose dominators were updated during the peeling.  */
> > >
> > >  class loop *
> > >  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > -					class loop *scalar_loop, edge e)
> > > +					class loop *scalar_loop, edge e,
> > > +					bool flow_loops,
> > > +					vec<basic_block> *updated_doms)
> > >  {
> > >    class loop *new_loop;
> > >    basic_block *new_bbs, *bbs, *pbbs;
> > > @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > >    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
> > >      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
> > >
> > > +  /* Rename the exit uses.  */
> > > +  for (edge exit : get_loop_exit_edges (new_loop))
> > > +    for (auto gsi = gsi_start_phis (exit->dest);
> > > +	 !gsi_end_p (gsi); gsi_next (&gsi))
> > > +      {
> > > +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> > > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> > > +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> > > +      }
> > > +
> > > +  /* This condition happens when the loop has been versioned. e.g. due to
> > ifcvt
> > > +     versioning the loop.  */
> > >    if (scalar_loop != loop)
> > >      {
> > >        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> > > @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class loop *loop,
> > >  						EDGE_SUCC (loop->latch, 0));
> > >      }
> > >
> > > +  vec<edge> alt_exits = loop->vec_loop_alt_exits;
> > 
> > So 'e' is not one of alt_exits, right?  I wonder if we can simply
> > compute the vector from all exits of 'loop' and removing 'e'?
> > 
> > > +  bool multiple_exits_p = !alt_exits.is_empty ();
> > > +  auto_vec<basic_block> doms;
> > > +  class loop *update_loop = NULL;
> > > +
> > >    if (at_exit) /* Add the loop copy at exit.  */
> > >      {
> > > -      if (scalar_loop != loop)
> > > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> > >  	{
> > > -	  gphi_iterator gsi;
> > >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > > +	  flush_pending_stmts (new_exit);
> > > +	}
> > >
> > > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > > -	       gsi_next (&gsi))
> > > +      auto loop_exits = get_loop_exit_edges (loop);
> > > +      for (edge exit : loop_exits)
> > > +	redirect_edge_and_branch (exit, new_preheader);
> > > +
> > > +
> > 
> > one line vertical space too much
> > 
> > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > +	 block and the new loop header.  This allows us to later split the
> > > +	 preheader block and still find the right LC nodes.  */
> > > +      edge latch_new = single_succ_edge (new_preheader);
> > > +      edge latch_old = loop_latch_edge (loop);
> > > +      hash_set <tree> lcssa_vars;
> > > +      for (auto gsi_from = gsi_start_phis (latch_old->dest),
> > 
> > so that's loop->header (and makes it more clear which PHI nodes you are
> > looking at)


So I'm now in a debug session - I think that conceptually it would
make more sense to create the LC PHI nodes that are present at the
old exit destination in the new preheader _before_ you redirect them
above, and then call flush_pending_stmts after redirecting; that should
deal with the copying.

Now, your copying actually iterates over all PHIs in the loop _header_,
so it doesn't actually copy LC PHI nodes but possibly creates additional
ones.  The intent does seem to do this since you want a different value
on those edges for all but the main loop exit.  But then the 
overall comments should better reflect that and maybe you should
do what I suggested anyway and have this loop alter only the alternate
exit LC PHIs?

If you don't flush_pending_stmts on an edge after redirecting you
should call redirect_edge_var_map_clear (edge), otherwise the stale
info might break things later.
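As a toy model of the protocol described above (plain C++ with invented
Block/Edge structures, not GCC's actual CFG or var-map API), redirecting an
edge parks the PHI arguments that edge carried; the parked entries must then
either be flushed into the new destination's PHIs or explicitly cleared,
otherwise they go stale:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: phi_args maps a PHI result name to the argument carried on
// the incoming edge.  These are invented structures for illustration only.
struct Block { std::map<std::string, std::string> phi_args; };

struct Edge
{
  Block *dest;
  std::vector<std::pair<std::string, std::string>> pending; // parked args
};

// Park the PHI args the old destination carried for this edge, then point
// the edge at its new destination.
void redirect_edge (Edge &e, Block &new_dest)
{
  for (auto &p : e.dest->phi_args)
    e.pending.push_back (p);
  e.dest = &new_dest;
}

// Materialize the parked args at the new destination (analogous in spirit
// to flush_pending_stmts).
void flush_pending (Edge &e)
{
  for (auto &p : e.pending)
    e.dest->phi_args.insert (p);
  e.pending.clear ();
}

// Discard the parked args (analogous in spirit to
// redirect_edge_var_map_clear).
void clear_pending (Edge &e)
{
  e.pending.clear ();
}
```

The point of the model: after redirect_edge, the information is neither at
the old nor the new destination, so exactly one of flush_pending or
clear_pending must run before anything else inspects the edge.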

> > > +	   gsi_to = gsi_start_phis (latch_new->dest);
> > 
> > likewise new_loop->header
> > 
> > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > +	{
> > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> > > +	  /* In all cases, even in early break situations we're only
> > > +	     interested in the number of fully executed loop iters.  As such
> > > +	     we discard any partially done iteration.  So we simply propagate
> > > +	     the phi nodes from the latch to the merge block.  */
> > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > +
> > > +	  lcssa_vars.add (new_arg);
> > > +
> > > +	  /* Main loop exit should use the final iter value.  */
> > > +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv,
> > UNKNOWN_LOCATION);
> > 
> > above you are creating the PHI node at e->dest but here add the PHI arg to
> > loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
> > to follow.  I _think_ this code doesn't need to know about the "special"
> > edge.
> > 
> > > +
> > > +	  /* All other exits use the previous iters.  */
> > > +	  for (edge e : alt_exits)
> > > +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> > > +			 UNKNOWN_LOCATION);
> > > +
> > > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > > +	}
> > > +
> > > +      /* Copy over any live SSA vars that may not have been materialized in
> > the
> > > +	 loops themselves but would be in the exit block.  However when the
> > live
> > > +	 value is not used inside the loop then we don't need to do this,  if we
> > do
> > > +	 then when we split the guard block the branch edge can end up
> > containing the
> > > +	 wrong reference,  particularly if it shares an edge with something that
> > has
> > > +	 bypassed the loop.  This is not something peeling can check so we
> > need to
> > > +	 anticipate the usage of the live variable here.  */
> > > +      auto exit_map = redirect_edge_var_map_vector (exit);
> > 
> > Hmm, did I use that in my attemt to refactor things? ...
> 
> Indeed, I didn't always use it, but found it was the best way to deal with the
> variables being live in various BB after the loop.

As said this whole piece of code is possibly more complicated than 
necessary.  First copying/creating the PHI nodes that are present
at the exit (the old LC PHI nodes), then redirecting edges and flushing
stmts should deal with half of this.

> > 
> > > +      if (exit_map)
> > > +        for (auto vm : exit_map)
> > > +	{
> > > +	  if (lcssa_vars.contains (vm.def)
> > > +	      || TREE_CODE (vm.def) != SSA_NAME)
> > 
> > the latter check is cheaper so it should come first
> > 
> > > +	    continue;
> > > +
> > > +	  imm_use_iterator imm_iter;
> > > +	  use_operand_p use_p;
> > > +	  bool use_in_loop = false;
> > > +
> > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
> > >  	    {
> > > -	      gphi *phi = gsi.phi ();
> > > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > > -	      location_t orig_locus
> > > -		= gimple_phi_arg_location_from_edge (phi, e);
> > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > +	      if (flow_bb_inside_loop_p (loop, bb)
> > > +		  && !gimple_vuse (USE_STMT (use_p)))

what's this gimple_vuse check?  I see that for vect-early-break_17.c this
code triggers and ignores

  vect_b[i_18] = _2;

> > > +		{
> > > +		  use_in_loop = true;
> > > +		  break;
> > > +		}
> > > +	    }
> > >
> > > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > +	  if (!use_in_loop)
> > > +	    {
> > > +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> > > +		  mirrors the relevancy analysis's used_outside_scope.  */
> > > +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> > > +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > > +		continue;
> > >  	    }

Since the def was on an LC PHI, the def should always be defined inside
the loop.

> > > +
> > > +	  tree new_res = copy_ssa_name (vm.result);
> > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > +	  for (edge exit : loop_exits)
> > > +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
> > 
> > not sure what you are doing above - I guess I have to play with it
> > in a debug session.
> 
> Yeah if you comment it out one of the testcases should fail.

using new_preheader instead of e->dest would make things clearer.

You are now adding the same arg to every exit (you've just queried the
main exit redirect_edge_var_map_vector).

OK, so I think I understand what you're doing.  If I understand
correctly we know that when we exit the main loop via one of the
early exits we are definitely going to enter the epilog but when
we take the main exit we might not.
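That asymmetry can be modelled in plain scalar C++ (a hypothetical VF of 4
and a made-up find_first function, not the vectorizer's actual output): an
early break taken in the blocked "vector" loop always falls into the scalar
epilog, which re-runs the breaking block to pinpoint the exact element,
while the main exit only reaches the epilog for the n % VF leftover
iterations:

```cpp
#include <cassert>
#include <cstddef>

static const std::size_t VF = 4; // hypothetical vectorization factor

// Return the index of the first element equal to `key`, or -1.
long find_first (const long *a, std::size_t n, long key, bool *took_early_exit)
{
  std::size_t i = 0;
  *took_early_exit = false;
  for (; i + VF <= n; i += VF)      // "vector" main loop, VF lanes at a time
    {
      bool any_hit = false;         // models the vector cbranch condition
      for (std::size_t l = 0; l < VF; l++)
        any_hit |= (a[i + l] == key);
      if (any_hit)                  // early break: always enter the epilog,
        {                           // starting at the breaking block
          *took_early_exit = true;
          break;
        }
    }
  for (; i < n; i++)                // scalar epilog pinpoints the element
    if (a[i] == key)
      return (long) i;
  return -1;
}
```

In the model the early-exit path unconditionally continues into the scalar
loop, whereas the main-exit path would skip it entirely whenever n % VF is
zero, which is exactly the CFG asymmetry discussed above.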

Looking at the CFG we currently create this isn't reflected, and that
complicates the PHI node updating.  What I'd try to do
is leave redirecting the alternate exits until after
slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
means leaving it almost unchanged besides the LC SSA maintaining
changes.  After that for the multi-exit case split the
epilog preheader edge and redirect all the alternate exits to the
new preheader.  So the CFG becomes

                 <original loop>
                /      |
               /    <main exit w/ original LC PHI>
              /      if (epilog)
   alt exits /        /  \
            /        /    loop around
            |       /    
           preheader with "header" PHIs
              |
          <epilog>

note you need the header PHIs also on the main exit path but you
only need the loop end PHIs there.

It seems that, at least currently, the order of things makes this
more complicated than necessary.

> > 
> > >  	}
> > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > -      flush_pending_stmts (e);
> > > +
> > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > -      if (was_imm_dom || duplicate_outer_loop)
> > > +
> > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > >src);
> > >
> > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > >        delete_basic_block (preheader);
> > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > >  			       loop_preheader_edge (scalar_loop)->src);
> > > +
> > > +      /* Finally after wiring the new epilogue we need to update its main exit
> > > +	 to the original function exit we recorded.  Other exits are already
> > > +	 correct.  */
> > > +      if (multiple_exits_p)
> > > +	{
> > > +	  for (edge e : get_loop_exit_edges (loop))
> > > +	    doms.safe_push (e->dest);
> > > +	  update_loop = new_loop;
> > > +	  doms.safe_push (exit_dest);
> > > +
> > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > +	  if (single_succ_p (exit_dest))
> > > +	    doms.safe_push (single_succ (exit_dest));
> > > +	}
> > >      }
> > >    else /* Add the copy at entry.  */
> > >      {
> > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > +	 block and the new loop header.  This allows us to later split the
> > > +	 preheader block and still find the right LC nodes.  */
> > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > 
> > see above
> > 
> > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > +	{
> > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > new_latch_loop);
> > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > +	}
> > > +
> > >        if (scalar_loop != loop)
> > >  	{
> > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> > loop *loop,
> > >        delete_basic_block (new_preheader);
> > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > >  			       loop_preheader_edge (new_loop)->src);
> > > +
> > > +      if (multiple_exits_p)
> > > +	update_loop = loop;
> > >      }
> > >
> > > -  if (scalar_loop != loop)
> > > +  if (multiple_exits_p)
> > >      {
> > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > -      gphi_iterator gsi_orig, gsi_new;
> > > -      edge orig_e = loop_preheader_edge (loop);
> > > -      edge new_e = loop_preheader_edge (new_loop);
> > > -
> > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > +      for (edge e : get_loop_exit_edges (update_loop))
> > >  	{
> > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > -	  gphi *new_phi = gsi_new.phi ();
> > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > -	  location_t orig_locus
> > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > -
> > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > +	  edge ex;
> > > +	  edge_iterator ei;
> > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > +	    {
> > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > +		 dominate other blocks.  */
> > > +	      while ((ex->flags & EDGE_FALLTHRU)

For the prologue peeling any early exit we take would skip all other
loops so we can simply leave them and their LC PHI nodes in place.
We need extra PHIs only on the path to the main vector loop.  I
think the comment isn't accurately reflecting what we do.  In
fact we do not add any LC PHI nodes here but simply adjust the
main loop header PHI arguments?

> > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > just using single_succ_p here?  A fallthru edge src dominates the
> > fallthru edge dest, so the sentence above doesn't make sense.
> 
> I wanted to say that the immediate dominator of a block is never
> a fall-through block.  At least from what I understood from how
> the dominators are calculated in the code, though I may have missed
> something.

 BB1
  |
 BB2
  |
 BB3

here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
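A generic sketch of iterative immediate-dominator computation makes the
same point (this is not GCC's dominance.cc; it assumes blocks are already
numbered in reverse post-order with block 0 the entry): in a straight-line
chain every block is the immediate dominator of its single successor, so a
single-successor block appears as an idom all the time:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterative idom computation over a CFG given as predecessor lists.
// preds[b] lists the predecessors of block b; blocks are in RPO.
std::vector<int> compute_idoms (const std::vector<std::vector<int>> &preds)
{
  std::size_t n = preds.size ();
  std::vector<int> idom (n, -1);
  idom[0] = 0;                      // entry dominates itself
  // Walk both idom chains up until they meet.
  auto intersect = [&] (int a, int b) {
    while (a != b)
      {
        while (a > b) a = idom[a];
        while (b > a) b = idom[b];
      }
    return a;
  };
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t b = 1; b < n; b++)
        {
          int new_idom = -1;
          for (int p : preds[b])
            if (idom[p] != -1)      // only use already-processed preds
              new_idom = new_idom == -1 ? p : intersect (p, new_idom);
          if (new_idom != idom[b])
            {
              idom[b] = new_idom;
              changed = true;
            }
        }
    }
  return idom;
}
```

For the chain BB0 -> BB1 -> BB2 this yields idom(BB2) = BB1 and
idom(BB1) = BB0, matching the example above.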

> > 
> > > +		     && single_succ_p (ex->dest))
> > > +		{
> > > +		  doms.safe_push (ex->dest);
> > > +		  ex = single_succ_edge (ex->dest);
> > > +		}
> > > +	      doms.safe_push (ex->dest);
> > > +	    }
> > > +	  doms.safe_push (e->dest);
> > >  	}
> > > -    }
> > >
> > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > +      if (updated_doms)
> > > +	updated_doms->safe_splice (doms);
> > > +    }
> > >    free (new_bbs);
> > >    free (bbs);
> > >
> > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > loop_vec_info loop_vinfo, const_edge e)
> > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > >    unsigned int num_bb = loop->inner? 5 : 2;
> > >
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > +
> > 
> > I think checking the number of BBs is odd, I don't remember anything
> > in slpeel is specifically tied to that?  I think we can simply drop
> > this or do you remember anything that would depend on ->num_nodes
> > being only exactly 5 or 2?
> 
> Never actually seemed to require it, but they're used as a check to
> see whether there is unexpected control flow in the loop.
> 
> i.e. this would say no if you have an if statement in the loop that wasn't
> converted.  The other part of this and the accompanying explanation is in
> vect_analyze_loop_form.  In the patch series I had to remove the hard
> num_nodes == 2 check from there because number of nodes restricted
> things too much.  If you have an empty fall through block, which seems to
> happen often between the main exit and the latch block then we'd not
> vectorize.
> 
> So instead I now reject loops after analyzing the gcond.  So I think
> this check can go/needs to be different.

Let's remove it from this function then.

> > 
> > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > >       the function itself.  */
> > >    if (!loop_outer (loop)
> > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info loop_vinfo,
> > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > >    basic_block update_bb = update_e->dest;
> > >
> > > +  /* For early exits we'll update the IVs in
> > > +     vect_update_ivs_after_early_break.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    return;
> > > +
> > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >
> > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info loop_vinfo,
> > >        /* Fix phi expressions in the successor bb.  */
> > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > >      }
> > > +  return;
> > 
> > we don't usually place a return at the end of void functions
> > 
> > > +}
> > > +
> > > +/*   Function vect_update_ivs_after_early_break.
> > > +
> > > +     "Advance" the induction variables of LOOP to the value they should take
> > > +     after the execution of LOOP.  This is currently necessary because the
> > > +     vectorizer does not handle induction variables that are used after the
> > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > +     peeled, because of the early exit.  With an early exit we always peel the
> > > +     loop.
> > > +
> > > +     Input:
> > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > +	      of LOOP were peeled.
> > > +     - VF - The loop vectorization factor.
> > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > > +		     vectorized). i.e, the number of times the ivs should be
> > > +		     bumped.
> > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP
> > executes.
> > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > +		  coming out from LOOP on which there are uses of the LOOP
> > ivs
> > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > +
> > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > +		  UPDATE_E->dest are updated accordingly.
> > > +
> > > +     Output:
> > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > +
> > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > +     a single loop exit that has a single predecessor.
> > > +
> > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > > +     organized in the same order.
> > > +
> > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > > +
> > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > +     path starts with the edge UPDATE_E, and its destination (denoted
> > update_bb)
> > > +     needs to have its phis updated.
> > > + */
> > > +
> > > +static tree
> > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *
> > epilog,
> > > +				   poly_int64 vf, tree niters_orig,
> > > +				   tree niters_vector, edge update_e)
> > > +{
> > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    return NULL;
> > > +
> > > +  gphi_iterator gsi, gsi1;
> > > +  tree ni_name, ivtmp = NULL;
> > > +  basic_block update_bb = update_e->dest;
> > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +  basic_block exit_bb = loop_iv->dest;
> > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > +
> > > +  gcc_assert (cond);
> > > +
> > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > +    {
> > > +      tree init_expr, final_expr, step_expr;
> > > +      tree type;
> > > +      tree var, ni, off;
> > > +      gimple_stmt_iterator last_gsi;
> > > +
> > > +      gphi *phi = gsi1.phi ();
> > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge
> > (epilog));
> > 
> > I'm confused about the setup.  update_bb looks like the block with the
> > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > loop_preheader_edge (epilog) come into play here?  That would feed into
> > epilog->header PHIs?!
> 
> We can't query the type of the phis in the block with the LC PHI nodes, so
> the typical pattern seems to be that we iterate over a block that's part of
> the loop and that would have the PHIs in the same order, just so we can get
> to the stmt_vec_info.
> 
> > 
> > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > better way?  Is update_bb really epilog->header?!
> > 
> > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > which "works" even for totally unrelated edges.
> > 
> > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > +      if (!phi1)
> > 
> > shouldn't that be an assert?
> > 
> > > +	continue;
> > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > +			 (gimple *)phi);
> > > +
> > > +      /* Skip reduction and virtual phis.  */
> > > +      if (!iv_phi_p (phi_info))
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > +			     "reduc or virtual phi. skip.\n");
> > > +	  continue;
> > > +	}
> > > +
> > > +      /* For multiple exits where we handle early exits we need to carry on
> > > +	 with the previous IV as loop iteration was not done because we exited
> > > +	 early.  As such just grab the original IV.  */
> > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > (loop));
> > 
> > but this should be taken care of by LC SSA?
> 
> It is; the comment is probably missing details.  This part just scales the
> counter from VF to scalar counts.  It's just a reminder that this scaling is
> done differently from normal single-exit vectorization.
>
> > 
> > OK, have to continue tomorrow from here.
> 
> Cheers, Thank you!
> 
> Tamar
> 
> > 
> > Richard.
> > 
> > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > +	  && gimple_cond_rhs (cond) != phi_ssa)

so this is a way to avoid touching the main IV?  Looks a bit fragile to 
me.  Hmm, we're iterating over the main loop header PHIs here?
Can't you check, say, the relevancy of the PHI node instead?  Though
it might also be used as an induction.  Can't it be used in an alternate
exit, like

  for (i)
   {
     if (i & bit)
       break;
   }

and would we need to adjust 'i' then?
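As a scalar illustration of the question (a hypothetical example, not from the patch): when the IV is live after the loop, its exit value depends on which exit was taken, so the vectorized form has to reconstruct it on the early-exit edge.

```c
#include <assert.h>

/* Hypothetical scalar loop: 'i' is used after the loop, so after an
   early break its value is the break position, not the trip count.
   A vectorized version must materialize exactly this value on the
   early-exit edge.  */
int
first_match (const int *a, int n, int key)
{
  int i;
  for (i = 0; i < n; i++)
    if (a[i] == key)
      break;
  return i;  /* break position, or n when no element matched.  */
}
```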

> > > +	{
> > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > +	  step_expr = unshare_expr (step_expr);
> > > +
> > > +	  /* We previously generated the new merged phi in the same BB as
> > the
> > > +	     guard.  So use that to perform the scaling on rather than the
> > > +	     normal loop phi which don't take the early breaks into account.  */
> > > +	  final_expr = gimple_phi_result (phi1);
> > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > loop_preheader_edge (loop));
> > > +
> > > +	  tree stype = TREE_TYPE (step_expr);
> > > +	  /* For early break the final loop IV is:
> > > +	     init + (final - init) * vf which takes into account peeling
> > > +	     values and non-single steps.  */
> > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > +			     fold_convert (stype, final_expr),
> > > +			     fold_convert (stype, init_expr));
> > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > +
> > > +	  /* Adjust the value with the offset.  */
> > > +	  if (POINTER_TYPE_P (type))
> > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > +	  else
> > > +	    ni = fold_convert (type,
> > > +			       fold_build2 (PLUS_EXPR, stype,
> > > +					    fold_convert (stype, init_expr),
> > > +					    off));
> > > +	  var = create_tmp_var (type, "tmp");

so how does the non-early break code deal with updating inductions?
And how do you avoid altering this when we flow in from the normal
exit?  That is, you are updating the value in the epilog loop
header but don't you need to instead update the value only on
the alternate exit edges from the main loop (and keep the not
updated value on the main exit edge)?
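For reference, the scaling in the hunk above ("init + (final - init) * vf") in plain arithmetic — a sketch of the assumed semantics, not the generated GIMPLE: the merged exit phi counts completed vector iterations, and multiplying its distance from the initial value by VF gives the scalar IV value at the point the vector loop stopped.

```c
#include <assert.h>

/* Sketch of the IV scaling discussed above (assumed semantics): FINAL
   is the merged exit-phi value counting vector iterations, INIT the
   preheader value, VF the vectorization factor.  The scalar-equivalent
   IV value on exit is init + (final - init) * vf.  */
long
scalar_iv_on_exit (long init, long final, long vf)
{
  return init + (final - init) * vf;
}
```

For example, three completed vector iterations at VF = 4 starting from 0 put the scalar IV at 12, which is where the epilog would resume.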

> > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > +	  gimple_seq new_stmts = NULL;
> > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > +	  /* Exit_bb shouldn't be empty.  */
> > > +	  if (!gsi_end_p (last_gsi))
> > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +	  else
> > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +
> > > +	  /* Fix phi expressions in the successor bb.  */
> > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > +	}
> > > +      else
> > > +	{
> > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > +	  step_expr = unshare_expr (step_expr);
> > > +
> > > +	  /* We previously generated the new merged phi in the same BB as
> > the
> > > +	     guard.  So use that to perform the scaling on rather than the
> > > +	     normal loop phi which don't take the early breaks into account.  */
> > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > (loop));
> > > +	  tree stype = TREE_TYPE (step_expr);
> > > +
> > > +	  if (vf.is_constant ())
> > > +	    {
> > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > +				fold_convert (stype,
> > > +					      niters_vector),
> > > +				build_int_cst (stype, vf));
> > > +
> > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > +				fold_convert (stype,
> > > +					      niters_orig),
> > > +				fold_convert (stype, ni));
> > > +	    }
> > > +	  else
> > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > +	       masked, so at the end of the loop we know we have finished
> > > +	       the entire loop and found nothing.  */
> > > +	    ni = build_zero_cst (stype);
> > > +
> > > +	  ni = fold_convert (type, ni);
> > > +	  /* We don't support variable n in this version yet.  */
> > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > +
> > > +	  var = create_tmp_var (type, "tmp");
> > > +
> > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > +	  gimple_seq new_stmts = NULL;
> > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > +	  /* Exit_bb shouldn't be empty.  */
> > > +	  if (!gsi_end_p (last_gsi))
> > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +	  else
> > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > +
> > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > +
> > > +	  for (edge exit : alt_exits)
> > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > +					build_int_cst (TREE_TYPE (step_expr),
> > > +						       vf));
> > > +	  ivtmp = gimple_phi_result (phi1);
> > > +	}
> > > +    }
> > > +
> > > +  return ivtmp;
> > >  }
> > >
> > >  /* Return a gimple value containing the misalignment (measured in vector
> > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > (loop_vec_info loop_vinfo,
> > >
> > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > >     this function searches for the corresponding lcssa phi node in exit
> > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > -   NULL.  */
> > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > +   return the phi result; otherwise return NULL.  */
> > >
> > >  static tree
> > >  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > -		gphi *lcssa_phi)
> > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > >  {
> > >    gphi_iterator gsi;
> > >    edge e = loop->vec_loop_iv;
> > >
> > > -  gcc_assert (single_pred_p (e->dest));
> > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > >      {
> > >        gphi *phi = gsi.phi ();
> > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > -	return PHI_RESULT (phi);
> > > -    }
> > > -  return NULL_TREE;
> > > -}
> > > -
> > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > FIRST/SECOND
> > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > -   edge, the two loops are arranged as below:
> > > -
> > > -       preheader_a:
> > > -     first_loop:
> > > -       header_a:
> > > -	 i_1 = PHI<i_0, i_2>;
> > > -	 ...
> > > -	 i_2 = i_1 + 1;
> > > -	 if (cond_a)
> > > -	   goto latch_a;
> > > -	 else
> > > -	   goto between_bb;
> > > -       latch_a:
> > > -	 goto header_a;
> > > -
> > > -       between_bb:
> > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > -
> > > -     second_loop:
> > > -       header_b:
> > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > -				 or with i_2 if no LCSSA phi is created
> > > -				 under condition of
> > CREATE_LCSSA_FOR_IV_PHIS.
> > > -	 ...
> > > -	 i_4 = i_3 + 1;
> > > -	 if (cond_b)
> > > -	   goto latch_b;
> > > -	 else
> > > -	   goto exit_bb;
> > > -       latch_b:
> > > -	 goto header_b;
> > > -
> > > -       exit_bb:
> > > -
> > > -   This function creates loop closed SSA for the first loop; update the
> > > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > -   the first loop.
> > > -
> > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > -   the second loop will execute rest iterations of the first.  */
> > > -
> > > -static void
> > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > -				   class loop *first, class loop *second,
> > > -				   bool create_lcssa_for_iv_phis)
> > > -{
> > > -  gphi_iterator gsi_update, gsi_orig;
> > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -
> > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > -  basic_block between_bb = single_exit (first)->dest;
> > > -
> > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > -  gcc_assert (loop == first || loop == second);
> > > -
> > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > -       gsi_update = gsi_start_phis (second->header);
> > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > -    {
> > > -      gphi *orig_phi = gsi_orig.phi ();
> > > -      gphi *update_phi = gsi_update.phi ();
> > > -
> > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > -      /* Generate lcssa PHI node for the first loop.  */
> > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > +      /* Nested loops with multiple exits can have a different number of
> > > +	 PHI node arguments between the main loop and epilog as the epilog
> > > +	 falls through to the second loop.  */
> > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > >  	{
> > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > UNKNOWN_LOCATION);
> > > -	  arg = new_res;
> > > -	}
> > > -
> > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > -	 incoming edge.  */
> > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > -    }
> > > -
> > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > -     for correct vectorization of live stmts.  */
> > > -  if (loop == first)
> > > -    {
> > > -      basic_block orig_exit = single_exit (second)->dest;
> > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > -	{
> > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > (orig_arg))
> > > -	    continue;
> > > -
> > > -	  /* Already created in the above loop.   */
> > > -	  if (find_guard_arg (first, second, orig_phi))
> > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > +	  if (TREE_CODE (var) != SSA_NAME)
> > >  	    continue;
> > >
> > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> > UNKNOWN_LOCATION);
> > > +	  if (operand_equal_p (get_current_def (var),
> > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > +	    return PHI_RESULT (phi);
> > >  	}
> > >      }
> > > +  return NULL_TREE;
> > >  }
> > >
> > >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >    gcc_assert (single_succ_p (merge_bb));
> > >    edge e = single_succ_edge (merge_bb);
> > >    basic_block exit_bb = e->dest;
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > >
> > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > >      {
> > >        gphi *update_phi = gsi.phi ();
> > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > >
> > >        tree merge_arg = NULL_TREE;
> > >
> > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop
> > *loop, class loop *epilog,
> > >        if (!merge_arg)
> > >  	merge_arg = old_arg;
> > >
> > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> > >        /* If the var is live after loop but not a reduction, we simply
> > >  	 use the old arg.  */
> > >        if (!guard_arg)
> > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop *loop, class loop *epilog,
> > >      }
> > >  }
> > >
> > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > -
> > > -static void
> > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > -{
> > > -  gphi_iterator gsi;
> > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > -
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > -}
> > > -

I wonder if we can still split these changes out to before early break 
vect?

> > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > >     whether the epilogue loop should be used, returning true if so.  */
> > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >      bound_epilog += vf - 1;
> > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > >      bound_epilog += 1;
> > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > +     to find the element that caused the break.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    {
> > > +      bound_epilog = vf;
> > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > +      vect_epilogues = false;
> > > +    }
> > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > >    poly_uint64 bound_scalar = bound_epilog;
> > >
> > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > >  				  bound_prolog + bound_epilog)
> > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > >  			 || vect_epilogues));
> > > +
> > > +  /* We only support early break vectorization on known bounds at this
> > time.
> > > +     This means that if the vector loop can't be entered then we won't
> > generate
> > > +     it at all.  So for now force skip_vector off because the additional control
> > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > +

I think it should be as "easy" as entering the epilog via the block taking
the regular exit?

> > >    /* Epilog loop must be executed if the number of iterations for epilog
> > >       loop is known at compile time, otherwise we need to add a check at
> > >       the end of vector loop and skip to the end of epilog loop.  */
> > >    bool skip_epilog = (prolog_peeling < 0
> > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >  		      || !vf.is_constant ());
> > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > epilog
> > > +     loop must be executed.  */
> > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      skip_epilog = false;
> > > -
> > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > >    auto_vec<profile_count> original_counts;
> > >    basic_block *original_bbs = NULL;
> > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > >    if (prolog_peeling)
> > >      {
> > >        e = loop_preheader_edge (loop);
> > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > -
> > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > > +						       true);
> > >        gcc_assert (prolog);
> > >        prolog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > +
> > >        first_loop = prolog;
> > >        reset_original_copy_tables ();
> > >
> > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree niters, tree nitersm1,
> > >  	 as the transformations mentioned above make less or no sense when
> > not
> > >  	 vectorizing.  */
> > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > +      auto_vec<basic_block> doms;
> > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > > +						       &doms);
> > >        gcc_assert (epilog);
> > >
> > >        epilog->force_vectorize = false;
> > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > >
> > >        /* Scalar version loop may be preferred.  In this case, add guard
> > >  	 and skip to epilog.  Note this only happens when the number of
> > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > >  					update_e);
> > >
> > > +      /* For early breaks we must create a guard to check how many iterations
> > > +	 of the scalar loop are yet to be performed.  */

We have this check anyway, no?  In fact don't we know that we always enter
the epilog (see above)?
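In scalar form, the peeling scheme under discussion looks roughly like the sketch below (an illustrative example with an assumed VF of 4, not the patch's codegen); it shows why the scalar epilog needs at most VF iterations to pinpoint the break element.

```c
#include <assert.h>

#define VF 4  /* assumed vectorization factor for the sketch */

/* Early-break peeling sketch: the "vector" main loop tests VF elements
   per iteration and leaves on any match; the scalar epilog then rescans
   at most VF elements (plus the scalar tail) to find the exact index.  */
int
find_index (const int *a, int n, int key)
{
  int i = 0;
  for (; i + VF <= n; i += VF)      /* vector main loop */
    {
      int any = 0;
      for (int l = 0; l < VF; l++)  /* stands in for a vector compare */
        any |= (a[i + l] == key);
      if (any)
        break;                      /* early break: match is in this chunk */
    }
  for (; i < n; i++)                /* scalar epilog, <= VF iters per chunk */
    if (a[i] == key)
      return i;
  return n;
}
```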

> > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	{
> > > +	  tree ivtmp =
> > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > +					       *niters_vector, update_e);
> > > +
> > > +	  gcc_assert (ivtmp);
> > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > +					 fold_convert (TREE_TYPE (niters),
> > > +						       ivtmp),
> > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +
> > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > +	     and so we may need to find the actual final edge.  */
> > > +	  edge final_edge = epilog->vec_loop_iv;
> > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > +	     doesn't update existing one in the current scheme.  */
> > > +	  basic_block guard_to = split_edge (final_edge);
> > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > guard_to,
> > > +						guard_bb, prob_epilog.invert
> > (),
> > > +						irred_flag);
> > > +	  doms.safe_push (guard_bb);
> > > +
> > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > +
> > > +	  /* We must update all the edges from the new guard_bb.  */
> > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > +					      final_edge);
> > > +
> > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > +	     the guard and the exit.  This intermediate block is required
> > > +	     because in the current scheme of things the guard block phi
> > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > +	     case we just need to update the uses in this block as well.  */
> > > +	  if (loop != scalar_loop)
> > > +	    {
> > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> > guard_e));
> > > +	    }
> > > +
> > > +	  flush_pending_stmts (guard_e);
> > > +	}
> > > +
> > >        if (skip_epilog)
> > >  	{
> > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  	    }
> > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > >  	}
> > > -      else
> > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > >
> > >        unsigned HOST_WIDE_INT bound;
> > >        if (bound_scalar.is_constant (&bound))
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49
> > f21421958e7ee25eada 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop
> > *loop_in, vec_info_shared *shared)
> > >      partial_load_store_bias (0),
> > >      peeling_for_gaps (false),
> > >      peeling_for_niter (false),
> > > +    early_breaks (false),
> > > +    non_break_control_flow (false),
> > >      no_data_dependencies (false),
> > >      has_mask_store (false),
> > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p
> > (loop_vec_info loop_vinfo)
> > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD
> > (LOOP_VINFO_ORIG_LOOP_INFO
> > >  					  (loop_vinfo));
> > >
> > > +  /* When we have multiple exits and VF is unknown, we must require
> > partial
> > > +     vectors because the loop bound is not a minimum but a maximum.
> > That is to
> > > +     say we cannot unpredicate the main loop unless we peel or use partial
> > > +     vectors in the epilogue.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > +    return true;
> > > +
> > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > >      {
> > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost
> > (loop_vec_info loop_vinfo)
> > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > >  }
> > >
> > > -
> > >  /* Function vect_analyze_loop_form.
> > >
> > >     Verify that certain CFG restrictions hold, including:
> > >     - the loop has a pre-header
> > > -   - the loop has a single entry and exit
> > > +   - the loop has a single entry
> > > +   - nested loops can have only a single exit.
> > >     - the loop exit condition is simple enough
> > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > >       niter could be analyzed under some assumptions.  */
> > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >                             |
> > >                          (exit-bb)  */
> > >
> > > -      if (loop->num_nodes != 2)
> > > -	return opt_result::failure_at (vect_location,
> > > -				       "not vectorized:"
> > > -				       " control flow in loop.\n");
> > > -
> > >        if (empty_block_p (loop->header))
> > >  	return opt_result::failure_at (vect_location,
> > >  				       "not vectorized: empty loop.\n");
> > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >          dump_printf_loc (MSG_NOTE, vect_location,
> > >  			 "Considering outer-loop vectorization.\n");
> > >        info->inner_loop_cond = inner.loop_cond;
> > > +
> > > +      if (!single_exit (loop))
> > > +	return opt_result::failure_at (vect_location,
> > > +				       "not vectorized: multiple exits.\n");
> > > +
> > >      }
> > >
> > > -  if (!single_exit (loop))
> > > -    return opt_result::failure_at (vect_location,
> > > -				   "not vectorized: multiple exits.\n");
> > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > >      return opt_result::failure_at (vect_location,
> > >  				   "not vectorized:"
> > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >  				   "not vectorized: latch block not empty.\n");
> > >
> > >    /* Make sure the exit is not abnormal.  */
> > > -  edge e = single_exit (loop);
> > > -  if (e->flags & EDGE_ABNORMAL)
> > > -    return opt_result::failure_at (vect_location,
> > > -				   "not vectorized:"
> > > -				   " abnormal loop exit edge.\n");
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  edge nexit = loop->vec_loop_iv;
> > > +  for (edge e : exits)
> > > +    {
> > > +      if (e->flags & EDGE_ABNORMAL)
> > > +	return opt_result::failure_at (vect_location,
> > > +				       "not vectorized:"
> > > +				       " abnormal loop exit edge.\n");
> > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > +	 be able to vectorize the inverse order, but the current flow in
> > > +	 the vectorizer always assumes you update successor PHI nodes, not
> > > +	 preds.  */
> > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e-
> > >src))
> > > +	return opt_result::failure_at (vect_location,
> > > +				       "not vectorized:"
> > > +				       " abnormal loop exit edge order.\n");

"unsupported loop exit order", but I don't understand the comment.

> > > +    }
> > > +
> > > +  /* We currently only support early exit loops with known bounds.   */

Btw, why's that?  Is that because we don't support the loop-around edge?
IMHO this is the most serious limitation (and as said above it should be
trivial to fix).
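One way to see the known-bounds restriction (an instrumented sketch by way of illustration, not patch code): the vector loop loads a full VF-wide chunk before evaluating the break condition, so up to VF - 1 elements past the break point are read speculatively.  With a statically sized buffer those reads are provably in bounds; with a variable bound they could cross into an unmapped page, hence punting until First-Faulting loads are available.

```c
#include <assert.h>

#define VF 4  /* assumed vectorization factor for the sketch */

static int n_loads;  /* counts element loads to expose speculation */

static int
load_elem (const int *a, int i)
{
  n_loads++;
  return a[i];
}

/* Chunked search assuming N is a multiple of VF (a statically sized,
   padded buffer): even when the match is the first element of a chunk,
   the whole chunk is loaded before the break condition is tested.  */
int
find_chunked (const int *a, int n, int key)
{
  for (int i = 0; i < n; i += VF)
    {
      int any = 0;
      for (int l = 0; l < VF; l++)    /* vector load + compare */
        any |= (load_elem (a, i + l) == key);
      if (any)
        for (int l = 0; l < VF; l++)  /* locate the matching lane */
          if (a[i + l] == key)
            return i + l;
    }
  return n;
}
```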

> > > +  if (exits.length () > 1)
> > > +    {
> > > +      class tree_niter_desc niter;
> > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > +	  || chrec_contains_undetermined (niter.niter)
> > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > +	return opt_result::failure_at (vect_location,
> > > +				       "not vectorized:"
> > > +				       " early breaks only supported on loops"
> > > +				       " with known iteration bounds.\n");
> > > +    }
> > >
> > >    info->conds
> > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop,
> > vec_info_shared *shared,
> > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info-
> > >alt_loop_conds);
> > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > >
> > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > +
> > >    if (info->inner_loop_cond)
> > >      {
> > >        stmt_vec_info inner_loop_cond_info
> > > @@ -3070,7 +3106,8 @@ start_over:
> > >
> > >    /* If an epilogue loop is required make sure we can create one.  */
> > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      {
> > >        if (dump_enabled_p ())
> > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >    basic_block exit_bb;
> > >    tree scalar_dest;
> > >    tree scalar_type;
> > > -  gimple *new_phi = NULL, *phi;
> > > +  gimple *new_phi = NULL, *phi = NULL;
> > >    gimple_stmt_iterator exit_gsi;
> > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > >    gimple *epilog_stmt = NULL;
> > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > >  	  reduc_inputs.quick_push (new_def);
> > >  	}
> > > +
> > > +	/* Update the other exits.  */
> > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	  {
> > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > +	    gphi_iterator gsi, gsi1;
> > > +	    for (edge exit : alt_exits)
> > > +	      {
> > > +		/* Find the phi node to propagate into the exit block for each
> > > +		   exit edge.  */
> > > +		for (gsi = gsi_start_phis (exit_bb),
> > > +		     gsi1 = gsi_start_phis (exit->src);

exit->src == loop->header, right?  I think this won't work for multiple
alternate exits.  It's probably easier to do this where we create the
LC PHI node for the reduction result?

> > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > +		  {
> > > +		    /* There really should be a function to just get the number
> > > +		       of phis inside a bb.  */
> > > +		    if (phi && phi == gsi.phi ())
> > > +		      {
> > > +			gphi *phi1 = gsi1.phi ();
> > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > +					 PHI_RESULT (phi1));

I think we know the header PHI of a reduction perfectly well, there 
shouldn't be the need to "search" for it.

> > > +			break;
> > > +		      }
> > > +		  }
> > > +	      }
> > > +	  }
> > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > >      }
> > >
> > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > >  	   lhs' = new_tree;  */
> > >
> > > +      /* When vectorizing an early break, any live statements that are used
> > > +	 outside of the loop are dead.  The loop will never get to them.
> > > +	 We could change the liveness value during analysis instead but since
> > > +	 the below code is invalid anyway just ignore it during codegen.  */
> > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	return true;

But what about the value that's live across the main exit when the 
epilogue is not entered?

> > > +
> > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > >        gcc_assert (single_pred_p (exit_bb));
> > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > >       versioning.   */
> > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > -  if (! single_pred_p (e->dest))
> > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))

e can be NULL here?  I think we should reject such loops earlier.

> > >      {
> > >        split_loop_exit_edge (e, true);
> > >        if (dump_enabled_p ())
> > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > >      {
> > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > -      if (! single_pred_p (e->dest))
> > > +      if (e && ! single_pred_p (e->dest))
> > >  	{
> > >  	  split_loop_exit_edge (e, true);
> > >  	  if (dump_enabled_p ())
> > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > >
> > >    /* Loops vectorized with a variable factor won't benefit from
> > >       unrolling/peeling.  */

update the comment?  Why would we unroll a VLA loop with early breaks?
Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?

> > > -  if (!vf.is_constant ())
> > > +  if (!vf.is_constant ()
> > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      {
> > >        loop->unroll = 1;
> > >        if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >    *live_p = false;
> > >
> > >    /* cond stmt other than loop exit cond.  */
> > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > -    *relevant = vect_used_in_scope;

how was that ever hit before?  For outer loop processing with outer loop
vectorization?

> > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > +    {
> > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > +	 the hard way.  */
> > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > +      bool exit_bb = false, early_exit = false;
> > > +      edge_iterator ei;
> > > +      edge e;
> > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > +	  {
> > > +	    exit_bb = true;
> > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > +	    break;
> > > +	  }
> > > +
> > > +      /* We should have processed any exit edge, so an exit edge that is
> > > +	 not an early break must be the loop IV edge.  We need to distinguish
> > > +	 between the two as we don't want to generate code for the main loop IV.  */
> > > +      if (exit_bb)
> > > +	{
> > > +	  if (early_exit)
> > > +	    *relevant = vect_used_in_scope;
> > > +	}

I wonder why you can't simply do

         if (is_ctrl_stmt (stmt_info->stmt)
             && stmt_info->stmt != LOOP_VINFO_COND (loop_info))

?

> > > +      else if (bb->loop_father == loop)
> > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;

so for control flow not exiting the loop you can check
loop_exits_from_bb_p ().

> > > +    }
> > >
> > >    /* changing memory.  */
> > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >  	*relevant = vect_used_in_scope;
> > >        }
> > >
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  auto_bitmap exit_bbs;
> > > +  for (edge exit : exits)
> > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > +
> > >    /* uses outside the loop.  */
> > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > >      {
> > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >  	      /* We expect all such uses to be in the loop exit phis
> > >  		 (because of loop closed form)   */
> > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));

That now becomes quite expensive checking, already covered by the LC SSA
verifier, so I suggest simply dropping this assert instead.

> > >                *live_p = true;
> > >  	    }
> > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > >  	}
> > >      }
> > >
> > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
> > > +     the conds yet at that point and there's no quick way to retrieve them.  */
> > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > +    return opt_result::failure_at (vect_location,
> > > +				   "not vectorized:"
> > > +				   " unsupported control flow in loop.\n");

so we didn't do this before?  But see above where I wondered.  So when 
does this hit with early exits and why can't we check for this in
vect_verify_loop_form?

> > > +
> > >    /* 2. Process_worklist */
> > >    while (worklist.length () > 0)
> > >      {
> > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > >  			return res;
> > >  		    }
> > >                   }
> > > +	    }
> > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > +	    {
> > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > +	      opt_result res
> > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > +			       loop_vinfo, relevant, &worklist, false);
> > > +	      if (!res)
> > > +		return res;
> > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > +				loop_vinfo, relevant, &worklist, false);
> > > +	      if (!res)
> > > +		return res;
> > >              }
> > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > >  	    {
> > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  			     node_instance, cost_vec);
> > >        if (!res)
> > >  	return res;
> > > -   }
> > > +    }
> > > +
> > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > >
> > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > >      {
> > >        case vect_internal_def:
> > > +      case vect_early_exit_def:
> > >          break;
> > >
> > >        case vect_reduction_def:
> > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > >      {
> > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > >        *need_to_vectorize = true;
> > >      }
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > >    vect_internal_def,
> > >    vect_induction_def,
> > >    vect_reduction_def,
> > > +  vect_early_exit_def,

can you avoid putting this in between reduction and double reduction
please?  Just put it before vect_unknown_def_type.  In fact the COND
isn't a def ... maybe we should have pattern recognized

 if (a < b) exit;

as

 cond = a < b;
 if (cond != 0) exit;

so the part that we need to vectorize is more clear.

> > >    vect_double_reduction_def,
> > >    vect_nested_cycle,
> > >    vect_first_order_recurrence,
> > > @@ -876,6 +877,13 @@ public:
> > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > >    bool peeling_for_niter;
> > >
> > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > +     the loop for the break-finding loop.  */
> > > +  bool early_breaks;
> > > +
> > > +  /* When the loop has non-early-break control flow inside.  */
> > > +  bool non_break_control_flow;
> > > +
> > >    /* List of loop additional IV conditionals found in the loop.  */
> > >    auto_vec<gcond *> conds;
> > >
> > > @@ -985,9 +993,11 @@ public:
> > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > @@ -1038,8 +1048,8 @@ public:
> > >     stack.  */
> > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > >
> > > -inline loop_vec_info
> > > -loop_vec_info_for_loop (class loop *loop)
> > > +static inline loop_vec_info
> > > +loop_vec_info_for_loop (const class loop *loop)
> > >  {
> > >    return (loop_vec_info) loop->aux;
> > >  }
> > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > >  {
> > >    if (bb == (bb->loop_father)->header)
> > >      return true;
> > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > +
> > >    return false;
> > >  }
> > >
> > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > >     in tree-vect-loop-manip.cc.  */
> > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > >  				     tree, tree, tree, bool);
> > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > -						     class loop *, edge);
> > > +						    class loop *, edge, bool,
> > > +						    vec<basic_block> * = NULL);
> > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > >  				    tree *, tree *, tree *, int, bool, bool,
> > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > > --- a/gcc/tree-vectorizer.cc
> > > +++ b/gcc/tree-vectorizer.cc
> > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > >  	 predicates that need to be shared for optimal predicate usage.
> > >  	 However reassoc will re-order them and prevent CSE from working
> > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);

seeing this more and more I think we want a simple way to iterate over
all exits without copying to a vector when we have them recorded.  My
C++ fu is too limited to support

  for (auto exit : recorded_exits (loop))
    ...

(maybe that's enough for somebody to jump onto this ;))

Don't treat all review comments as change orders, but it should be clear
the code isn't 100% obvious.  Maybe the patch can be simplified by
splitting out the LC SSA cleanup parts.

Thanks,
Richard.

> > > +      for (edge exit : exits)
> > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > >
> > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > >        do_rpo_vn (fun, entry, exit_bbs);
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > HRB 36809 (AG Nuernberg)
> 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-14 13:34       ` Richard Biener
@ 2023-07-17 10:56         ` Tamar Christina
  2023-07-17 12:48           ` Richard Biener
  2023-08-18 11:35         ` Tamar Christina
  2023-10-23 20:21         ` Tamar Christina
  2 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-07-17 10:56 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw



> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that the
> > > > secondary exits are all wired to the start of the new loop preheader when
> > > > doing epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept intact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the two
> > > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> > > > be used
> > > > to easily find the previous definitions.  However live statements that are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > > moment we add the guard in between the two loops the value across the
> > > > bypass edge can be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> > > > tree, the
> > > > moment we add the guard block we will change the dominators again.  To
> > > > deal with this slpeel_tree_duplicate_loop_to_edge_cfg can optionally
> > > > return the blocks to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> > > > have an incorrect VUSES chain for the secondary exits as it anticipates
> > > > the final result after the VDEFs have been moved.  This will thus be
> > > > corrected once the code motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continuing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> > > > 	false.
> > > > 	* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> > > > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > > 	additional assert.
> > > > 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > > > 	exits.
> > > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > > 	peeling.
> > > > 	(slpeel_can_duplicate_loop_p): Likewise.
> > > > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > > > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > > > 	(find_guard_arg): Update for new peeling code.
> > > > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > > > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > > 	checks.
> > > > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > 	non_break_control_flow and early_breaks.
> > > > 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > > > 	multiple exits and VLA.
> > > > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> > > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > > 	(vect_create_epilog_for_reduction):  Update live phi nodes.
> > > > 	(vectorizable_live_operation): Ignore live operations in vector loop
> > > > 	when multiple exits.
> > > > 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > > > 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow
> > > > 	and analyze gcond params.
> > > > 	(vect_analyze_stmt): Support gcond.
> > > > 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > > > 	in RPO pass.
> > > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > > 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW):
> > > > 	New.
> > > > 	(loop_vec_info_for_loop): Change to const and static.
> > > > 	(is_loop_header_bb_p): Drop assert.
> > > > 	(slpeel_can_duplicate_loop_p): Update prototype.
> > > > 	(class loop): Add early_breaks and non_break_control_flow.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> > > > index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
> > > > --- a/gcc/tree-loop-distribution.cc
> > > > +++ b/gcc/tree-loop-distribution.cc
> > > > @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
> > > >    edge preheader = loop_preheader_edge (loop);
> > > >
> > > >    initialize_original_copy_tables ();
> > > > -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> > > > +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
> > > >    gcc_assert (res != NULL);
> > > >
> > > >    /* When a not last partition is supposed to keep the LC PHIs computed
> > > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > > > index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
> > > > --- a/gcc/tree-ssa-loop-niter.cc
> > > > +++ b/gcc/tree-ssa-loop-niter.cc
> > > > @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit)
> > > >    gimple_stmt_iterator bsi;
> > > >    unsigned i;
> > > >
> > > > -  if (exit != single_exit (loop))
> > > > +  /* We need to check for alternative exits since exit can be NULL.  */
> > >
> > > You mean we pass in exit == NULL in some cases?  I'm not sure what
> > > the desired behavior in that case is - can you point out the
> > > callers you are fixing here?
> > >
> > > I think we should add gcc_assert (exit != nullptr)
> > >
> > > >    for (i = 0; i < loop->num_nodes; i++)
> > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
> > > >  {
> > > >    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
> > > >
> > > > +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> > > > +	      || orig_def != new_def);
> > > > +
> > > >    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
> > > >
> > > >    if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > > >
> > > >    /* Record the number of latch iterations.  */
> > > > -  if (limit == niters)
> > > > +  if (limit == niters
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > > >         latch count.  */
> > > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > > @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > > >  				       limit, step);
> > > >
> > > > -  if (final_iv)
> > > > +  /* For multiple exits we've already maintained LCSSA form and handled
> > > > +     the scalar iteration update in the code that deals with the merge
> > > > +     block and its updated guard.  I could move that code here instead
> > > > +     of in vect_update_ivs_after_early_break but I have to still deal
> > > > +     with the updates to the counter `i`.  So for now I'll keep them
> > > > +     together.  */
> > > > +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        gassign *assign;
> > > >        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
> > > >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > > >     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> > > >     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> > > > -   entry or exit of LOOP.  */
> > > > +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP
> > > > +   as a continuation.  This is correct for cases where one loop continues
> > > > +   from the other like in the vectorizer, but not true for uses in e.g.
> > > > +   loop distribution where the loop is duplicated and then modified.
> > > > +
> > >
> > > but for loop distribution the flow also continues?  I'm not sure what you
> > > are refering to here.  Do you by chance have a branch with the patches
> > > installed?
> >
> > Yup, they're at refs/users/tnfchris/heads/gcc-14-early-break in the repo.
> >
> > >
> > > > +   If UPDATED_DOMS is not NULL it is updated with the list of basic
> > > > +   blocks whose dominators were updated during the peeling.  */
> > > >
> > > >  class loop *
> > > >  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > -					class loop *scalar_loop, edge e)
> > > > +					class loop *scalar_loop, edge e,
> > > > +					bool flow_loops,
> > > > +					vec<basic_block> *updated_doms)
> > > >  {
> > > >    class loop *new_loop;
> > > >    basic_block *new_bbs, *bbs, *pbbs;
> > > > @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > >    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
> > > >      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
> > > >
> > > > +  /* Rename the exit uses.  */
> > > > +  for (edge exit : get_loop_exit_edges (new_loop))
> > > > +    for (auto gsi = gsi_start_phis (exit->dest);
> > > > +	 !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +      {
> > > > +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> > > > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> > > > +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> > > > +      }
> > > > +
> > > > +  /* This condition happens when the loop has been versioned, e.g. due
> > > > +     to ifcvt versioning the loop.  */
> > > >    if (scalar_loop != loop)
> > > >      {
> > > >        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> > > > @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > >  						EDGE_SUCC (loop->latch, 0));
> > > >      }
> > > >
> > > > +  vec<edge> alt_exits = loop->vec_loop_alt_exits;
> > >
> > > So 'e' is not one of alt_exits, right?  I wonder if we can simply
> > > compute the vector from all exits of 'loop' and removing 'e'?
> > >
> > > > +  bool multiple_exits_p = !alt_exits.is_empty ();
> > > > +  auto_vec<basic_block> doms;
> > > > +  class loop *update_loop = NULL;
> > > > +
> > > >    if (at_exit) /* Add the loop copy at exit.  */
> > > >      {
> > > > -      if (scalar_loop != loop)
> > > > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> > > >  	{
> > > > -	  gphi_iterator gsi;
> > > >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > > > +	  flush_pending_stmts (new_exit);
> > > > +	}
> > > >
> > > > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > > > -	       gsi_next (&gsi))
> > > > +      auto loop_exits = get_loop_exit_edges (loop);
> > > > +      for (edge exit : loop_exits)
> > > > +	redirect_edge_and_branch (exit, new_preheader);
> > > > +
> > > > +
> > >
> > > one line vertical space too much
> > >
> > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > +	 block and the new loop header.  This allows us to later split the
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge latch_new = single_succ_edge (new_preheader);
> > > > +      edge latch_old = loop_latch_edge (loop);
> > > > +      hash_set <tree> lcssa_vars;
> > > > +      for (auto gsi_from = gsi_start_phis (latch_old->dest),
> > >
> > > so that's loop->header (and makes it more clear which PHI nodes you are
> > > looking at)
> 
> 
> So I'm now in a debug session - I think that conceptually it would
> make more sense to create the LC PHI nodes that are present at the
> old exit destination in the new preheader _before_ you redirect them
> above and then flush_pending_stmts after redirecting, that should deal
> with the copying.
> 

This was the first thing I tried, however as soon as you redirect one edge
you destroy all other phi nodes on the original block.

As in if I have 3 phi nodes, I need to move them all at the same time, which
brings the next problem in that I can't add more entries to a phi than it has
incoming edges.  That is I can't just make the final phi nodes on the destination
without having an edge for it.

And to make an edge for it I need to have a condition to attach to the edge.
To work around it I tried maintaining a cache of the nodes I need to make on
the new destination and after redirecting just create them,  but that has me
looping over the same PHIs multiple times.

Any suggestions?

> Now, your copying actually iterates over all PHIs in the loop _header_,
> so it doesn't actually copy LC PHI nodes but possibly creates additional
> ones.  The intent does seem to do this since you want a different value
> on those edges for all but the main loop exit.  But then the
> overall comments should better reflect that and maybe you should
> do what I suggested anyway and have this loop alter only the alternate
> exit LC PHIs?

It does create the LC PHI nodes for all exits; it's just that for the
alternate exits all the nodes are the same, since we only care about the
value of fully executed iterations.

I'm not sure what you're suggesting with altering only the alternate exits.
I still need to create the node for the main loop exit, and it seems easier
to do them in one loop rather than two?

Doing this here allows the removal of all the code later on that the vectorizer
uses to try to find the main exit's PHIs.
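The "fully executed iterations only" point can be sketched standalone (C-style C++; VF, data and function names made up, not vectorizer code): when the break fires somewhere inside a vector iteration, the value the epilogue restarts from is the one at the start of that vector iteration, i.e. after the last fully completed one, which is why all alternate exits can share the same PHI argument.

```cpp
/* Simulate a vector loop of factor VF over an early-break search.  On a
   break, report the IV at the start of the breaking vector iteration:
   only fully executed vector iterations count, the partial one is
   re-done in scalar code by the epilogue.  */
long
vector_iv_on_break (const int *a, long n, long vf, int key)
{
  for (long i = 0; i + vf <= n; i += vf)
    for (long j = 0; j < vf; j++)
      if (a[i + j] == key)
	return i;	/* discard the partially done iteration */
  return n / vf * vf;	/* no break: all full iterations done */
}

/* Key sits at index 6; with VF = 4 the break happens in the second
   vector iteration, so the reported IV is 4.  */
long
demo (void)
{
  int a[8] = { 0, 1, 2, 3, 4, 5, 9, 7 };
  return vector_iv_on_break (a, 8, 4, 9);
}
```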

> 
> If you don't flush_pending_stmts on an edge after redirecting you
> should call redirect_edge_var_map_clear (edge), otherwise the stale
> info might break things later.
> 
> > > > +	   gsi_to = gsi_start_phis (latch_new->dest);
> > >
> > > likewise new_loop->header
> > >
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> > > > +	  /* In all cases, even in early break situations we're only
> > > > +	     interested in the number of fully executed loop iters.  As such
> > > > +	     we discard any partially done iteration.  So we simply propagate
> > > > +	     the phi nodes from the latch to the merge block.  */
> > > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > +
> > > > +	  lcssa_vars.add (new_arg);
> > > > +
> > > > +	  /* Main loop exit should use the final iter value.  */
> > > > +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv,
> > > UNKNOWN_LOCATION);
> > >
> > > above you are creating the PHI node at e->dest but here add the PHI arg to
> > > loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
> > > to follow.  I _think_ this code doesn't need to know about the "special"
> > > edge.
> > >
> > > > +
> > > > +	  /* All other exits use the previous iters.  */
> > > > +	  for (edge e : alt_exits)
> > > > +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> > > > +			 UNKNOWN_LOCATION);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > > > +	}
> > > > +
> > > > +      /* Copy over any live SSA vars that may not have been materialized in
> > > the
> > > > +	 loops themselves but would be in the exit block.  However when the
> > > live
> > > > +	 value is not used inside the loop then we don't need to do this,  if we
> > > do
> > > > +	 then when we split the guard block the branch edge can end up
> > > containing the
> > > > +	 wrong reference,  particularly if it shares an edge with something that
> > > has
> > > > +	 bypassed the loop.  This is not something peeling can check so we
> > > need to
> > > > +	 anticipate the usage of the live variable here.  */
> > > > +      auto exit_map = redirect_edge_var_map_vector (exit);
> > >
> > > Hmm, did I use that in my attempt to refactor things? ...
> >
> > Indeed, I didn't always use it, but found it was the best way to deal with the
> > variables being live in various BBs after the loop.
> 
> As said this whole piece of code is possibly more complicated than
> necessary.  First copying/creating the PHI nodes that are present
> at the exit (the old LC PHI nodes), then redirecting edges and flushing
> stmts should deal with half of this.
>
> > >
> > > > +      if (exit_map)
> > > > +        for (auto vm : exit_map)
> > > > +	{
> > > > +	  if (lcssa_vars.contains (vm.def)
> > > > +	      || TREE_CODE (vm.def) != SSA_NAME)
> > >
> > > the latter check is cheaper so it should come first
> > >
> > > > +	    continue;
> > > > +
> > > > +	  imm_use_iterator imm_iter;
> > > > +	  use_operand_p use_p;
> > > > +	  bool use_in_loop = false;
> > > > +
> > > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
> > > >  	    {
> > > > -	      gphi *phi = gsi.phi ();
> > > > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > > > -	      location_t orig_locus
> > > > -		= gimple_phi_arg_location_from_edge (phi, e);
> > > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > > +	      if (flow_bb_inside_loop_p (loop, bb)
> > > > +		  && !gimple_vuse (USE_STMT (use_p)))
> 
> what's this gimple_vuse check?  I see now for vect-early-break_17.c this
> code triggers and ignores
> 
>   vect_b[i_18] = _2;
> 
> > > > +		{
> > > > +		  use_in_loop = true;
> > > > +		  break;
> > > > +		}
> > > > +	    }
> > > >
> > > > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > > +	  if (!use_in_loop)
> > > > +	    {
> > > > +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> > > > +		  mirrors the relevancy analysis's used_outside_scope.  */
> > > > +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> > > > +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > > > +		continue;
> > > >  	    }
> 
> since the def was on a LC PHI the def should always be defined inside the
> loop.
> 
> > > > +
> > > > +	  tree new_res = copy_ssa_name (vm.result);
> > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > +	  for (edge exit : loop_exits)
> > > > +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
> > >
> > > not sure what you are doing above - I guess I have to play with it
> > > in a debug session.
> >
> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.

Correct

> 
> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after
> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>                  <original loop>
>                 /      |
>                /    <main exit w/ original LC PHI>
>               /      if (epilog)
>    alt exits /        /  \
>             /        /    loop around
>             |       /
>            preheader with "header" PHIs
>               |
>           <epilog>
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.

Ah, hadn't considered this one.  I'll give it a try.

> 
> It seems so that at least currently the order of things makes
> them more complicated than necessary.

Possibly, yeah; there's a lot of work to maintain the dominators
and PHI nodes because the exits are rewritten early.

I'll give this a go!

> 
> > >
> > > >  	}
> > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > -      flush_pending_stmts (e);
> > > > +
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> >src);
> > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > +
> > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > > >src);
> > > >
> > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > +
> > > > +      /* Finally after wiring the new epilogue we need to update its main
> exit
> > > > +	 to the original function exit we recorded.  Other exits are already
> > > > +	 correct.  */
> > > > +      if (multiple_exits_p)
> > > > +	{
> > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > +	    doms.safe_push (e->dest);
> > > > +	  update_loop = new_loop;
> > > > +	  doms.safe_push (exit_dest);
> > > > +
> > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > +	  if (single_succ_p (exit_dest))
> > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > +	}
> > > >      }
> > > >    else /* Add the copy at entry.  */
> > > >      {
> > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > +	 block and the new loop header.  This allows us to later split the
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > >
> > > see above
> > >
> > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > new_latch_loop);
> > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > +	}
> > > > +
> > > >        if (scalar_loop != loop)
> > > >  	{
> > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (new_preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > >  			       loop_preheader_edge (new_loop)->src);
> > > > +
> > > > +      if (multiple_exits_p)
> > > > +	update_loop = loop;
> > > >      }
> > > >
> > > > -  if (scalar_loop != loop)
> > > > +  if (multiple_exits_p)
> > > >      {
> > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > -
> > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > >  	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > -	  location_t orig_locus
> > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > -
> > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > +	  edge ex;
> > > > +	  edge_iterator ei;
> > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > +	    {
> > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > +		 dominate other blocks.  */
> > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> 
> For the prologue peeling any early exit we take would skip all other
> loops so we can simply leave them and their LC PHI nodes in place.
> We need extra PHIs only on the path to the main vector loop.  I
> think the comment isn't accurately reflecting what we do.  In
> fact we do not add any LC PHI nodes here but simply adjust the
> main loop header PHI arguments?

Yeah, we don't create any new nodes after peeling in this version.

> 
> > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > just using single_succ_p here?  A fallthru edge src dominates the
> > > fallthru edge dest, so the sentence above doesn't make sense.
> >
> > I wanted to say that the immediate dominator of a block is never
> > a fall-through block.  At least that's what I understood from how
> > the dominators are calculated in the code, though I may have missed
> > something.
> 
>  BB1
>   |
>  BB2
>   |
>  BB3
> 
> here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> 
> > >
> > > > +		     && single_succ_p (ex->dest))
> > > > +		{
> > > > +		  doms.safe_push (ex->dest);
> > > > +		  ex = single_succ_edge (ex->dest);
> > > > +		}
> > > > +	      doms.safe_push (ex->dest);
> > > > +	    }
> > > > +	  doms.safe_push (e->dest);
> > > >  	}
> > > > -    }
> > > >
> > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +      if (updated_doms)
> > > > +	updated_doms->safe_splice (doms);
> > > > +    }
> > > >    free (new_bbs);
> > > >    free (bbs);
> > > >
> > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > loop_vec_info loop_vinfo, const_edge e)
> > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > >
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > +
> > >
> > > I think checking the number of BBs is odd, I don't remember anything
> > > in slpeel is specifically tied to that?  I think we can simply drop
> > > this or do you remember anything that would depend on ->num_nodes
> > > being only exactly 5 or 2?
> >
> > Never actually seemed to require it, but they're used as a check to
> > see whether there is unexpected control flow in the loop.
> >
> > i.e. this would say no if you have an if statement in the loop that wasn't
> > converted.  The other part of this and the accompanying explanation is in
> > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > num_nodes == 2 check from there because the number of nodes restricted
> > things too much.  If you have an empty fall-through block, which seems
> > to happen often between the main exit and the latch block, then we'd
> > not vectorize.
> >
> > So instead I now reject loops after analyzing the gcond.  So I think this
> > check can go/needs to be different.
> 
> Lets remove it from this function then.

Ok, I can remove it from the outer-loop vect then too, since it's mostly dead code.
Will do so and reg-test that as well.

> 
> > >
> > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > >       the function itself.  */
> > > >    if (!loop_outer (loop)
> > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >    basic_block update_bb = update_e->dest;
> > > >
> > > > +  /* For early exits we'll update the IVs in
> > > > +     vect_update_ivs_after_early_break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return;
> > > > +
> > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >
> > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >        /* Fix phi expressions in the successor bb.  */
> > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > >      }
> > > > +  return;
> > >
> > > we don't usually place a return at the end of void functions
> > >
> > > > +}
> > > > +
> > > > +/*   Function vect_update_ivs_after_early_break.
> > > > +
> > > > +     "Advance" the induction variables of LOOP to the value they should
> take
> > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > +     vectorizer does not handle induction variables that are used after the
> > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > +     peeled, because of the early exit.  With an early exit we always peel
> the
> > > > +     loop.
> > > > +
> > > > +     Input:
> > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > +	      of LOOP were peeled.
> > > > +     - VF - The loop vectorization factor.
> > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before
> it is
> > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > +		     bumped.
> > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP
> > > executes.
> > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only)
> path
> > > > +		  coming out from LOOP on which there are uses of the LOOP
> > > ivs
> > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > +
> > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > +		  UPDATE_E->dest are updated accordingly.
> > > > +
> > > > +     Output:
> > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > +
> > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > +     a single loop exit that has a single predecessor.
> > > > +
> > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb
> are
> > > > +     organized in the same order.
> > > > +
> > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the
> future.
> > > > +
> > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a
> path
> > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the
> path
> > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > +     path starts with the edge UPDATE_E, and its destination (denoted
> > > update_bb)
> > > > +     needs to have its phis updated.
> > > > + */
> > > > +
> > > > +static tree
> > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> loop *
> > > epilog,
> > > > +				   poly_int64 vf, tree niters_orig,
> > > > +				   tree niters_vector, edge update_e)
> > > > +{
> > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return NULL;
> > > > +
> > > > +  gphi_iterator gsi, gsi1;
> > > > +  tree ni_name, ivtmp = NULL;
> > > > +  basic_block update_bb = update_e->dest;
> > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > +  basic_block exit_bb = loop_iv->dest;
> > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > +
> > > > +  gcc_assert (cond);
> > > > +
> > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> (update_bb);
> > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > +    {
> > > > +      tree init_expr, final_expr, step_expr;
> > > > +      tree type;
> > > > +      tree var, ni, off;
> > > > +      gimple_stmt_iterator last_gsi;
> > > > +
> > > > +      gphi *phi = gsi1.phi ();
> > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi,
> loop_preheader_edge
> > > (epilog));
> > >
> > > I'm confused about the setup.  update_bb looks like the block with the
> > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > loop_preheader_edge (epilog) come into play here?  That would feed into
> > > epilog->header PHIs?!
> >
> > We can't query the type of the PHIs in the block with the LC PHI nodes,
> > so the typical pattern seems to be to iterate over a block that's part
> > of the loop and has the PHIs in the same order, just so we can get to
> > the stmt_vec_info.
> >
> > >
> > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > better way?  Is update_bb really epilog->header?!
> > >
> > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > which "works" even for totally unrelated edges.
> > >
> > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > +      if (!phi1)
> > >
> > > shouldn't that be an assert?
> > >
> > > > +	continue;
> > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > +      if (dump_enabled_p ())
> > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > +			 (gimple *)phi);
> > > > +
> > > > +      /* Skip reduction and virtual phis.  */
> > > > +      if (!iv_phi_p (phi_info))
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			     "reduc or virtual phi. skip.\n");
> > > > +	  continue;
> > > > +	}
> > > > +
> > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > +	 early.  As such just grab the original IV.  */
> > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > > (loop));
> > >
> > > but this should be taken care of by LC SSA?
> >
> > It is, the comment is probably missing details; this part just scales
> > the counter from VF to scalar counts.  It's just a reminder that this
> > scaling is done differently from normal single-exit vectorization.
> >
> > >
> > > OK, have to continue tomorrow from here.
> >
> > Cheers, Thank you!
> >
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> 
> so this is a way to avoid touching the main IV?  Looks a bit fragile to
> me.  Hmm, we're iterating over the main loop header PHIs here?
> Can't you check, say, the relevancy of the PHI node instead?  Though
> it might also be used as induction.  Can't it be used as alternate
> exit like
> 
>   for (i)
>    {
>      if (i & bit)
>        break;
>    }
> 
> and would we need to adjust 'i' then?

Hmm, you're right.  I could reject them based on the definition BB and the
location the main IV is in; that's probably safest.

> 
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  final_expr = gimple_phi_result (phi1);
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > loop_preheader_edge (loop));
> > > > +
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +	  /* For early break the final loop IV is:
> > > > +	     init + (final - init) * vf which takes into account peeling
> > > > +	     values and non-single steps.  */
> > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > +			     fold_convert (stype, final_expr),
> > > > +			     fold_convert (stype, init_expr));
> > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > +
> > > > +	  /* Adjust the value with the offset.  */
> > > > +	  if (POINTER_TYPE_P (type))
> > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > +	  else
> > > > +	    ni = fold_convert (type,
> > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > +					    fold_convert (stype, init_expr),
> > > > +					    off));
> > > > +	  var = create_tmp_var (type, "tmp");
> 
> so how does the non-early break code deal with updating inductions?

The non-early-break code does essentially the same thing, see
vect_update_ivs_after_vectorizer which was modified to skip early exits.  The
major difference is that on a non-early exit the value can just be adjusted
linearly: i_scalar = i_vect * VF + init, and peeling is taken into account by
vect_peel_nonlinear_iv_init.
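As a sanity check, that linear adjustment for the single-exit case can be expressed standalone (C-style C++; function names and the unit step are made up):

```cpp
/* For a single-exit loop the scalar IV after the vector loop is just
   init + niters_vector * VF: the vector loop performed niters_vector
   full iterations of VF scalar iterations each.  */
long
iv_after_vector_loop (long init, long niters_vector, long vf)
{
  return init + niters_vector * vf;
}

/* Reference: step the scalar IV the same number of times (step 1 for
   simplicity).  */
long
iv_after_scalar_loop (long init, long niters_vector, long vf)
{
  long i = init;
  for (long n = 0; n < niters_vector * vf; n++)
    i++;
  return i;
}
```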

I wanted to keep them separate as there's a significant enough difference in
the calculation of the loop bodies themselves that having one function became
unwieldy.

That, and we don't support all of the induction steps a non-early exit
supports, so I inlined a simpler calculation.

Now there's a big difference between normal loop vectorization and early
break.  During main loop vectorization the ivtmp is adjusted for peeling;
that is, the amount is already adjusted in the phi node itself.

For early break this isn't done because the bounds are fixed, i.e. for every
exit we can do at most VF iterations.
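The iteration-count side of this can be sketched standalone as well (C-style C++, mirroring the constant-VF computation in the hunk above; function names are made up):

```cpp
/* Remaining scalar iterations for the epilogue when the main exit is
   taken: the vector loop covered niters_vector * VF out of niters_orig
   scalar iterations.  */
long
epilogue_niters_main_exit (long niters_orig, long niters_vector, long vf)
{
  return niters_orig - niters_vector * vf;
}

/* When an early exit is taken the bounds are fixed: the breaking vector
   iteration was not completed, so at most VF scalar iterations remain
   for the epilogue.  */
long
epilogue_niters_early_exit (long vf)
{
  return vf;
}
```

E.g. with 1027 scalar iterations, VF 4 and 256 vector iterations, 3 scalar iterations remain on the main-exit path.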

> And how do you avoid altering this when we flow in from the normal
> exit?  That is, you are updating the value in the epilog loop
> header but don't you need to instead update the value only on
> the alternate exit edges from the main loop (and keep the not
> updated value on the main exit edge)?

Because the ivtmp has not been adjusted for peeling, we adjust for it here.
This allows us to use the same update for all exits, so I don't have to
differentiate between the two here, because I didn't do so during creation of
the PHI node.

> 
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > > (loop));
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +
> > > > +	  if (vf.is_constant ())
> > > > +	    {
> > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_vector),
> > > > +				build_int_cst (stype, vf));
> > > > +
> > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_orig),
> > > > +				fold_convert (stype, ni));
> > > > +	    }
> > > > +	  else
> > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > +	       masked, so at the end of the loop we know we have finished
> > > > +	       the entire loop and found nothing.  */
> > > > +	    ni = build_zero_cst (stype);
> > > > +
> > > > +	  ni = fold_convert (type, ni);
> > > > +	  /* We don't support variable n in this version yet.  */
> > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > +
> > > > +	  var = create_tmp_var (type, "tmp");
> > > > +
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > +
> > > > +	  for (edge exit : alt_exits)
> > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > +						       vf));
> > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > +	}
> > > > +    }
> > > > +
> > > > +  return ivtmp;
> > > >  }
> > > >
> > > >  /* Return a gimple value containing the misalignment (measured in
> vector
> > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > > (loop_vec_info loop_vinfo,
> > > >
> > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > >     this function searches for the corresponding lcssa phi node in exit
> > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > -   NULL.  */
> > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > +   return the phi result; otherwise return NULL.  */
> > > >
> > > >  static tree
> > > >  find_guard_arg (class loop *loop, class loop *epilog
> ATTRIBUTE_UNUSED,
> > > > -		gphi *lcssa_phi)
> > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > >  {
> > > >    gphi_iterator gsi;
> > > >    edge e = loop->vec_loop_iv;
> > > >
> > > > -  gcc_assert (single_pred_p (e->dest));
> > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *phi = gsi.phi ();
> > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > -	return PHI_RESULT (phi);
> > > > -    }
> > > > -  return NULL_TREE;
> > > > -}
> > > > -
> > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > FIRST/SECOND
> > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > -   edge, the two loops are arranged as below:
> > > > -
> > > > -       preheader_a:
> > > > -     first_loop:
> > > > -       header_a:
> > > > -	 i_1 = PHI<i_0, i_2>;
> > > > -	 ...
> > > > -	 i_2 = i_1 + 1;
> > > > -	 if (cond_a)
> > > > -	   goto latch_a;
> > > > -	 else
> > > > -	   goto between_bb;
> > > > -       latch_a:
> > > > -	 goto header_a;
> > > > -
> > > > -       between_bb:
> > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > -
> > > > -     second_loop:
> > > > -       header_b:
> > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > -				 or with i_2 if no LCSSA phi is created
> > > > -				 under condition of
> > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > -	 ...
> > > > -	 i_4 = i_3 + 1;
> > > > -	 if (cond_b)
> > > > -	   goto latch_b;
> > > > -	 else
> > > > -	   goto exit_bb;
> > > > -       latch_b:
> > > > -	 goto header_b;
> > > > -
> > > > -       exit_bb:
> > > > -
> > > > -   This function creates loop closed SSA for the first loop; update the
> > > > -   second loop's PHI nodes by replacing argument on incoming edge with
> the
> > > > -   result of newly created lcssa PHI nodes.  IF
> CREATE_LCSSA_FOR_IV_PHIS
> > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > -   the first loop.
> > > > -
> > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > -   the second loop will execute rest iterations of the first.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > -				   class loop *first, class loop *second,
> > > > -				   bool create_lcssa_for_iv_phis)
> > > > -{
> > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > -
> > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > -
> > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> (between_bb));
> > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > -  gcc_assert (loop == first || loop == second);
> > > > -
> > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > -       gsi_update = gsi_start_phis (second->header);
> > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > -    {
> > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > -      gphi *update_phi = gsi_update.phi ();
> > > > -
> > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > +      /* Nested loops with multiple exits can have different no# phi node
> > > > +	 arguments between the main loop and epilog as epilog falls to the
> > > > +	 second loop.  */
> > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > >  	{
> > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > -	  arg = new_res;
> > > > -	}
> > > > -
> > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > -	 incoming edge.  */
> > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> arg);
> > > > -    }
> > > > -
> > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > -     for correct vectorization of live stmts.  */
> > > > -  if (loop == first)
> > > > -    {
> > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > -	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > > (orig_arg))
> > > > -	    continue;
> > > > -
> > > > -	  /* Already created in the above loop.   */
> > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > >  	    continue;
> > > >
> > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > +	  if (operand_equal_p (get_current_def (var),
> > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > +	    return PHI_RESULT (phi);
> > > >  	}
> > > >      }
> > > > +  return NULL_TREE;
> > > >  }
> > > >
> > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> beginning
> > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2
> (class
> > > loop *loop, class loop *epilog,
> > > >    gcc_assert (single_succ_p (merge_bb));
> > > >    edge e = single_succ_edge (merge_bb);
> > > >    basic_block exit_bb = e->dest;
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > >
> > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *update_phi = gsi.phi ();
> > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > >
> > > >        tree merge_arg = NULL_TREE;
> > > >
> > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class
> loop
> > > *loop, class loop *epilog,
> > > >        if (!merge_arg)
> > > >  	merge_arg = old_arg;
> > > >
> > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e-
> >dest_idx);
> > > >        /* If the var is live after loop but not a reduction, we simply
> > > >  	 use the old arg.  */
> > > >        if (!guard_arg)
> > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > > loop *loop, class loop *epilog,
> > > >      }
> > > >  }
> > > >
> > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > -{
> > > > -  gphi_iterator gsi;
> > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > -
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > -}
> > > > -
> 
> I wonder if we can still split these changes out to before early break
> vect?

Maybe.  I hadn't done so before because I was redirecting the edges during peeling.
If we no longer do that, it may be easier.  Let me try.

> 
> > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need
> to
> > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >      bound_epilog += vf - 1;
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > >      bound_epilog += 1;
> > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > +     to find the element that caused the break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      bound_epilog = vf;
> > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > +      vect_epilogues = false;
> > > > +    }
> > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > >    poly_uint64 bound_scalar = bound_epilog;
> > > >
> > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  				  bound_prolog + bound_epilog)
> > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > >  			 || vect_epilogues));
> > > > +
> > > > +  /* We only support early break vectorization on known bounds at this
> > > time.
> > > > +     This means that if the vector loop can't be entered then we won't
> > > generate
> > > > +     it at all.  So for now force skip_vector off because the additional
> control
> > > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo);
> > > > +
> 
> I think it should be as "easy" as entering the epilog via the block taking
> the regular exit?
> 
> > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > >       loop is known at compile time, otherwise we need to add a check at
> > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > >    bool skip_epilog = (prolog_peeling < 0
> > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >  		      || !vf.is_constant ());
> > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.
> */
> > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > > epilog
> > > > +     loop must be executed.  */
> > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      skip_epilog = false;
> > > > -
> > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > >    auto_vec<profile_count> original_counts;
> > > >    basic_block *original_bbs = NULL;
> > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >    if (prolog_peeling)
> > > >      {
> > > >        e = loop_preheader_edge (loop);
> > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > -
> > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> e);
> > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> e,
> > > > +						       true);
> > > >        gcc_assert (prolog);
> > > >        prolog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > +
> > > >        first_loop = prolog;
> > > >        reset_original_copy_tables ();
> > > >
> > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  	 as the transformations mentioned above make less or no sense when
> > > not
> > > >  	 vectorizing.  */
> > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > +      auto_vec<basic_block> doms;
> > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e,
> true,
> > > > +						       &doms);
> > > >        gcc_assert (epilog);
> > > >
> > > >        epilog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > >
> > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > >  					update_e);
> > > >
> > > > +      /* For early breaks we must create a guard to check how many
> iterations
> > > > +	 of the scalar loop are yet to be performed.  */
> 
> We have this check anyway, no?  In fact don't we know that we always enter
> the epilog (see above)?

Not always: masked loops, for instance, never enter the epilogue if the main loop
finished completely, since there is no "remainder".

> 
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	{
> > > > +	  tree ivtmp =
> > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > +					       *niters_vector, update_e);
> > > > +
> > > > +	  gcc_assert (ivtmp);
> > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > +					 fold_convert (TREE_TYPE (niters),
> > > > +						       ivtmp),
> > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > +
> > > > +	  /* If we had a fallthrough edge, the guard will the threaded through
> > > > +	     and so we may need to find the actual final edge.  */
> > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > +	     doesn't update existing one in the current scheme.  */
> > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > > guard_to,
> > > > +						guard_bb, prob_epilog.invert
> > > (),
> > > > +						irred_flag);
> > > > +	  doms.safe_push (guard_bb);
> > > > +
> > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +
> > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > +					      final_edge);
> > > > +
> > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > +	     the guard and the exit.  This intermediate block is required
> > > > +	     because in the current scheme of things the guard block phi
> > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > +	     case we just need to update the uses in this block as well.  */
> > > > +	  if (loop != scalar_loop)
> > > > +	    {
> > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> > > guard_e));
> > > > +	    }
> > > > +
> > > > +	  flush_pending_stmts (guard_e);
> > > > +	}
> > > > +
> > > >        if (skip_epilog)
> > > >  	{
> > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >  	    }
> > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > >  	}
> > > > -      else
> > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > >
> > > >        unsigned HOST_WIDE_INT bound;
> > > >        if (bound_scalar.is_constant (&bound))
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49
> > > f21421958e7ee25eada 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop
> > > *loop_in, vec_info_shared *shared)
> > > >      partial_load_store_bias (0),
> > > >      peeling_for_gaps (false),
> > > >      peeling_for_niter (false),
> > > > +    early_breaks (false),
> > > > +    non_break_control_flow (false),
> > > >      no_data_dependencies (false),
> > > >      has_mask_store (false),
> > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p
> > > (loop_vec_info loop_vinfo)
> > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD
> > > (LOOP_VINFO_ORIG_LOOP_INFO
> > > >  					  (loop_vinfo));
> > > >
> > > > +  /* When we have multiple exits and VF is unknown, we must require
> > > partial
> > > > +     vectors because the loop bounds is not a minimum but a maximum.
> > > That is to
> > > > +     say we cannot unpredicate the main loop unless we peel or use partial
> > > > +     vectors in the epilogue.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > +    return true;
> > > > +
> > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > >      {
> > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost
> > > (loop_vec_info loop_vinfo)
> > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > >     Verify that certain CFG restrictions hold, including:
> > > >     - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit.
> > > >     - the loop exit condition is simple enough
> > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > >       niter could be analyzed under some assumptions.  */
> > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >                             |
> > > >                          (exit-bb)  */
> > > >
> > > > -      if (loop->num_nodes != 2)
> > > > -	return opt_result::failure_at (vect_location,
> > > > -				       "not vectorized:"
> > > > -				       " control flow in loop.\n");
> > > > -
> > > >        if (empty_block_p (loop->header))
> > > >  	return opt_result::failure_at (vect_location,
> > > >  				       "not vectorized: empty loop.\n");
> > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > >  			 "Considering outer-loop vectorization.\n");
> > > >        info->inner_loop_cond = inner.loop_cond;
> > > > +
> > > > +      if (!single_exit (loop))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized: multiple exits.\n");
> > > > +
> > > >      }
> > > >
> > > > -  if (!single_exit (loop))
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized: multiple exits.\n");
> > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > >      return opt_result::failure_at (vect_location,
> > > >  				   "not vectorized:"
> > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >  				   "not vectorized: latch block not empty.\n");
> > > >
> > > >    /* Make sure the exit is not abnormal.  */
> > > > -  edge e = single_exit (loop);
> > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized:"
> > > > -				   " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  edge nexit = loop->vec_loop_iv;
> > > > +  for (edge e : exits)
> > > > +    {
> > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge.\n");
> > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > +	 be able to vectorize the inverse order, but the current flow in the
> > > > +	 the vectorizer always assumes you update successor PHI nodes, not
> > > > +	 preds.  */
> > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e-
> > > >src))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge order.\n");
> 
> "unsupported loop exit order", but I don't understand the comment.
> 

One failure I found during bootstrap is that there was a CFG where the BBs were all
reversed.  I should have it in the testsuite; I'll dig it out and come back to you.

> > > > +    }
> > > > +
> > > > +  /* We currently only support early exit loops with known bounds.   */
> 
> Btw, why's that?  Is that because we don't support the loop-around edge?
> IMHO this is the most serious limitation (and as said above it should be
> trivial to fix).

Nah, it's just time 😊 I wanted to start getting feedback before relaxing it.
My patch 0/19 has an implementation plan for the remaining work.

I plan to relax this in this release, most likely in this series itself.

> 
> > > > +  if (exits.length () > 1)
> > > > +    {
> > > > +      class tree_niter_desc niter;
> > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter,
> NULL)
> > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " early breaks only supported on loops"
> > > > +				       " with known iteration bounds.\n");
> > > > +    }
> > > >
> > > >    info->conds
> > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop,
> > > vec_info_shared *shared,
> > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info-
> > > >alt_loop_conds);
> > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > > >    if (info->inner_loop_cond)
> > > >      {
> > > >        stmt_vec_info inner_loop_cond_info
> > > > @@ -3070,7 +3106,8 @@ start_over:
> > > >
> > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop
> required\n");
> > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction
> (loop_vec_info
> > > loop_vinfo,
> > > >    basic_block exit_bb;
> > > >    tree scalar_dest;
> > > >    tree scalar_type;
> > > > -  gimple *new_phi = NULL, *phi;
> > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > >    gimple_stmt_iterator exit_gsi;
> > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > >    gimple *epilog_stmt = NULL;
> > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction
> > > (loop_vec_info loop_vinfo,
> > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > >  	  reduc_inputs.quick_push (new_def);
> > > >  	}
> > > > +
> > > > +	/* Update the other exits.  */
> > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	  {
> > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +	    gphi_iterator gsi, gsi1;
> > > > +	    for (edge exit : alt_exits)
> > > > +	      {
> > > > +		/* Find the phi node to propaget into the exit block for each
> > > > +		   exit edge.  */
> > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > +		     gsi1 = gsi_start_phis (exit->src);
> 
> exit->src == loop->header, right?  I think this won't work for multiple
> alternate exits.  It's probably easier to do this where we create the
> LC PHI node for the reduction result?

No, exit->src is the definition block of the gcond.

> 
> > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > +		  {
> > > > +		    /* There really should be a function to just get the number
> > > > +		       of phis inside a bb.  */
> > > > +		    if (phi && phi == gsi.phi ())
> > > > +		      {
> > > > +			gphi *phi1 = gsi1.phi ();
> > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > +					 PHI_RESULT (phi1));
> 
> I think we know the header PHI of a reduction perfectly well, there
> shouldn't be the need to "search" for it.
> 
> > > > +			break;
> > > > +		      }
> > > > +		  }
> > > > +	      }
> > > > +	  }
> > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > >      }
> > > >
> > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info
> *vinfo,
> > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > >  	   lhs' = new_tree;  */
> > > >
> > > > +      /* When vectorizing an early break, any live statement that is used
> > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > +	 We could change the liveness value during analysis instead but since
> > > > +	 the below code is invalid anyway just ignore it during codegen.  */
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	return true;
> 
> But what about the value that's live across the main exit when the
> epilogue is not entered?

My understanding is that vectorizable_live_operation only vectorizes statements within
the loop that can be live outside of it; in this case, e.g. statements inside the body of the IF.

What you're describing above is done by vectorizable_reduction etc.

That said, I can make this much safer by restricting it to statements inside the same BB
as an alternate exit BB.

> 
> > > > +
> > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >        gcc_assert (single_pred_p (exit_bb));
> > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > >       versioning.   */
> > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> > > (loop_vinfo))
> 
> e can be NULL here?  I think we should reject such loops earlier.
> 

Ah no, that's left over from when this used single_exit.  It should be removed
in this patch.  I had missed it, sorry.

> > > >      {
> > > >        split_loop_exit_edge (e, true);
> > > >        if (dump_enabled_p ())
> > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > >      {
> > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > -      if (! single_pred_p (e->dest))
> > > > +      if (e && ! single_pred_p (e->dest))
> > > >  	{
> > > >  	  split_loop_exit_edge (e, true);
> > > >  	  if (dump_enabled_p ())
> > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info
> > > loop_vinfo, gimple *loop_vectorized_call)
> > > >
> > > >    /* Loops vectorized with a variable factor won't benefit from
> > > >       unrolling/peeling.  */
> 
> update the comment?  Why would we unroll a VLA loop with early breaks?
> Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> 

Ah indeed, should be ||.

> > > > -  if (!vf.is_constant ())
> > > > +  if (!vf.is_constant ()
> > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        loop->unroll = 1;
> > > >        if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index
> > >
> 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82af
> > > e78ee2a7c627be45b 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >    *live_p = false;
> > > >
> > > >    /* cond stmt other than loop exit cond.  */
> > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > -    *relevant = vect_used_in_scope;
> 
> how was that ever hit before?  For outer loop processing with outer loop
> vectorization?
>

I believe so, because the outer-loop would see the exit cond of the inner loop as well.

> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    {
> > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge,
> but
> > > > +	 it looks like loop_manip doesn't do that..  So we have to do it
> > > > +	 the hard way.  */
> > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > +      bool exit_bb = false, early_exit = false;
> > > > +      edge_iterator ei;
> > > > +      edge e;
> > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > +	  {
> > > > +	    exit_bb = true;
> > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > +	    break;
> > > > +	  }
> > > > +
> > > > +      /* We should have processed any exit edge, so an edge not an early
> > > > +	 break must be a loop IV edge.  We need to distinguish between the
> > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > +      if (exit_bb)
> > > > +	{
> > > > +	  if (early_exit)
> > > > +	    *relevant = vect_used_in_scope;
> > > > +	}
> 
> I wonder why you can't simply do
> 
>          if (is_ctrl_stmt (stmt_info->stmt)
>              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> 
> ?
> 
> > > > +      else if (bb->loop_father == loop)
> > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> 
> so for control flow not exiting the loop you can check
> loop_exits_from_bb_p ().
> 
> > > > +    }
> > > >
> > > >    /* changing memory.  */
> > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info
> stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >  	*relevant = vect_used_in_scope;
> > > >        }
> > > >
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  auto_bitmap exit_bbs;
> > > > +  for (edge exit : exits)
> > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > > >    /* uses outside the loop.  */
> > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > SSA_OP_DEF)
> > > >      {
> > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > loop_vec_info loop_vinfo,
> > > >  	      /* We expect all such uses to be in the loop exit phis
> > > >  		 (because of loop closed form)   */
> > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> 
> That now becomes quite expensive checking already covered by the LC SSA
> verifier so I suggest to simply drop this assert instead.
> 
> > > >                *live_p = true;
> > > >  	    }
> > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized
> > > (loop_vec_info loop_vinfo, bool *fatal)
> > > >  	}
> > > >      }
> > > >
> > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't
> seen all
> > > > +     the conds yet at that point and there's no quick way to retrieve them.
> */
> > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > +    return opt_result::failure_at (vect_location,
> > > > +				   "not vectorized:"
> > > > +				   " unsupported control flow in loop.\n");
> 
> so we didn't do this before?  But see above where I wondered.  So when
> does this hit with early exits and why can't we check for this in
> vect_verify_loop_form?
> 

We did, but it was done in vect_analyze_loop_form purely based on the number of
BBs in the loop.  That required loops to be highly normalized, which isn't the
case with multiple exits.  That is, I've seen various loops with different
numbers of random empty fall-through BBs in the body, or after the main exit
before the latch.

We could do it in vect_analyze_loop_form, but that requires walking all the
statements in all the basic blocks, because loops track exit edges and a general
control flow edge is not easy to find as far as I know.  I added the check here
because by this point we have already walked all the statements.

> > > > +
> > > >    /* 2. Process_worklist */
> > > >    while (worklist.length () > 0)
> > > >      {
> > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized
> > > (loop_vec_info loop_vinfo, bool *fatal)
> > > >  			return res;
> > > >  		    }
> > > >                   }
> > > > +	    }
> > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +	    {
> > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > +	      opt_result res
> > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +				loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > >              }
> > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > >  	    {
> > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  			     node_instance, cost_vec);
> > > >        if (!res)
> > > >  	return res;
> > > > -   }
> > > > +    }
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > >
> > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >      {
> > > >        case vect_internal_def:
> > > > +      case vect_early_exit_def:
> > > >          break;
> > > >
> > > >        case vect_reduction_def:
> > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >      {
> > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > >        *need_to_vectorize = true;
> > > >      }
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index
> > >
> ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b3
> > > 40baff61d18da8e242dd 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > >    vect_internal_def,
> > > >    vect_induction_def,
> > > >    vect_reduction_def,
> > > > +  vect_early_exit_def,
> 
> can you avoid putting this in between reduction and double reduction
> please?  Just put it before vect_unknown_def_type.  In fact the COND
> isn't a def ... maybe we should have pattern recognized
> 
>  if (a < b) exit;
> 
> as
> 
>  cond = a < b;
>  if (cond != 0) exit;
> 
> so the part that we need to vectorize is more clear.

Hmm, fair enough.  I still find it useful to be able to distinguish
between this and general control flow though.  In fact, depending on
when we finish reviewing/upstreaming this, it should be easy to
support general control flow in such loops.

> 
> > > >    vect_double_reduction_def,
> > > >    vect_nested_cycle,
> > > >    vect_first_order_recurrence,
> > > > @@ -876,6 +877,13 @@ public:
> > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > >    bool peeling_for_niter;
> > > >
> > > > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > > > +     the loop for the break finding loop.  */
> > > > +  bool early_breaks;
> > > > +
> > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > +  bool non_break_control_flow;
> > > > +
> > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > >    auto_vec<gcond *> conds;
> > > >
> > > > @@ -985,9 +993,11 @@ public:
> > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > @@ -1038,8 +1048,8 @@ public:
> > > >     stack.  */
> > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > >
> > > > -inline loop_vec_info
> > > > -loop_vec_info_for_loop (class loop *loop)
> > > > +static inline loop_vec_info
> > > > +loop_vec_info_for_loop (const class loop *loop)
> > > >  {
> > > >    return (loop_vec_info) loop->aux;
> > > >  }
> > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > >  {
> > > >    if (bb == (bb->loop_father)->header)
> > > >      return true;
> > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > +
> > > >    return false;
> > > >  }
> > > >
> > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > >     in tree-vect-loop-manip.cc.  */
> > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > >  				     tree, tree, tree, bool);
> > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > -						     class loop *, edge);
> > > > +						    class loop *, edge, bool,
> > > > +						    vec<basic_block> * = NULL);
> > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > index
> > >
> a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c
> > > 67f29f5dc9a8d72235f4 100644
> > > > --- a/gcc/tree-vectorizer.cc
> > > > +++ b/gcc/tree-vectorizer.cc
> > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > >  	 predicates that need to be shared for optimal predicate usage.
> > > >  	 However reassoc will re-order them and prevent CSE from working
> > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> 
> seeing this more and more I think we want a simple way to iterate over
> all exits without copying to a vector when we have them recorded.  My
> C++ fu is too limited to support
> 
>   for (auto exit : recorded_exits (loop))
>     ...
> 
> (maybe that's enough for somebody to jump onto this ;))
> 
> Don't treat all review comments as change orders, but it should be clear
> the code isn't 100% obvious.  Maybe the patch can be simplified by
> splitting out the LC SSA cleanup parts.

Will give it a try,

Thanks!
Tamar

> 
> Thanks,
> Richard.
> 
> > > > +      for (edge exit : exits)
> > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > >
> > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg,
> > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > Moerman;
> > > HRB 36809 (AG Nuernberg)
> >

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-17 10:56         ` Tamar Christina
@ 2023-07-17 12:48           ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-07-17 12:48 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 17 Jul 2023, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, July 14, 2023 2:35 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Thu, 13 Jul 2023, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Thursday, July 13, 2023 6:31 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > jlaw@ventanamicro.com
> > > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > > updates for early break.
> > > >
> > > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > > The rewrite also naturally takes into account multiple exits and so it didn't
> > > > > make sense to split them off.
> > > > >
> > > > > For the purposes of peeling the only change for multiple exits is that the
> > > > > secondary exits are all wired to the start of the new loop preheader when
> > > > > doing epilogue peeling.
> > > > >
> > > > > When doing prologue peeling the CFG is kept intact.
> > > > >
> > > > > For both epilogue and prologue peeling we wire through between the two
> > > > > loops any PHI nodes that escape the first loop into the second loop if
> > > > > flow_loops is specified.  The reason for this conditionality is because
> > > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > > > >   - prologue peeling
> > > > >   - epilogue peeling
> > > > >   - loop distribution
> > > > >
> > > > > for the last case the loops should remain independent, and so not be
> > > > > connected.  Because of this propagation of only used phi nodes
> > > > > get_current_def can be used to easily find the previous definitions.
> > > > > However live statements that are not used inside the loop itself are not
> > > > > propagated (since if unused, the moment we add the guard in between the
> > > > > two loops the value across the bypass edge can be wrong if the loop has
> > > > > been peeled.)
> > > > >
> > > > > This is dealt with easily enough in find_guard_arg.
> > > > >
> > > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> > > > > tree, the moment we add the guard block we will change the dominators
> > > > > again.  To deal with this slpeel_tree_duplicate_loop_to_edge_cfg can
> > > > > optionally return the blocks to update without having to recompute the
> > > > > list of blocks to update again.
> > > > >
> > > > > When multiple exits and doing epilogue peeling we will also temporarily
> > > > > have an incorrect VUSES chain for the secondary exits as it anticipates
> > > > > the final result after the VDEFs have been moved.  This will thus be
> > > > > corrected once the code motion is applied.
> > > > >
> > > > > Lastly by doing things this way we can remove the helper functions that
> > > > > previously did lock step iterations to update things as it went along.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > >
> > > > Not sure if I get through all of this in one go - so be prepared that
> > > > the rest of the review follows another day.
> > >
> > > No worries, I appreciate the reviews!
> > > Just giving some quick replies for when you continue.
> > 
> > Continuing.
> > 
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
> > > > > 	* tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit==null.
> > > > > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> > > > > 	assert.
> > > > > 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > > > > 	exits.
> > > > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
> > > > > 	(slpeel_can_duplicate_loop_p): Likewise.
> > > > > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > > > > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > > > > 	(find_guard_arg): Update for new peeling code.
> > > > > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > > > > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> > > > > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > > 	non_break_control_flow and early_breaks.
> > > > > 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > > > > 	multiple exits and VLA.
> > > > > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> > > > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > > > 	(vect_create_epilog_for_reduction): Update live phi nodes.
> > > > > 	(vectorizable_live_operation): Ignore live operations in vector loop
> > > > > 	when multiple exits.
> > > > > 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > > > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > > > > 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
> > > > > 	analyze gcond params.
> > > > > 	(vect_analyze_stmt): Support gcond.
> > > > > 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > > > > 	in RPO pass.
> > > > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > > > 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
> > > > > 	(loop_vec_info_for_loop): Change to const and static.
> > > > > 	(is_loop_header_bb_p): Drop assert.
> > > > > 	(slpeel_can_duplicate_loop_p): Update prototype.
> > > > > 	(class loop): Add early_breaks and non_break_control_flow.
> > > > >
> > > > > --- inline copy of patch --
> > > > > diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> > > > > index
> > > >
> > 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40
> > > > d833a968314a4442b9e 100644
> > > > > --- a/gcc/tree-loop-distribution.cc
> > > > > +++ b/gcc/tree-loop-distribution.cc
> > > > > @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool
> > > > redirect_lc_phi_defs)
> > > > >    edge preheader = loop_preheader_edge (loop);
> > > > >
> > > > >    initialize_original_copy_tables ();
> > > > > -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> > > > > +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader,
> > > > false);
> > > > >    gcc_assert (res != NULL);
> > > > >
> > > > >    /* When a not last partition is supposed to keep the LC PHIs computed
> > > > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > > > > index
> > > >
> > 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba
> > > > 377300430c04b7aeefe6c 100644
> > > > > --- a/gcc/tree-ssa-loop-niter.cc
> > > > > +++ b/gcc/tree-ssa-loop-niter.cc
> > > > > @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop,
> > > > basic_block *body, const_edge exit)
> > > > >    gimple_stmt_iterator bsi;
> > > > >    unsigned i;
> > > > >
> > > > > -  if (exit != single_exit (loop))
> > > > > +  /* We need to check for alternative exits since exit can be NULL.  */
> > > >
> > > > You mean we pass in exit == NULL in some cases?  I'm not sure what
> > > > the desired behavior in that case is - can you point out the
> > > > callers you are fixing here?
> > > >
> > > > I think we should add gcc_assert (exit != nullptr)
> > > >
> > > > >    for (i = 0; i < loop->num_nodes; i++)
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > index
> > > >
> > 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a92
> > > > 7cb150edd7c2aa9999 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple
> > *update_phi,
> > > > edge e, tree new_def)
> > > > >  {
> > > > >    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
> > > > >
> > > > > +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> > > > > +	      || orig_def != new_def);
> > > > > +
> > > > >    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
> > > > >
> > > > >    if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > > @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal
> > (loop_vec_info
> > > > loop_vinfo,
> > > > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > > > >
> > > > >    /* Record the number of latch iterations.  */
> > > > > -  if (limit == niters)
> > > > > +  if (limit == niters
> > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > > > >         latch count.  */
> > > > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > > > @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal
> > > > (loop_vec_info loop_vinfo,
> > > > >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > > > >  				       limit, step);
> > > > >
> > > > > -  if (final_iv)
> > > > > +  /* For multiple exits we've already maintained LCSSA form and handled
> > > > > +     the scalar iteration update in the code that deals with the merge
> > > > > +     block and its updated guard.  I could move that code here instead
> > > > > +     of in vect_update_ivs_after_early_break but I have to still deal
> > > > > +     with the updates to the counter `i`.  So for now I'll keep them
> > > > > +     together.  */
> > > > > +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      {
> > > > >        gassign *assign;
> > > > >        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
> > > > >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > > > >     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy
> > the
> > > > >     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> > > > > -   entry or exit of LOOP.  */
> > > > > +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
> > > > > +   continuation.  This is correct for cases where one loop continues from the
> > > > > +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> > > > > +   where the loop is duplicated and then modified.
> > > > > +
> > > >
> > > > but for loop distribution the flow also continues?  I'm not sure what you
> > > > are refering to here.  Do you by chance have a branch with the patches
> > > > installed?
> > >
> > > Yup, they're at refs/users/tnfchris/heads/gcc-14-early-break in the repo.
> > >
> > > >
> > > > > +   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks
> > > > > +   whose dominators were updated during the peeling.  */
> > > > >
> > > > >  class loop *
> > > > >  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > -					class loop *scalar_loop, edge e)
> > > > > +					class loop *scalar_loop, edge e,
> > > > > +					bool flow_loops,
> > > > > +					vec<basic_block> *updated_doms)
> > > > >  {
> > > > >    class loop *new_loop;
> > > > >    basic_block *new_bbs, *bbs, *pbbs;
> > > > > @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class
> > > > loop *loop,
> > > > >    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
> > > > >      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
> > > > >
> > > > > +  /* Rename the exit uses.  */
> > > > > +  for (edge exit : get_loop_exit_edges (new_loop))
> > > > > +    for (auto gsi = gsi_start_phis (exit->dest);
> > > > > +	 !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > +      {
> > > > > +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> > > > > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> > > > > +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > > +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> > > > > +      }
> > > > > +
> > > > > +  /* This condition happens when the loop has been versioned, e.g. due to
> > > > > +     ifcvt versioning the loop.  */
> > > > >    if (scalar_loop != loop)
> > > > >      {
> > > > >        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs
> > from
> > > > > @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > > > (class loop *loop,
> > > > >  						EDGE_SUCC (loop->latch, 0));
> > > > >      }
> > > > >
> > > > > +  vec<edge> alt_exits = loop->vec_loop_alt_exits;
> > > >
> > > > So 'e' is not one of alt_exits, right?  I wonder if we can simply
> > > > compute the vector from all exits of 'loop' and removing 'e'?
> > > >
> > > > > +  bool multiple_exits_p = !alt_exits.is_empty ();
> > > > > +  auto_vec<basic_block> doms;
> > > > > +  class loop *update_loop = NULL;
> > > > > +
> > > > >    if (at_exit) /* Add the loop copy at exit.  */
> > > > >      {
> > > > > -      if (scalar_loop != loop)
> > > > > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> > > > >  	{
> > > > > -	  gphi_iterator gsi;
> > > > >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > > > > +	  flush_pending_stmts (new_exit);
> > > > > +	}
> > > > >
> > > > > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > > > > -	       gsi_next (&gsi))
> > > > > +      auto loop_exits = get_loop_exit_edges (loop);
> > > > > +      for (edge exit : loop_exits)
> > > > > +	redirect_edge_and_branch (exit, new_preheader);
> > > > > +
> > > > > +
> > > >
> > > > one line vertical space too much
> > > >
> > > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > > +	 block and the new loop header.  This allows us to later split the
> > > > > +	 preheader block and still find the right LC nodes.  */
> > > > > +      edge latch_new = single_succ_edge (new_preheader);
> > > > > +      edge latch_old = loop_latch_edge (loop);
> > > > > +      hash_set <tree> lcssa_vars;
> > > > > +      for (auto gsi_from = gsi_start_phis (latch_old->dest),
> > > >
> > > > so that's loop->header (and makes it more clear which PHI nodes you are
> > > > looking at)
> > 
> > 
> > So I'm now in a debug session - I think that conceptually it would
> > make more sense to create the LC PHI nodes that are present at the
> > old exit destination in the new preheader _before_ you redirect them
> > above and then flush_pending_stmts after redirecting, that should deal
> > with the copying.
> > 
> 
> This was the first thing I tried; however, as soon as you redirect one edge
> you destroy all the other PHI nodes on the original block.
> 
> That is, if I have 3 PHI nodes I need to move them all at the same time, which
> brings the next problem: I can't add more entries to a PHI than it has
> incoming edges.  So I can't just create the final PHI nodes on the destination
> without having an edge for each of them.

Hmm?  It would go like

 1. create all PHI nodes (without any arguments) at the destination
 2. redirect edge + flush stmts (will copy all PHI args)
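
In GCC-internal pseudocode the two steps could look roughly like this (a
sketch only, using functions already mentioned in this thread; `old_exit_dest`
and `new_dest` are placeholder names, not the patch's variables):

```
/* 1. Create argument-less PHI nodes in NEW_DEST, one per LC PHI on the
      old exit destination, before touching any edges.  */
for (auto gsi = gsi_start_phis (old_exit_dest); !gsi_end_p (gsi);
     gsi_next (&gsi))
  create_phi_node (copy_ssa_name (gimple_phi_result (gsi.phi ())),
		   new_dest);

/* 2. Redirect the edges; redirect_edge_and_branch queues the PHI
      arguments of the old destination and flush_pending_stmts fills
      them into the new PHIs in order.  */
for (edge exit : get_loop_exit_edges (loop))
  flush_pending_stmts (redirect_edge_and_branch (exit, new_dest));
```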

> And to make an edge for it I need to have a condition to attach to the edge.
> To work around it I tried maintaining a cache of the nodes I need to make on
> the new destination and after redirecting just create them,  but that has me
> looping over the same PHIs multiple times.
> 
> Any suggestions?

Not sure what you mean with "a condition to attach to the edge"?

> > Now, your copying actually iterates over all PHIs in the loop _header_,
> > so it doesn't actually copy LC PHI nodes but possibly creates additional
> > ones.  The intent does seem to do this since you want a different value
> > on those edges for all but the main loop exit.  But then the
> > overall comments should better reflect that and maybe you should
> > do what I suggested anyway and have this loop alter only the alternate
> > exit LC PHIs?
> 
> It does create the LC PHI nodes for all exits; it's just that for the alternate
> exits all the nodes are the same, since we only care about the value for fully
> executed iterations.
> 
> I'm not sure what you're suggesting with altering only the alternate exits.
> I still need to create the ones for the main loop, and it seems easier to do
> them in one loop rather than two?
> 
> Doing this here allows the removal of all the code later on that the vectorizer
> uses to try to find the main exit's PHIs.

I think you have two paths (unless you always peel at least a single 
iteration - it seems you probably do that).  On the path from main
to epilog you don't need the existing LC PHI values - those are for
the loop _exit_ but you are continuing the loop so you need values
for the entry of the epilog header PHIs.

There's still somehow the distinction between taking the "normal"
exit and the alternate exits - at least your code suggests so.
In case the epilog loop can be skipped for the normal exit we'd
need the original LC PHI nodes there (merging with the epilog
exit LC PHI nodes).  So for the "normal" exit we'd need both
sets of PHIs.  Thus I assume you made your life simpler by
always peeling one iteration, so even the "normal" exit only
needs the PHIs for the epilog header?

But then I wonder why you look at the original values of the exit
LC PHI node at all ...

So what is it?  A comment at the very spot in the code would be
helpful, documenting the constraints we are dealing with.

> > 
> > If you don't flush_pending_stmts on an edge after redirecting you
> > should call redirect_edge_var_map_clear (edge), otherwise the stale
> > info might break things later.
> > 
> > > > > +	   gsi_to = gsi_start_phis (latch_new->dest);
> > > >
> > > > likewise new_loop->header
> > > >
> > > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > > +	{
> > > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> > > > > +	  /* In all cases, even in early break situations we're only
> > > > > +	     interested in the number of fully executed loop iters.  As such
> > > > > +	     we discard any partially done iteration.  So we simply propagate
> > > > > +	     the phi nodes from the latch to the merge block.  */
> > > > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > > +
> > > > > +	  lcssa_vars.add (new_arg);
> > > > > +
> > > > > +	  /* Main loop exit should use the final iter value.  */
> > > > > +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv,
> > > > UNKNOWN_LOCATION);
> > > >
> > > > above you are creating the PHI node at e->dest but here add the PHI arg to
> > > > loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
> > > > to follow.  I _think_ this code doesn't need to know about the "special"
> > > > edge.
> > > >
> > > > > +
> > > > > +	  /* All other exits use the previous iters.  */
> > > > > +	  for (edge e : alt_exits)
> > > > > +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> > > > > +			 UNKNOWN_LOCATION);
> > > > > +
> > > > > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > > > > +	}
> > > > > +
> > > > > +      /* Copy over any live SSA vars that may not have been materialized in the
> > > > > +	 loops themselves but would be in the exit block.  However when the live
> > > > > +	 value is not used inside the loop then we don't need to do this; if we do
> > > > > +	 then when we split the guard block the branch edge can end up containing the
> > > > > +	 wrong reference, particularly if it shares an edge with something that has
> > > > > +	 bypassed the loop.  This is not something peeling can check so we need to
> > > > > +	 anticipate the usage of the live variable here.  */
> > > > > +      auto exit_map = redirect_edge_var_map_vector (exit);
> > > >
> > > > Hmm, did I use that in my attempt to refactor things? ...
> > >
> > > Indeed, I didn't always use it, but found it was the best way to deal with the
> > > variables being live in various BBs after the loop.
> > 
> > As said this whole piece of code is possibly more complicated than
> > necessary.  First copying/creating the PHI nodes that are present
> > at the exit (the old LC PHI nodes), then redirecting edges and flushing
> > stmts should deal with half of this.
> >
> > > >
> > > > > +      if (exit_map)
> > > > > +        for (auto vm : exit_map)
> > > > > +	{
> > > > > +	  if (lcssa_vars.contains (vm.def)
> > > > > +	      || TREE_CODE (vm.def) != SSA_NAME)
> > > >
> > > > the latter check is cheaper so it should come first
> > > >
> > > > > +	    continue;
> > > > > +
> > > > > +	  imm_use_iterator imm_iter;
> > > > > +	  use_operand_p use_p;
> > > > > +	  bool use_in_loop = false;
> > > > > +
> > > > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
> > > > >  	    {
> > > > > -	      gphi *phi = gsi.phi ();
> > > > > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > > > > -	      location_t orig_locus
> > > > > -		= gimple_phi_arg_location_from_edge (phi, e);
> > > > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > > > +	      if (flow_bb_inside_loop_p (loop, bb)
> > > > > +		  && !gimple_vuse (USE_STMT (use_p)))
> > 
> > what's this gimple_vuse check?  I see now for vect-early-break_17.c this
> > code triggers and ignores
> > 
> >   vect_b[i_18] = _2;
> > 
> > > > > +		{
> > > > > +		  use_in_loop = true;
> > > > > +		  break;
> > > > > +		}
> > > > > +	    }
> > > > >
> > > > > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > > > +	  if (!use_in_loop)
> > > > > +	    {
> > > > > +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> > > > > +		  mirrors the relevancy analysis's used_outside_scope.  */
> > > > > +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> > > > > +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > > > > +		continue;
> > > > >  	    }
> > 
> > since the def was on a LC PHI the def should always be defined inside the
> > loop.
> > 
> > > > > +
> > > > > +	  tree new_res = copy_ssa_name (vm.result);
> > > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > > +	  for (edge exit : loop_exits)
> > > > > +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
> > > >
> > > > not sure what you are doing above - I guess I have to play with it
> > > > in a debug session.
> > >
> > > Yeah if you comment it out one of the testcases should fail.
> > 
> > using new_preheader instead of e->dest would make things clearer.
> > 
> > You are now adding the same arg to every exit (you've just queried the
> > main exit redirect_edge_var_map_vector).
> > 
> > OK, so I think I understand what you're doing.  If I understand
> > correctly we know that when we exit the main loop via one of the
> > early exits we are definitely going to enter the epilog but when
> > we take the main exit we might not.
> 
> Correct

Hmm, so you _didn't_ make your life easier.

> > 
> > Looking at the CFG we create currently this isn't reflected and
> > this complicates this PHI node updating.  What I'd try to do
> > is leave redirecting the alternate exits until after
> > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > means leaving it almost unchanged besides the LC SSA maintaining
> > changes.  After that for the multi-exit case split the
> > epilog preheader edge and redirect all the alternate exits to the
> > new preheader.  So the CFG becomes
> > 
> >                  <original loop>
> >                 /      |
> >                /    <main exit w/ original LC PHI>
> >               /      if (epilog)
> >    alt exits /        /  \
> >             /        /    loop around
> >             |       /
> >            preheader with "header" PHIs
> >               |
> >           <epilog>
> > 
> > note you need the header PHIs also on the main exit path but you
> > only need the loop end PHIs there.
> 
> Ah, hadn't considered this one.  I'll give it a try.
> 
> > 
> > It seems so that at least currently the order of things makes
> > them more complicated than necessary.
> 
> Possibly yeah, there's a lot of work to maintain the dominators
> and phi nodes because the exits are rewritten early.
> 
> I'll give this a go!

Note the above CFG will likely confuse the hell out of the
reduction code (or at least you need to fixup your fixup).

> > 
> > > >
> > > > >  	}
> > > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > > -      flush_pending_stmts (e);
> > > > > +
> > > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> > >src);
> > > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > > +
> > > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > > > >src);
> > > > >
> > > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class
> > > > loop *loop,
> > > > >        delete_basic_block (preheader);
> > > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > > +
> > > > > +      /* Finally after wiring the new epilogue we need to update its main
> > exit
> > > > > +	 to the original function exit we recorded.  Other exits are already
> > > > > +	 correct.  */
> > > > > +      if (multiple_exits_p)
> > > > > +	{
> > > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > > +	    doms.safe_push (e->dest);
> > > > > +	  update_loop = new_loop;
> > > > > +	  doms.safe_push (exit_dest);
> > > > > +
> > > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > > +	  if (single_succ_p (exit_dest))
> > > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > > +	}
> > > > >      }
> > > > >    else /* Add the copy at entry.  */
> > > > >      {
> > > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > > +	 block and the new loop header.  This allows us to later split the
> > > > > +	 preheader block and still find the right LC nodes.  */
> > > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > > >
> > > > see above
> > > >
> > > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > > +	{
> > > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > > new_latch_loop);
> > > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > > +	}
> > > > > +
> > > > >        if (scalar_loop != loop)
> > > > >  	{
> > > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > (class
> > > > loop *loop,
> > > > >        delete_basic_block (new_preheader);
> > > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > > >  			       loop_preheader_edge (new_loop)->src);
> > > > > +
> > > > > +      if (multiple_exits_p)
> > > > > +	update_loop = loop;
> > > > >      }
> > > > >
> > > > > -  if (scalar_loop != loop)
> > > > > +  if (multiple_exits_p)
> > > > >      {
> > > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > > -
> > > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > > >  	{
> > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > > -	  location_t orig_locus
> > > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > > -
> > > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > > +	  edge ex;
> > > > > +	  edge_iterator ei;
> > > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > > +	    {
> > > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > > +		 dominate other blocks.  */
> > > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> > 
> > For the prologue peeling any early exit we take would skip all other
> > loops so we can simply leave them and their LC PHI nodes in place.
> > We need extra PHIs only on the path to the main vector loop.  I
> > think the comment isn't accurately reflecting what we do.  In
> > fact we do not add any LC PHI nodes here but simply adjust the
> > main loop header PHI arguments?
> 
> Yeah we don't create any new nodes after peeling in this version.
> 
> > 
> > > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > > just using single_succ_p here?  A fallthru edge src dominates the
> > > > fallthru edge dest, so the sentence above doesn't make sense.
> > >
> > > I wanted to say that the immediate dominator of a block is never
> > > a fall-through block.  At least that's what I understood from how
> > > the dominators are calculated in the code, though I may have missed
> > > something.
> > 
> >  BB1
> >   |
> >  BB2
> >   |
> >  BB3
> > 
> > here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> > 
> > > >
> > > > > +		     && single_succ_p (ex->dest))
> > > > > +		{
> > > > > +		  doms.safe_push (ex->dest);
> > > > > +		  ex = single_succ_edge (ex->dest);
> > > > > +		}
> > > > > +	      doms.safe_push (ex->dest);
> > > > > +	    }
> > > > > +	  doms.safe_push (e->dest);
> > > > >  	}
> > > > > -    }
> > > > >
> > > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > +      if (updated_doms)
> > > > > +	updated_doms->safe_splice (doms);
> > > > > +    }
> > > > >    free (new_bbs);
> > > > >    free (bbs);
> > > > >
> > > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > > loop_vec_info loop_vinfo, const_edge e)
> > > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > > >
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > > +
> > > >
> > > > I think checking the number of BBs is odd, I don't remember anything
> > > > in slpeel is specifically tied to that?  I think we can simply drop
> > > > this or do you remember anything that would depend on ->num_nodes
> > > > being only exactly 5 or 2?
> > >
> > > It never actually seemed to require it, but it's used as a check for
> > > unexpected control flow in the loop.
> > >
> > > i.e. this would say no if you have an if statement in the loop that wasn't
> > > converted.  The other part of this and the accompanying explanation is in
> > > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > > num_nodes == 2 check from there because the number of nodes restricted
> > > things too much.  If you have an empty fall-through block, which seems to
> > > happen often between the main exit and the latch block, then we'd not
> > > vectorize.
> > >
> > > So instead I now reject loops after analyzing the gcond.  So I think this
> > > check can go or needs to be different.
> > 
> > Lets remove it from this function then.
> 
> OK, I can remove it from the outer-loop vect then too, since it's mostly
> dead code.  Will do so and reg-test that as well.
> 
> > 
> > > >
> > > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > > >       the function itself.  */
> > > > >    if (!loop_outer (loop)
> > > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > > (loop_vec_info loop_vinfo,
> > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > >    basic_block update_bb = update_e->dest;
> > > > >
> > > > > +  /* For early exits we'll update the IVs in
> > > > > +     vect_update_ivs_after_early_break.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    return;
> > > > > +
> > > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > >
> > > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > > (loop_vec_info loop_vinfo,
> > > > >        /* Fix phi expressions in the successor bb.  */
> > > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > >      }
> > > > > +  return;
> > > >
> > > > we don't usually place a return at the end of void functions
> > > >
> > > > > +}
> > > > > +
> > > > > +/*   Function vect_update_ivs_after_early_break.
> > > > > +
> > > > > +     "Advance" the induction variables of LOOP to the value they should
> > take
> > > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > > +     vectorizer does not handle induction variables that are used after the
> > > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > > +     peeled, because of the early exit.  With an early exit we always peel
> > the
> > > > > +     loop.
> > > > > +
> > > > > +     Input:
> > > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > > +	      of LOOP were peeled.
> > > > > +     - VF - The loop vectorization factor.
> > > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before
> > it is
> > > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > > +		     bumped.
> > > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP
> > > > executes.
> > > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only)
> > path
> > > > > +		  coming out from LOOP on which there are uses of the LOOP
> > > > ivs
> > > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > > +
> > > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > > +		  UPDATE_E->dest are updated accordingly.
> > > > > +
> > > > > +     Output:
> > > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > > +
> > > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > +     a single loop exit that has a single predecessor.
> > > > > +
> > > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb
> > are
> > > > > +     organized in the same order.
> > > > > +
> > > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the
> > future.
> > > > > +
> > > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a
> > path
> > > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the
> > path
> > > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > > +     path starts with the edge UPDATE_E, and its destination (denoted
> > > > update_bb)
> > > > > +     needs to have its phis updated.
> > > > > + */
> > > > > +
> > > > > +static tree
> > > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> > loop *
> > > > epilog,
> > > > > +				   poly_int64 vf, tree niters_orig,
> > > > > +				   tree niters_vector, edge update_e)
> > > > > +{
> > > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    return NULL;
> > > > > +
> > > > > +  gphi_iterator gsi, gsi1;
> > > > > +  tree ni_name, ivtmp = NULL;
> > > > > +  basic_block update_bb = update_e->dest;
> > > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > +  basic_block exit_bb = loop_iv->dest;
> > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > > +
> > > > > +  gcc_assert (cond);
> > > > > +
> > > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> > (update_bb);
> > > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > > +    {
> > > > > +      tree init_expr, final_expr, step_expr;
> > > > > +      tree type;
> > > > > +      tree var, ni, off;
> > > > > +      gimple_stmt_iterator last_gsi;
> > > > > +
> > > > > +      gphi *phi = gsi1.phi ();
> > > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi,
> > loop_preheader_edge
> > > > (epilog));
> > > >
> > > > I'm confused about the setup.  update_bb looks like the block with the
> > > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > > loop_preheader_edge (epilog) come into play here?  That would feed into
> > > > epilog->header PHIs?!
> > >
> > > We can't query the type of the PHIs in the block with the LC PHI nodes,
> > > so the typical pattern seems to be that we iterate over a block that's
> > > part of the loop and that has the PHIs in the same order, just so we can
> > > get to the stmt_vec_info.
> > >
> > > >
> > > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > > better way?  Is update_bb really epilog->header?!
> > > >
> > > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > > which "works" even for totally unrelated edges.
> > > >
> > > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > > +      if (!phi1)
> > > >
> > > > shouldn't that be an assert?
> > > >
> > > > > +	continue;
> > > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > > +      if (dump_enabled_p ())
> > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > > +			 (gimple *)phi);
> > > > > +
> > > > > +      /* Skip reduction and virtual phis.  */
> > > > > +      if (!iv_phi_p (phi_info))
> > > > > +	{
> > > > > +	  if (dump_enabled_p ())
> > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			     "reduc or virtual phi. skip.\n");
> > > > > +	  continue;
> > > > > +	}
> > > > > +
> > > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > > +	 early.  As such just grab the original IV.  */
> > > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > > > (loop));
> > > >
> > > > but this should be taken care of by LC SSA?
> > >
> > > It is; the comment is probably missing details.  This part just scales
> > > the counter from VF to scalar counts.  It's just a reminder that this
> > > scaling is done differently from normal single-exit vectorization.
> > >
> > > >
> > > > OK, have to continue tomorrow from here.
> > >
> > > Cheers, Thank you!
> > >
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> > 
> > so this is a way to avoid touching the main IV?  Looks a bit fragile to
> > me.  Hmm, we're iterating over the main loop header PHIs here?
> > Can't you check, say, the relevancy of the PHI node instead?  Though
> > it might also be used as induction.  Can't it be used as alternate
> > exit like
> > 
> >   for (i)
> >    {
> >      if (i & bit)
> >        break;
> >    }
> > 
> > and would we need to adjust 'i' then?
> 
> Hmm, you're right.  I could reject them based on the definition BB and the
> location of the main IV; that's probably safest.
> 
> > 
> > > > > +	{
> > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > +
> > > > > +	  /* We previously generated the new merged phi in the same BB as
> > > > the
> > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > +	  final_expr = gimple_phi_result (phi1);
> > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > > loop_preheader_edge (loop));
> > > > > +
> > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > +	  /* For early break the final loop IV is:
> > > > > +	     init + (final - init) * vf which takes into account peeling
> > > > > +	     values and non-single steps.  */
> > > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > > +			     fold_convert (stype, final_expr),
> > > > > +			     fold_convert (stype, init_expr));
> > > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > > +
> > > > > +	  /* Adjust the value with the offset.  */
> > > > > +	  if (POINTER_TYPE_P (type))
> > > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > +	  else
> > > > > +	    ni = fold_convert (type,
> > > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > > +					    fold_convert (stype, init_expr),
> > > > > +					    off));
> > > > > +	  var = create_tmp_var (type, "tmp");
> > 
> > so how does the non-early break code deal with updating inductions?
> 
> The non-early-break code does this in essentially the same way; see
> vect_update_ivs_after_vectorizer, which was modified to skip early exits.
> The major difference is that on a non-early exit the value can just be
> adjusted linearly, i_scalar = i_vect * VF + init, and peeling is taken
> into account by vect_peel_nonlinear_iv_init.
> 
> I wanted to keep them separate as there's a significant enough difference
> in the calculation of the loop bodies themselves that having one function
> became unwieldy.
> 
> That, and we don't support all of the induction steps a non-early exit
> supports, so I inlined a simpler calculation.
> 
> Now there's a big difference between normal loop vectorization and early
> break.  During the main loop vectorization the ivtmp is adjusted for
> peeling.  That is, the amount is already adjusted in the PHI node itself.
> 
> For early break this isn't done because the bounds are fixed, i.e. for
> every exit we can do at most VF iterations.
>
> > And how do you avoid altering this when we flow in from the normal
> > exit?  That is, you are updating the value in the epilog loop
> > header but don't you need to instead update the value only on
> > the alternate exit edges from the main loop (and keep the not
> > updated value on the main exit edge)?
> 
> Because the ivtmp has not been adjusted for peeling, we adjust for it
> here.  This allows us to use the same update for all exits, so I don't
> have to differentiate between the two here, because I didn't do so during
> creation of the PHI node.

Can we do this in all cases then?  I really don't like having two
schemes with differences spread over two places ... if the current
scheme doesn't work for early exits, the early-exit scheme should
do for zero early exits?  What's the downside of using that
unconditionally?

> > 
> > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > +	  gimple_seq new_stmts = NULL;
> > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +	  else
> > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +
> > > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > > +	}
> > > > > +      else
> > > > > +	{
> > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > +
> > > > > +	  /* We previously generated the new merged phi in the same BB as
> > > > the
> > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > > > (loop));
> > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > +
> > > > > +	  if (vf.is_constant ())
> > > > > +	    {
> > > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > > +				fold_convert (stype,
> > > > > +					      niters_vector),
> > > > > +				build_int_cst (stype, vf));
> > > > > +
> > > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > > +				fold_convert (stype,
> > > > > +					      niters_orig),
> > > > > +				fold_convert (stype, ni));
> > > > > +	    }
> > > > > +	  else
> > > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > > +	       masked, so at the end of the loop we know we have finished
> > > > > +	       the entire loop and found nothing.  */
> > > > > +	    ni = build_zero_cst (stype);
> > > > > +
> > > > > +	  ni = fold_convert (type, ni);
> > > > > +	  /* We don't support variable n in this version yet.  */
> > > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > > +
> > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > +
> > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > +	  gimple_seq new_stmts = NULL;
> > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +	  else
> > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +
> > > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > > +
> > > > > +	  for (edge exit : alt_exits)
> > > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > > +						       vf));
> > > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  return ivtmp;
> > > > >  }
> > > > >
> > > > >  /* Return a gimple value containing the misalignment (measured in
> > vector
> > > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > > > (loop_vec_info loop_vinfo,
> > > > >
> > > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > > >     this function searches for the corresponding lcssa phi node in exit
> > > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > > -   NULL.  */
> > > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > > +   return the phi result; otherwise return NULL.  */
> > > > >
> > > > >  static tree
> > > > >  find_guard_arg (class loop *loop, class loop *epilog
> > ATTRIBUTE_UNUSED,
> > > > > -		gphi *lcssa_phi)
> > > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > > >  {
> > > > >    gphi_iterator gsi;
> > > > >    edge e = loop->vec_loop_iv;
> > > > >
> > > > > -  gcc_assert (single_pred_p (e->dest));
> > > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > >      {
> > > > >        gphi *phi = gsi.phi ();
> > > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > -	return PHI_RESULT (phi);
> > > > > -    }
> > > > > -  return NULL_TREE;
> > > > > -}
> > > > > -
> > > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > > FIRST/SECOND
> > > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > > -   edge, the two loops are arranged as below:
> > > > > -
> > > > > -       preheader_a:
> > > > > -     first_loop:
> > > > > -       header_a:
> > > > > -	 i_1 = PHI<i_0, i_2>;
> > > > > -	 ...
> > > > > -	 i_2 = i_1 + 1;
> > > > > -	 if (cond_a)
> > > > > -	   goto latch_a;
> > > > > -	 else
> > > > > -	   goto between_bb;
> > > > > -       latch_a:
> > > > > -	 goto header_a;
> > > > > -
> > > > > -       between_bb:
> > > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > > -
> > > > > -     second_loop:
> > > > > -       header_b:
> > > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > > -				 or with i_2 if no LCSSA phi is created
> > > > > -				 under condition of
> > > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > > -	 ...
> > > > > -	 i_4 = i_3 + 1;
> > > > > -	 if (cond_b)
> > > > > -	   goto latch_b;
> > > > > -	 else
> > > > > -	   goto exit_bb;
> > > > > -       latch_b:
> > > > > -	 goto header_b;
> > > > > -
> > > > > -       exit_bb:
> > > > > -
> > > > > -   This function creates loop closed SSA for the first loop; update the
> > > > > -   second loop's PHI nodes by replacing argument on incoming edge with
> > the
> > > > > -   result of newly created lcssa PHI nodes.  IF
> > CREATE_LCSSA_FOR_IV_PHIS
> > > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > > -   the first loop.
> > > > > -
> > > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > > -   the second loop will execute rest iterations of the first.  */
> > > > > -
> > > > > -static void
> > > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > -				   class loop *first, class loop *second,
> > > > > -				   bool create_lcssa_for_iv_phis)
> > > > > -{
> > > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > -
> > > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > > -
> > > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> > (between_bb));
> > > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > > -  gcc_assert (loop == first || loop == second);
> > > > > -
> > > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > > -       gsi_update = gsi_start_phis (second->header);
> > > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > > -    {
> > > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > > -      gphi *update_phi = gsi_update.phi ();
> > > > > -
> > > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > > +      /* Nested loops with multiple exits can have different no# phi node
> > > > > +	 arguments between the main loop and epilog as epilog falls to the
> > > > > +	 second loop.  */
> > > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > > >  	{
> > > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > > > UNKNOWN_LOCATION);
> > > > > -	  arg = new_res;
> > > > > -	}
> > > > > -
> > > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > > -	 incoming edge.  */
> > > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> > arg);
> > > > > -    }
> > > > > -
> > > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > > -     for correct vectorization of live stmts.  */
> > > > > -  if (loop == first)
> > > > > -    {
> > > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > > -	{
> > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > > > (orig_arg))
> > > > > -	    continue;
> > > > > -
> > > > > -	  /* Already created in the above loop.   */
> > > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > > >  	    continue;
> > > > >
> > > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> > > > UNKNOWN_LOCATION);
> > > > > +	  if (operand_equal_p (get_current_def (var),
> > > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > > +	    return PHI_RESULT (phi);
> > > > >  	}
> > > > >      }
> > > > > +  return NULL_TREE;
> > > > >  }
> > > > >
> > > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> > beginning
> > > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2
> > (class
> > > > loop *loop, class loop *epilog,
> > > > >    gcc_assert (single_succ_p (merge_bb));
> > > > >    edge e = single_succ_edge (merge_bb);
> > > > >    basic_block exit_bb = e->dest;
> > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > > >
> > > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > >      {
> > > > >        gphi *update_phi = gsi.phi ();
> > > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > > >
> > > > >        tree merge_arg = NULL_TREE;
> > > > >
> > > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class
> > loop
> > > > *loop, class loop *epilog,
> > > > >        if (!merge_arg)
> > > > >  	merge_arg = old_arg;
> > > > >
> > > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e-
> > >dest_idx);
> > > > >        /* If the var is live after loop but not a reduction, we simply
> > > > >  	 use the old arg.  */
> > > > >        if (!guard_arg)
> > > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > > > loop *loop, class loop *epilog,
> > > > >      }
> > > > >  }
> > > > >
> > > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > > -
> > > > > -static void
> > > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > > -{
> > > > > -  gphi_iterator gsi;
> > > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > > -
> > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > > -}
> > > > > -
> > 
> > I wonder if we can still split these changes out to before early break
> > vect?
> 
> Maybe, I hadn't done so before because I was redirecting the edges during peeling.
> If we no longer do that, it may be easier.  Let me try.
>
> > 
> > > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need
> > to
> > > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree
> > > > niters, tree nitersm1,
> > > > >      bound_epilog += vf - 1;
> > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > >      bound_epilog += 1;
> > > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > > +     to find the element that caused the break.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    {
> > > > > +      bound_epilog = vf;
> > > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > > +      vect_epilogues = false;
> > > > > +    }
> > > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > > >    poly_uint64 bound_scalar = bound_epilog;
> > > > >
> > > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info
> > loop_vinfo,
> > > > tree niters, tree nitersm1,
> > > > >  				  bound_prolog + bound_epilog)
> > > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > > >  			 || vect_epilogues));
> > > > > +
> > > > > +  /* We only support early break vectorization on known bounds at this
> > > > time.
> > > > > +     This means that if the vector loop can't be entered then we won't
> > > > generate
> > > > > +     it at all.  So for now force skip_vector off because the additional
> > control
> > > > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS
> > (loop_vinfo);
> > > > > +
> > 
> > I think it should be as "easy" as entering the epilog via the block taking
> > the regular exit?
> > 
> > > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > > >       loop is known at compile time, otherwise we need to add a check at
> > > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > > >    bool skip_epilog = (prolog_peeling < 0
> > > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > >  		      || !vf.is_constant ());
> > > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.
> > */
> > > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > > > epilog
> > > > > +     loop must be executed.  */
> > > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      skip_epilog = false;
> > > > > -
> > > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > > >    auto_vec<profile_count> original_counts;
> > > > >    basic_block *original_bbs = NULL;
> > > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info
> > loop_vinfo,
> > > > tree niters, tree nitersm1,
> > > > >    if (prolog_peeling)
> > > > >      {
> > > > >        e = loop_preheader_edge (loop);
> > > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > > -
> > > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> > e);
> > > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> > e,
> > > > > +						       true);
> > > > >        gcc_assert (prolog);
> > > > >        prolog->force_vectorize = false;
> > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > > +
> > > > >        first_loop = prolog;
> > > > >        reset_original_copy_tables ();
> > > > >
> > > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info
> > loop_vinfo,
> > > > tree niters, tree nitersm1,
> > > > >  	 as the transformations mentioned above make less or no sense when
> > > > not
> > > > >  	 vectorizing.  */
> > > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > > +      auto_vec<basic_block> doms;
> > > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e,
> > true,
> > > > > +						       &doms);
> > > > >        gcc_assert (epilog);
> > > > >
> > > > >        epilog->force_vectorize = false;
> > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > > >
> > > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree
> > > > niters, tree nitersm1,
> > > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > >  					update_e);
> > > > >
> > > > > +      /* For early breaks we must create a guard to check how many
> > iterations
> > > > > +	 of the scalar loop are yet to be performed.  */
> > 
> > We have this check anyway, no?  In fact don't we know that we always enter
> > the epilog (see above)?
> 
> Not always, masked loops for instance never enter the epilogue if the main loop
> finished completely since there is no "remainder".
> 
> > 
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	{
> > > > > +	  tree ivtmp =
> > > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > > +					       *niters_vector, update_e);
> > > > > +
> > > > > +	  gcc_assert (ivtmp);
> > > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > +					 fold_convert (TREE_TYPE (niters),
> > > > > +						       ivtmp),
> > > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > +
> > > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > > +	     and so we may need to find the actual final edge.  */
> > > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > > +	     doesn't update existing ones in the current scheme.  */
> > > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
> > > > guard_to,
> > > > > +						guard_bb, prob_epilog.invert
> > > > (),
> > > > > +						irred_flag);
> > > > > +	  doms.safe_push (guard_bb);
> > > > > +
> > > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > +
> > > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > > +					      final_edge);
> > > > > +
> > > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > > +	     the guard and the exit.  This intermediate block is required
> > > > > +	     because in the current scheme of things the guard block phi
> > > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > > +	     case we just need to update the uses in this block as well.  */
> > > > > +	  if (loop != scalar_loop)
> > > > > +	    {
> > > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (),
> > > > guard_e));
> > > > > +	    }
> > > > > +
> > > > > +	  flush_pending_stmts (guard_e);
> > > > > +	}
> > > > > +
> > > > >        if (skip_epilog)
> > > > >  	{
> > > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > tree
> > > > niters, tree nitersm1,
> > > > >  	    }
> > > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > > >  	}
> > > > > -      else
> > > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > > >
> > > > >        unsigned HOST_WIDE_INT bound;
> > > > >        if (bound_scalar.is_constant (&bound))
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index
> > > >
> > b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49
> > > > f21421958e7ee25eada 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop
> > > > *loop_in, vec_info_shared *shared)
> > > > >      partial_load_store_bias (0),
> > > > >      peeling_for_gaps (false),
> > > > >      peeling_for_niter (false),
> > > > > +    early_breaks (false),
> > > > > +    non_break_control_flow (false),
> > > > >      no_data_dependencies (false),
> > > > >      has_mask_store (false),
> > > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p
> > > > (loop_vec_info loop_vinfo)
> > > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD
> > > > (LOOP_VINFO_ORIG_LOOP_INFO
> > > > >  					  (loop_vinfo));
> > > > >
> > > > > +  /* When we have multiple exits and VF is unknown, we must require
> > > > partial
> > > > > +     vectors because the loop bound is not a minimum but a maximum.
> > > > That is to
> > > > > +     say we cannot unpredicate the main loop unless we peel or use partial
> > > > > +     vectors in the epilogue.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > > +    return true;
> > > > > +
> > > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > > >      {
> > > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost
> > > > (loop_vec_info loop_vinfo)
> > > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > > >  }
> > > > >
> > > > > -
> > > > >  /* Function vect_analyze_loop_form.
> > > > >
> > > > >     Verify that certain CFG restrictions hold, including:
> > > > >     - the loop has a pre-header
> > > > > -   - the loop has a single entry and exit
> > > > > +   - the loop has a single entry
> > > > > +   - nested loops can have only a single exit.
> > > > >     - the loop exit condition is simple enough
> > > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > > >       niter could be analyzed under some assumptions.  */
> > > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop,
> > > > vect_loop_form_info *info)
> > > > >                             |
> > > > >                          (exit-bb)  */
> > > > >
> > > > > -      if (loop->num_nodes != 2)
> > > > > -	return opt_result::failure_at (vect_location,
> > > > > -				       "not vectorized:"
> > > > > -				       " control flow in loop.\n");
> > > > > -
> > > > >        if (empty_block_p (loop->header))
> > > > >  	return opt_result::failure_at (vect_location,
> > > > >  				       "not vectorized: empty loop.\n");
> > > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop,
> > > > vect_loop_form_info *info)
> > > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > > >  			 "Considering outer-loop vectorization.\n");
> > > > >        info->inner_loop_cond = inner.loop_cond;
> > > > > +
> > > > > +      if (!single_exit (loop))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized: multiple exits.\n");
> > > > > +
> > > > >      }
> > > > >
> > > > > -  if (!single_exit (loop))
> > > > > -    return opt_result::failure_at (vect_location,
> > > > > -				   "not vectorized: multiple exits.\n");
> > > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > > >      return opt_result::failure_at (vect_location,
> > > > >  				   "not vectorized:"
> > > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop,
> > > > vect_loop_form_info *info)
> > > > >  				   "not vectorized: latch block not empty.\n");
> > > > >
> > > > >    /* Make sure the exit is not abnormal.  */
> > > > > -  edge e = single_exit (loop);
> > > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > > -    return opt_result::failure_at (vect_location,
> > > > > -				   "not vectorized:"
> > > > > -				   " abnormal loop exit edge.\n");
> > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > +  edge nexit = loop->vec_loop_iv;
> > > > > +  for (edge e : exits)
> > > > > +    {
> > > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " abnormal loop exit edge.\n");
> > > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > > +	 be able to vectorize the inverse order, but the current flow in the
> > > > > +	 vectorizer always assumes you update successor PHI nodes, not
> > > > > +	 preds.  */
> > > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e-
> > > > >src))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " abnormal loop exit edge order.\n");
> > 
> > "unsupported loop exit order", but I don't understand the comment.
> > 
> 
> One failure I found during bootstrap is that there was a CFG where the BBs were all
> reversed.  Should have it in the testsuite, I'll dig it out and come back to you.
> 
> > > > > +    }
> > > > > +
> > > > > +  /* We currently only support early exit loops with known bounds.   */
> > 
> > Btw, why's that?  Is that because we don't support the loop-around edge?
> > IMHO this is the most serious limitation (and as said above it should be
> > trivial to fix).
> 
> Nah, it's just a matter of time.  I wanted to start getting feedback before relaxing it.
> My patch 0/19 has an implementation plan for the remaining work.
> 
> I plan to relax this in this release, most likely in this series itself.
> 
> > 
> > > > > +  if (exits.length () > 1)
> > > > > +    {
> > > > > +      class tree_niter_desc niter;
> > > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter,
> > NULL)
> > > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " early breaks only supported on loops"
> > > > > +				       " with known iteration bounds.\n");
> > > > > +    }
> > > > >
> > > > >    info->conds
> > > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop,
> > > > vec_info_shared *shared,
> > > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info-
> > > > >alt_loop_conds);
> > > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > > >
> > > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > > +
> > > > >    if (info->inner_loop_cond)
> > > > >      {
> > > > >        stmt_vec_info inner_loop_cond_info
> > > > > @@ -3070,7 +3106,8 @@ start_over:
> > > > >
> > > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      {
> > > > >        if (dump_enabled_p ())
> > > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop
> > required\n");
> > > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction
> > (loop_vec_info
> > > > loop_vinfo,
> > > > >    basic_block exit_bb;
> > > > >    tree scalar_dest;
> > > > >    tree scalar_type;
> > > > > -  gimple *new_phi = NULL, *phi;
> > > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > > >    gimple_stmt_iterator exit_gsi;
> > > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > > >    gimple *epilog_stmt = NULL;
> > > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction
> > > > (loop_vec_info loop_vinfo,
> > > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > > >  	  reduc_inputs.quick_push (new_def);
> > > > >  	}
> > > > > +
> > > > > +	/* Update the other exits.  */
> > > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	  {
> > > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > +	    gphi_iterator gsi, gsi1;
> > > > > +	    for (edge exit : alt_exits)
> > > > > +	      {
> > > > > +		/* Find the phi node to propagate into the exit block for each
> > > > > +		   exit edge.  */
> > > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > > +		     gsi1 = gsi_start_phis (exit->src);
> > 
> > exit->src == loop->header, right?  I think this won't work for multiple
> > alternate exits.  It's probably easier to do this where we create the
> > LC PHI node for the reduction result?
> 
> No exit->src == definition block of the gcond. 
>
> > 
> > > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > > +		  {
> > > > > +		    /* There really should be a function to just get the number
> > > > > +		       of phis inside a bb.  */
> > > > > +		    if (phi && phi == gsi.phi ())
> > > > > +		      {
> > > > > +			gphi *phi1 = gsi1.phi ();
> > > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > > +					 PHI_RESULT (phi1));

But the definition block of the gcond doesn't have a PHI?!

for (;;)
{
  if (alt-exit1) <- exit->src is the loop header in this case
    ..
  if (alt-exit2) <- exit->src doesn't have any PHIs
    ..
  if (IV)
    ..
}

so I think the above "works" for alt-exit1 but not alt-exit2?  Shouldn't
you look at loop->header for phi1?
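For concreteness, a standalone C loop with the two alternate-exit shapes sketched above (hypothetical example, not code from the patch or testsuite):

```c
#include <assert.h>

/* Illustrative only: alt-exit1's condition is computed in the loop
   header, alt-exit2's in a later block that carries no PHI nodes,
   and the final exit is the regular IV exit.  */
static int find_first (const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (a[i] == 0)        /* alt-exit1: tested in the header block */
        return i;
      int t = a[i] + b[i];
      if (t > 10)           /* alt-exit2: block without any PHIs */
        return -i;
    }
  return n;                 /* main IV exit */
}
```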

> > 
> > I think we know the header PHI of a reduction perfectly well, there
> > shouldn't be the need to "search" for it.
> > 
> > > > > +			break;
> > > > > +		      }
> > > > > +		  }
> > > > > +	      }
> > > > > +	  }
> > > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > > >      }
> > > > >
> > > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info
> > *vinfo,
> > > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > > >  	   lhs' = new_tree;  */
> > > > >
> > > > > +      /* When vectorizing an early break, any live statements that are used
> > > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > > +	 We could change the liveness value during analysis instead but since
> > > > > +	 the below code is invalid anyway, just ignore it during codegen.  */
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	return true;
> > 
> > But what about the value that's live across the main exit when the
> > epilogue is not entered?
> 
> My understanding is that vectorizable_live_operation only vectorizes statements within
> a loop that can be live outside.  In this case e.g. statements inside the body of the IF.
> 
> What you're describing above is done by other vectorizable_reduction etc.
> 
> That said, I can make this much safer by just restricting it to statements inside the same BB
> as an alt exit BB.

So is this a restriction you pose on LOOP_VINFO_EARLY_BREAKS during
early analysis?  Because as you return true above you claim you can
vectorize all live operations just fine.  A live operation is for example

 for (i)
  {
    tem = a[i];
    b[i] = tem;
  }
.. use tem here ..

this isn't a reduction or induction.  You don't need 'tem' on the
path to the epilog loop (if you enter it), since the epilog will
compute the correct 'tem', but on the path through the regular
exit when that doesn't enter the epilog because no iterations are
left you _do_ need the value of 'tem' and thus have to code generate
it.
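A compilable version of the sketch above (hypothetical names), showing a value that is live after the loop while being neither an induction nor a reduction:

```c
#include <assert.h>

/* 'tem' is not an induction or reduction, but its final value is
   needed on the path through the regular exit, so the vectorized
   loop must still materialize it there.  */
static int copy_and_keep_last (const int *a, int *b, int n)
{
  int tem = 0;
  for (int i = 0; i < n; i++)
    {
      tem = a[i];
      b[i] = tem;
    }
  return tem;   /* use of 'tem' outside the loop */
}
```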

> > 
> > > > > +
> > > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > >        gcc_assert (single_pred_p (exit_bb));
> > > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info
> > > > loop_vinfo, gimple *loop_vectorized_call)
> > > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > > >       versioning.   */
> > > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > -  if (! single_pred_p (e->dest))
> > > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> > > > (loop_vinfo))
> > 
> > e can be NULL here?  I think we should reject such loops earlier.
> > 
> 
> Ah no, that's left-over from when this used single_exit.  It should be removed
> in this patch.  Had missed it, sorry.
> 
> > > > >      {
> > > > >        split_loop_exit_edge (e, true);
> > > > >        if (dump_enabled_p ())
> > > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info
> > > > loop_vinfo, gimple *loop_vectorized_call)
> > > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > > >      {
> > > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > > -      if (! single_pred_p (e->dest))
> > > > > +      if (e && ! single_pred_p (e->dest))
> > > > >  	{
> > > > >  	  split_loop_exit_edge (e, true);
> > > > >  	  if (dump_enabled_p ())
> > > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info
> > > > loop_vinfo, gimple *loop_vectorized_call)
> > > > >
> > > > >    /* Loops vectorized with a variable factor won't benefit from
> > > > >       unrolling/peeling.  */
> > 
> > update the comment?  Why would we unroll a VLA loop with early breaks?
> > Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> > 
> 
> Ah indeed, should be ||.
> 
> > > > > -  if (!vf.is_constant ())
> > > > > +  if (!vf.is_constant ()
> > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      {
> > > > >        loop->unroll = 1;
> > > > >        if (dump_enabled_p ())
> > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > index
> > > >
> > 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82af
> > > > e78ee2a7c627be45b 100644
> > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info
> > stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > > >    *live_p = false;
> > > > >
> > > > >    /* cond stmt other than loop exit cond.  */
> > > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > > -    *relevant = vect_used_in_scope;
> > 
> > how was that ever hit before?  For outer loop processing with outer loop
> > vectorization?
> >
> 
> I believe so, because the outer-loop would see the exit cond of the inner loop as well.
> 
> > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > +    {
> > > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge,
> > but
> > > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > > +	 the hard way.  */
> > > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > > +      bool exit_bb = false, early_exit = false;
> > > > > +      edge_iterator ei;
> > > > > +      edge e;
> > > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > > +	  {
> > > > > +	    exit_bb = true;
> > > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > > +	    break;
> > > > > +	  }
> > > > > +
> > > > > +      /* We should have processed any exit edge, so an exit edge that is not
> > > > > +	 an early break must be the loop IV edge.  We need to distinguish between the
> > > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > > +      if (exit_bb)
> > > > > +	{
> > > > > +	  if (early_exit)
> > > > > +	    *relevant = vect_used_in_scope;
> > > > > +	}
> > 
> > I wonder why you can't simply do
> > 
> >          if (is_ctrl_stmt (stmt_info->stmt)
> >              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> > 
> > ?
> > 
> > > > > +      else if (bb->loop_father == loop)
> > > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> > 
> > so for control flow not exiting the loop you can check
> > loop_exits_from_bb_p ().
> > 
> > > > > +    }
> > > > >
> > > > >    /* changing memory.  */
> > > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info
> > stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > > >  	*relevant = vect_used_in_scope;
> > > > >        }
> > > > >
> > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > +  auto_bitmap exit_bbs;
> > > > > +  for (edge exit : exits)
> > > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > +
> > > > >    /* uses outside the loop.  */
> > > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > > SSA_OP_DEF)
> > > > >      {
> > > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > > >  	      /* We expect all such uses to be in the loop exit phis
> > > > >  		 (because of loop closed form)   */
> > > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > 
> > That now becomes quite expensive checking already covered by the LC SSA
> > verifier so I suggest to simply drop this assert instead.
> > 
> > > > >                *live_p = true;
> > > > >  	    }
> > > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized
> > > > (loop_vec_info loop_vinfo, bool *fatal)
> > > > >  	}
> > > > >      }
> > > > >
> > > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't
> > seen all
> > > > > +     the conds yet at that point and there's no quick way to retrieve them.
> > */
> > > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > > +    return opt_result::failure_at (vect_location,
> > > > > +				   "not vectorized:"
> > > > > +				   " unsupported control flow in loop.\n");
> > 
> > so we didn't do this before?  But see above where I wondered.  So when
> > does this hit with early exits and why can't we check for this in
> > vect_verify_loop_form?
> > 
> 
> We did, but it was done in vect_analyze_loop_form, purely based on the
> number of BBs in the loop.  This required loops to be highly normalized, which
> isn't the case with multiple exits.  That is, I've seen various loops with different
> numbers of random empty fall-through BBs in the body or after the main exit before
> the latch.
> 
> We can do it in vect_analyze_loop_form but that requires us to walk all the
> statements in all the basic blocks, because the loops track exit edges and a general
> control flow edge is not easy to find as far as I know.  I added it here
> because by this point we would have walked all the statements.

You only need to look at *gsi_last_bb (bb) for each block, only the last
stmt can be a control stmt.  I think rejecting this earlier is better.
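As an illustration of the kind of loop this check has to reject, here is a hypothetical body branch that re-joins inside the loop instead of exiting (ignoring if-conversion, which can remove such branches before the vectorizer sees them):

```c
#include <assert.h>

/* The branch on c[i] stays inside the loop (its arms re-join before
   the latch), so it is general control flow rather than an early
   break; only the a[i] < 0 test exits the loop early.  */
static int sum_guarded (const int *a, const int *c, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    {
      if (c[i])        /* control flow that re-joins in the body */
        s += a[i];
      if (a[i] < 0)    /* early break */
        break;
    }
  return s;
}
```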

> > > > > +
> > > > >    /* 2. Process_worklist */
> > > > >    while (worklist.length () > 0)
> > > > >      {
> > > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized
> > > > (loop_vec_info loop_vinfo, bool *fatal)
> > > > >  			return res;
> > > > >  		    }
> > > > >                   }
> > > > > +	    }
> > > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > > +	    {
> > > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > > +	      opt_result res
> > > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > > +	      if (!res)
> > > > > +		return res;
> > > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > > +				loop_vinfo, relevant, &worklist, false);
> > > > > +	      if (!res)
> > > > > +		return res;
> > > > >              }
> > > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > > >  	    {
> > > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > >  			     node_instance, cost_vec);
> > > > >        if (!res)
> > > > >  	return res;
> > > > > -   }
> > > > > +    }
> > > > > +
> > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > > >
> > > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > > >      {
> > > > >        case vect_internal_def:
> > > > > +      case vect_early_exit_def:
> > > > >          break;
> > > > >
> > > > >        case vect_reduction_def:
> > > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > >      {
> > > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > > >        *need_to_vectorize = true;
> > > > >      }
> > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > index
> > > >
> > ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b3
> > > > 40baff61d18da8e242dd 100644
> > > > > --- a/gcc/tree-vectorizer.h
> > > > > +++ b/gcc/tree-vectorizer.h
> > > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > > >    vect_internal_def,
> > > > >    vect_induction_def,
> > > > >    vect_reduction_def,
> > > > > +  vect_early_exit_def,
> > 
> > can you avoid putting this inbetween reduction and double reduction
> > please?  Just put it before vect_unknown_def_type.  In fact the COND
> > isn't a def ... maybe we should have pattern recogized
> > 
> >  if (a < b) exit;
> > 
> > as
> > 
> >  cond = a < b;
> >  if (cond != 0) exit;
> > 
> > so the part that we need to vectorize is more clear.
> 
> Hmm fair enough, I still find it useful to be able to distinguish
> between this and general control flow though.  In fact depending
> on when we finish reviewing/upstreaming this, it should be easy to
> support general control flow in such loops.
> 
> > 
> > > > >    vect_double_reduction_def,
> > > > >    vect_nested_cycle,
> > > > >    vect_first_order_recurrence,
> > > > > @@ -876,6 +877,13 @@ public:
> > > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > > >    bool peeling_for_niter;
> > > > >
> > > > > +  /* When the loop has early breaks that we can vectorize we need to
> > peel
> > > > > +     the loop for the break finding loop.  */
> > > > > +  bool early_breaks;
> > > > > +
> > > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > > +  bool non_break_control_flow;
> > > > > +
> > > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > > >    auto_vec<gcond *> conds;
> > > > >
> > > > > @@ -985,9 +993,11 @@ public:
> > > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> > > > >early_break_conflict
> > > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)-
> > >early_break_dest_bb
> > > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)-
> > > > >non_break_control_flow
> > > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > > > >no_data_dependencies
> > > > > @@ -1038,8 +1048,8 @@ public:
> > > > >     stack.  */
> > > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > > >
> > > > > -inline loop_vec_info
> > > > > -loop_vec_info_for_loop (class loop *loop)
> > > > > +static inline loop_vec_info
> > > > > +loop_vec_info_for_loop (const class loop *loop)
> > > > >  {
> > > > >    return (loop_vec_info) loop->aux;
> > > > >  }
> > > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > > >  {
> > > > >    if (bb == (bb->loop_father)->header)
> > > > >      return true;
> > > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > > +
> > > > >    return false;
> > > > >  }
> > > > >
> > > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > > >     in tree-vect-loop-manip.cc.  */
> > > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > > >  				     tree, tree, tree, bool);
> > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> > const_edge);
> > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info,
> > > > const_edge);
> > > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > > -						     class loop *, edge);
> > > > > +						    class loop *, edge, bool,
> > > > > +						    vec<basic_block> * = NULL);
> > > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > > index
> > > >
> > a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c
> > > > 67f29f5dc9a8d72235f4 100644
> > > > > --- a/gcc/tree-vectorizer.cc
> > > > > +++ b/gcc/tree-vectorizer.cc
> > > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > > >  	 predicates that need to be shared for optimal predicate usage.
> > > > >  	 However reassoc will re-order them and prevent CSE from working
> > > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > 
> > seeing this more and more I think we want a simple way to iterate over
> > all exits without copying to a vector when we have them recorded.  My
> > C++ fu is too limited to support
> > 
> >   for (auto exit : recorded_exits (loop))
> >     ...
> > 
> > (maybe that's enough for somebody to jump onto this ;))
> > 
> > Don't treat all review comments as change orders, but it should be clear
> > the code isn't 100% obvious.  Maybe the patch can be simplified by
> > splitting out the LC SSA cleanup parts.
> 
> Will give it a try,
> 
> Thanks!
> Tamar
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > > > +      for (edge exit : exits)
> > > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > >
> > > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg,
> > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > > Moerman;
> > > > HRB 36809 (AG Nuernberg)
> > >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-14 13:34       ` Richard Biener
  2023-07-17 10:56         ` Tamar Christina
@ 2023-08-18 11:35         ` Tamar Christina
  2023-08-18 12:53           ` Richard Biener
  2023-10-23 20:21         ` Tamar Christina
  2 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-08-18 11:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
> 

Correct.. but..

> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after

It is, in the case of the alternate exits this is reflected in copying
the same values, as they are the values of the number of completed 
iterations since the scalar code restarts the last iteration.

So all the PHI nodes of the alternate exits are correct.  The vector
iteration doesn't handle the partial iteration.
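For reference, the kind of loop under discussion looks like this (an illustrative example, not code from the patch; the function name is made up).  The vector loop tests VF elements per iteration, and when the early-exit condition fires inside a vector iteration the scalar epilog restarts that iteration element by element, which is why the alternate-exit PHIs carry the count of completed iterations:

```c
#include <assert.h>

/* Illustrative only, not code from the patch: a loop of the shape the
   early-break vectorization targets.  The vector loop would test VF
   elements at a time with a vector cbranch; when it fires, the scalar
   epilog re-runs that last vector iteration element by element.  */
static int
first_match (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)	/* the early break */
      return i;
  return -1;
}
```

With the patch series and a cbranch-capable target the intent is that only the final, partial iteration runs as scalar code.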

> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>                  <original loop>
>                 /      |
>                /    <main exit w/ original LC PHI>
>               /      if (epilog)
>    alt exits /        /  \
>             /        /    loop around
>             |       /
>            preheader with "header" PHIs
>               |
>           <epilog>
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
> 
> It seems so that at least currently the order of things makes
> them more complicated than necessary.

I've been trying to, but this representation seems a lot harder to work with.
In particular, at the moment once we exit slpeel_tree_duplicate_loop_to_edge_cfg
the loop structure is exactly the same as one expects from any normal epilog
vectorization.

But this new representation requires me to place the guard much earlier than
the epilogue preheader, yet I still have to adjust the PHI nodes in the
preheader.  So it seems that this split is only there to indicate that we
always enter the epilog when taking an early exit.

Today this is reflected in the values of the PHI nodes rather than
structurally.  Once we place the guard we update the nodes, and the alternate
exits get their value for ivtmp updated to VF.

This representation also forces me to do the redirection in every call site of
slpeel_tree_duplicate_loop_to_edge_cfg, making the code more complicated at
every use site.
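For concreteness, the two scaling expressions quoted from the patch can be modelled standalone (illustrative C with made-up names, not vectorizer code):

```c
#include <assert.h>

/* Standalone model of the IV adjustment expressions from the patch.
   INIT and FINAL are in units of vector iterations; multiplying by VF
   converts back to scalar iteration counts for the epilog.  */
static long
iv_after_early_break (long init, long final, long vf)
{
  /* Mirrors "init + (final - init) * vf" from
     vect_update_ivs_after_early_break.  */
  return init + (final - init) * vf;
}

static long
epilog_niters (long niters_orig, long niters_vector, long vf)
{
  /* Mirrors "niters_orig - niters_vector * vf" for the constant-VF
     case: how many scalar iterations remain for the epilog.  */
  return niters_orig - niters_vector * vf;
}
```

E.g. with 10 scalar iterations and VF = 4 the vector loop completes 2 iterations, leaving 2 scalar iterations (plus the restarted partial iteration on an early exit) for the epilog.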

But I think this doesn't address the main reason why the
slpeel_tree_duplicate_loop_to_edge_cfg code has a large block of code to
deal with PHI node updates.

The reason, as you mentioned elsewhere, is that after we redirect the edges
I have to reconstruct the PHI nodes.  For most it's straightforward, but for
live values or vuse chains it requires extra code.

You're right that before we redirect the edges they are all correct in the
exit block.  You mentioned that the API for the edge redirection is supposed
to copy the values over if I create the PHI nodes beforehand.

However this doesn't seem to work:

     for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
	{
	  gimple *from_phi = gsi_stmt (gsi_from);
	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
	  create_phi_node (new_res, new_preheader);
	}

      for (edge exit : loop_exits)
	redirect_edge_and_branch (exit, new_preheader);

Still leaves them empty.  Grepping around, most code seems to pair
redirect_edge_and_branch with copy_phi_arg_into_existing_phi.  The problem is
that in all these cases, after redirecting an edge they call
copy_phi_arg_into_existing_phi from a predecessor edge to fill in the PHI
nodes.

This is because redirect_edge_and_branch destroys the PHI node entries, and
copy_phi_arg_into_existing_phi simply reads gimple_phi_arg_def, which would
be NULL.

You could point it at the src block of the exit, in which case it copies the
wrong values in for the vuses.  At the end of vectorization the cfgcleanup
code does the same thing to maintain LC SSA if you haven't.  This code always
goes wrong for multiple exits because of the problem described above: there's
no node for it to copy the right value from.

As an alternative approach I can split the exit edges, copy the PHI nodes
into the split blocks and after that redirect them.  This however creates the
awkwardness of having the exit edges no longer connect to the preheader.

All of this then begs the question whether this is all easier than the
current approach, which is just to read the edge var map to figure out the
nodes that were removed during the redirect.

Maybe I'm still misunderstanding the API, but reading the sources of the
functions, they all copy values from *existing* PHI nodes.  And any existing
PHI node after the redirect is not correct.
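To make the failure mode concrete, here is a toy model (nothing like GCC's actual data structures; all names invented) of why copying after the redirect reads NULL: the argument slot has to be saved before the edge is torn off, which is exactly what the edge var map records.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a "PHI" keeps one argument per incoming edge.  Redirecting
   an edge clears its slot, so a copy done afterwards (in the manner of
   copy_phi_arg_into_existing_phi) sees NULL; the value must be stashed
   beforehand, which is the role the edge var map plays in GCC.  */
#define MAX_PREDS 4

struct toy_phi
{
  const char *args[MAX_PREDS];
};

/* Returns the argument the redirect destroyed, as the var map would.  */
static const char *
toy_redirect_edge (struct toy_phi *phi, int edge_idx)
{
  const char *saved = phi->args[edge_idx];
  phi->args[edge_idx] = NULL;
  return saved;
}
```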

gimple_redirect_edge_and_branch has a chunk that indicates it should have
updated the PHI nodes before calling ssa_redirect_edge to remove the old
ones, but there's no code there.  It's all empty.

Most of the other refactorings/changes were easy enough to do, but this one
I seem to be struggling with.

Thanks,
Tamar
> 
> > >
> > > >  	}
> > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > -      flush_pending_stmts (e);
> > > > +
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e-
> >src);
> > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > +
> > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit-
> > > >src);
> > > >
> > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > +
> > > > +      /* Finally after wiring the new epilogue we need to update its main
> exit
> > > > +	 to the original function exit we recorded.  Other exits are already
> > > > +	 correct.  */
> > > > +      if (multiple_exits_p)
> > > > +	{
> > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > +	    doms.safe_push (e->dest);
> > > > +	  update_loop = new_loop;
> > > > +	  doms.safe_push (exit_dest);
> > > > +
> > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > +	  if (single_succ_p (exit_dest))
> > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > +	}
> > > >      }
> > > >    else /* Add the copy at entry.  */
> > > >      {
> > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > +	 block and the new loop header.  This allows us to later split the
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > >
> > > see above
> > >
> > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > new_latch_loop);
> > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > +	}
> > > > +
> > > >        if (scalar_loop != loop)
> > > >  	{
> > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (new_preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > >  			       loop_preheader_edge (new_loop)->src);
> > > > +
> > > > +      if (multiple_exits_p)
> > > > +	update_loop = loop;
> > > >      }
> > > >
> > > > -  if (scalar_loop != loop)
> > > > +  if (multiple_exits_p)
> > > >      {
> > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > -
> > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > >  	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > -	  location_t orig_locus
> > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > -
> > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > +	  edge ex;
> > > > +	  edge_iterator ei;
> > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > +	    {
> > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > +		 dominate other blocks.  */
> > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> 
> For the prologue peeling any early exit we take would skip all other
> loops so we can simply leave them and their LC PHI nodes in place.
> We need extra PHIs only on the path to the main vector loop.  I
> think the comment isn't accurately reflecting what we do.  In
> fact we do not add any LC PHI nodes here but simply adjust the
> main loop header PHI arguments?
> 
> > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > just using single_succ_p here?  A fallthru edge src dominates the
> > > fallthru edge dest, so the sentence above doesn't make sense.
> >
> > I wanted to say that the immediate dominator of a block is never
> > a fall-through block.  At least that's what I understood from how
> > the dominators are calculated in the code, though I may have missed
> > something.
> 
>  BB1
>   |
>  BB2
>   |
>  BB3
> 
> here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> 
> > >
> > > > +		     && single_succ_p (ex->dest))
> > > > +		{
> > > > +		  doms.safe_push (ex->dest);
> > > > +		  ex = single_succ_edge (ex->dest);
> > > > +		}
> > > > +	      doms.safe_push (ex->dest);
> > > > +	    }
> > > > +	  doms.safe_push (e->dest);
> > > >  	}
> > > > -    }
> > > >
> > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +      if (updated_doms)
> > > > +	updated_doms->safe_splice (doms);
> > > > +    }
> > > >    free (new_bbs);
> > > >    free (bbs);
> > > >
> > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > loop_vec_info loop_vinfo, const_edge e)
> > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > >
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > +
> > >
> > > I think checking the number of BBs is odd, I don't remember anything
> > > in slpeel is specifically tied to that?  I think we can simply drop
> > > this or do you remember anything that would depend on ->num_nodes
> > > being only exactly 5 or 2?
> >
> > It never actually seemed to require it, but they're used as a check to
> > see if there is unexpected control flow in the loop.
> >
> > i.e. this would say no if you have an if statement in the loop that wasn't
> > converted.  The other part of this and the accompanying explanation is in
> > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > num_nodes == 2 check from there because the number of nodes restricted
> > things too much.  If you have an empty fall-through block, which seems to
> > happen often between the main exit and the latch block, then we'd not
> > vectorize.
> >
> > So instead I now reject loops after analyzing the gcond.  So I think this
> > check can go/needs to be different.
> 
> Lets remove it from this function then.
> 
> > >
> > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > >       the function itself.  */
> > > >    if (!loop_outer (loop)
> > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >    basic_block update_bb = update_e->dest;
> > > >
> > > > +  /* For early exits we'll update the IVs in
> > > > +     vect_update_ivs_after_early_break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return;
> > > > +
> > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >
> > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >        /* Fix phi expressions in the successor bb.  */
> > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > >      }
> > > > +  return;
> > >
> > > we don't usually place a return at the end of void functions
> > >
> > > > +}
> > > > +
> > > > +/*   Function vect_update_ivs_after_early_break.
> > > > +
> > > > +     "Advance" the induction variables of LOOP to the value they should
> take
> > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > +     vectorizer does not handle induction variables that are used after the
> > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > +     peeled, because of the early exit.  With an early exit we always peel
> the
> > > > +     loop.
> > > > +
> > > > +     Input:
> > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > +	      of LOOP were peeled.
> > > > +     - VF - The loop vectorization factor.
> > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before
> it is
> > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > +		     bumped.
> > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP
> > > executes.
> > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only)
> path
> > > > +		  coming out from LOOP on which there are uses of the LOOP
> > > ivs
> > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > +
> > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > +		  UPDATE_E->dest are updated accordingly.
> > > > +
> > > > +     Output:
> > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > +
> > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > +     a single loop exit that has a single predecessor.
> > > > +
> > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb
> are
> > > > +     organized in the same order.
> > > > +
> > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the
> future.
> > > > +
> > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a
> path
> > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the
> path
> > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > +     path starts with the edge UPDATE_E, and its destination (denoted
> > > update_bb)
> > > > +     needs to have its phis updated.
> > > > + */
> > > > +
> > > > +static tree
> > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> loop *
> > > epilog,
> > > > +				   poly_int64 vf, tree niters_orig,
> > > > +				   tree niters_vector, edge update_e)
> > > > +{
> > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return NULL;
> > > > +
> > > > +  gphi_iterator gsi, gsi1;
> > > > +  tree ni_name, ivtmp = NULL;
> > > > +  basic_block update_bb = update_e->dest;
> > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > +  basic_block exit_bb = loop_iv->dest;
> > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > +
> > > > +  gcc_assert (cond);
> > > > +
> > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> (update_bb);
> > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > +    {
> > > > +      tree init_expr, final_expr, step_expr;
> > > > +      tree type;
> > > > +      tree var, ni, off;
> > > > +      gimple_stmt_iterator last_gsi;
> > > > +
> > > > +      gphi *phi = gsi1.phi ();
> > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi,
> loop_preheader_edge
> > > (epilog));
> > >
> > > I'm confused about the setup.  update_bb looks like the block with the
> > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > loop_preheader_edge (epilog) come into play here?  That would feed into
> > > epilog->header PHIs?!
> >
> > We can't query the type of the phis in the block with the LC PHI nodes,
> > so the typical pattern seems to be that we iterate over a block that's
> > part of the loop and that would have the PHIs in the same order, just so
> > we can get to the stmt_vec_info.
> >
> > >
> > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > better way?  Is update_bb really epilog->header?!
> > >
> > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > which "works" even for totally unrelated edges.
> > >
> > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > +      if (!phi1)
> > >
> > > shouldn't that be an assert?
> > >
> > > > +	continue;
> > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > +      if (dump_enabled_p ())
> > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > +			 (gimple *)phi);
> > > > +
> > > > +      /* Skip reduction and virtual phis.  */
> > > > +      if (!iv_phi_p (phi_info))
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			     "reduc or virtual phi. skip.\n");
> > > > +	  continue;
> > > > +	}
> > > > +
> > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > +	 early.  As such just grab the original IV.  */
> > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > > (loop));
> > >
> > > but this should be taken care of by LC SSA?
> >
> > It is, the comment is probably missing details; this part just scales
> > the counter from VF to scalar counts.  It's just a reminder that this
> > scaling is done differently from normal single-exit vectorization.
> >
> > >
> > > OK, have to continue tomorrow from here.
> >
> > Cheers, Thank you!
> >
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> 
> so this is a way to avoid touching the main IV?  Looks a bit fragile to
> me.  Hmm, we're iterating over the main loop header PHIs here?
> Can't you check, say, the relevancy of the PHI node instead?  Though
> it might also be used as induction.  Can't it be used as alternate
> exit like
> 
>   for (i)
>    {
>      if (i & bit)
>        break;
>    }
> 
> and would we need to adjust 'i' then?
> 
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  final_expr = gimple_phi_result (phi1);
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > loop_preheader_edge (loop));
> > > > +
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +	  /* For early break the final loop IV is:
> > > > +	     init + (final - init) * vf which takes into account peeling
> > > > +	     values and non-single steps.  */
> > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > +			     fold_convert (stype, final_expr),
> > > > +			     fold_convert (stype, init_expr));
> > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > +
> > > > +	  /* Adjust the value with the offset.  */
> > > > +	  if (POINTER_TYPE_P (type))
> > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > +	  else
> > > > +	    ni = fold_convert (type,
> > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > +					    fold_convert (stype, init_expr),
> > > > +					    off));
> > > > +	  var = create_tmp_var (type, "tmp");
> 
> so how does the non-early break code deal with updating inductions?
> And how do you avoid altering this when we flow in from the normal
> exit?  That is, you are updating the value in the epilog loop
> header but don't you need to instead update the value only on
> the alternate exit edges from the main loop (and keep the not
> updated value on the main exit edge)?
> 
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > > (loop));
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +
> > > > +	  if (vf.is_constant ())
> > > > +	    {
> > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_vector),
> > > > +				build_int_cst (stype, vf));
> > > > +
> > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_orig),
> > > > +				fold_convert (stype, ni));
> > > > +	    }
> > > > +	  else
> > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > +	       masked, so at the end of the loop we know we have finished
> > > > +	       the entire loop and found nothing.  */
> > > > +	    ni = build_zero_cst (stype);
> > > > +
> > > > +	  ni = fold_convert (type, ni);
> > > > +	  /* We don't support variable n in this version yet.  */
> > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > +
> > > > +	  var = create_tmp_var (type, "tmp");
> > > > +
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > +
> > > > +	  for (edge exit : alt_exits)
> > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > +						       vf));
> > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > +	}
> > > > +    }
> > > > +
> > > > +  return ivtmp;
> > > >  }
> > > >
> > > >  /* Return a gimple value containing the misalignment (measured in
> vector
> > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > > (loop_vec_info loop_vinfo,
> > > >
> > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > >     this function searches for the corresponding lcssa phi node in exit
> > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > -   NULL.  */
> > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > +   return the phi result; otherwise return NULL.  */
> > > >
> > > >  static tree
> > > >  find_guard_arg (class loop *loop, class loop *epilog
> ATTRIBUTE_UNUSED,
> > > > -		gphi *lcssa_phi)
> > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > >  {
> > > >    gphi_iterator gsi;
> > > >    edge e = loop->vec_loop_iv;
> > > >
> > > > -  gcc_assert (single_pred_p (e->dest));
> > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *phi = gsi.phi ();
> > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > -	return PHI_RESULT (phi);
> > > > -    }
> > > > -  return NULL_TREE;
> > > > -}
> > > > -
> > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > FIRST/SECOND
> > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > -   edge, the two loops are arranged as below:
> > > > -
> > > > -       preheader_a:
> > > > -     first_loop:
> > > > -       header_a:
> > > > -	 i_1 = PHI<i_0, i_2>;
> > > > -	 ...
> > > > -	 i_2 = i_1 + 1;
> > > > -	 if (cond_a)
> > > > -	   goto latch_a;
> > > > -	 else
> > > > -	   goto between_bb;
> > > > -       latch_a:
> > > > -	 goto header_a;
> > > > -
> > > > -       between_bb:
> > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > -
> > > > -     second_loop:
> > > > -       header_b:
> > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > -				 or with i_2 if no LCSSA phi is created
> > > > -				 under condition of
> > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > -	 ...
> > > > -	 i_4 = i_3 + 1;
> > > > -	 if (cond_b)
> > > > -	   goto latch_b;
> > > > -	 else
> > > > -	   goto exit_bb;
> > > > -       latch_b:
> > > > -	 goto header_b;
> > > > -
> > > > -       exit_bb:
> > > > -
> > > > -   This function creates loop closed SSA for the first loop; update the
> > > > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > > > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > -   the first loop.
> > > > -
> > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > -   the second loop will execute rest iterations of the first.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > -				   class loop *first, class loop *second,
> > > > -				   bool create_lcssa_for_iv_phis)
> > > > -{
> > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > -
> > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > -
> > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > -  gcc_assert (loop == first || loop == second);
> > > > -
> > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > -       gsi_update = gsi_start_phis (second->header);
> > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > -    {
> > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > -      gphi *update_phi = gsi_update.phi ();
> > > > -
> > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > +      /* Nested loops with multiple exits can have a different number of PHI
> > > > +	 node arguments between the main loop and epilog as the epilog falls
> > > > +	 through to the second loop.  */
> > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > >  	{
> > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> > > > -	  arg = new_res;
> > > > -	}
> > > > -
> > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > -	 incoming edge.  */
> > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > > -    }
> > > > -
> > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > -     for correct vectorization of live stmts.  */
> > > > -  if (loop == first)
> > > > -    {
> > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > -	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > > > -	    continue;
> > > > -
> > > > -	  /* Already created in the above loop.   */
> > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > >  	    continue;
> > > >
> > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > > +	  if (operand_equal_p (get_current_def (var),
> > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > +	    return PHI_RESULT (phi);
> > > >  	}
> > > >      }
> > > > +  return NULL_TREE;
> > > >  }
> > > >
> > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> beginning
> > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > >    gcc_assert (single_succ_p (merge_bb));
> > > >    edge e = single_succ_edge (merge_bb);
> > > >    basic_block exit_bb = e->dest;
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > >
> > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *update_phi = gsi.phi ();
> > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > >
> > > >        tree merge_arg = NULL_TREE;
> > > >
> > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > >        if (!merge_arg)
> > > >  	merge_arg = old_arg;
> > > >
> > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> > > >        /* If the var is live after loop but not a reduction, we simply
> > > >  	 use the old arg.  */
> > > >        if (!guard_arg)
> > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > >      }
> > > >  }
> > > >
> > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > -{
> > > > -  gphi_iterator gsi;
> > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > -
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > -}
> > > > -
> 
> I wonder if we can still split these changes out to before early break
> vect?
> 
> > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >      bound_epilog += vf - 1;
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > >      bound_epilog += 1;
> > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > +     to find the element that caused the break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      bound_epilog = vf;
> > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > +      vect_epilogues = false;
> > > > +    }
> > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > >    poly_uint64 bound_scalar = bound_epilog;
> > > >
> > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >  				  bound_prolog + bound_epilog)
> > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > >  			 || vect_epilogues));
> > > > +
> > > > +  /* We only support early break vectorization on known bounds at this time.
> > > > +     This means that if the vector loop can't be entered then we won't generate
> > > > +     it at all.  So for now force skip_vector off because the additional control
> > > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > > +  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > +
> 
> I think it should be as "easy" as entering the epilog via the block taking
> the regular exit?
> 
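For readers following the peeling discussion above: the shape being produced (vector main loop deciding only "did any lane break?", scalar epilog of at most VF iterations pinpointing the element) can be sketched in plain scalar C. Everything here is illustrative — VF, the function names, and the inner loop standing in for a vector compare are all made up, not vectorizer output:

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* stand-in vector factor, purely illustrative */

/* Scalar reference: index of the first match, or n.  */
static size_t
find_scalar (const int *a, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] == key)
      return i;
  return n;
}

/* Shape after peeling: the "vector" loop only answers whether any lane
   in the current VF-wide block matched; the scalar epilog then runs at
   most VF iterations (plus the tail) to locate the exact element, which
   is why bound_epilog is forced to VF for early breaks.  */
static size_t
find_peeled (const int *a, size_t n, int key)
{
  size_t i = 0;
  for (; i + VF <= n; i += VF)          /* vectorized main loop */
    {
      int any = 0;
      for (size_t l = 0; l < VF; l++)   /* models a vector compare */
        any |= (a[i + l] == key);
      if (any)                          /* models the vector cbranch */
        break;
    }
  for (; i < n; i++)                    /* scalar epilog, <= VF + tail iters */
    if (a[i] == key)
      return i;
  return n;
}
```

Both functions return the same index for any input; the epilog merely redoes the last partial block scalarly.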
> > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > >       loop is known at compile time, otherwise we need to add a check at
> > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > >    bool skip_epilog = (prolog_peeling < 0
> > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >  		      || !vf.is_constant ());
> > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> > > > +     loop must be executed.  */
> > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      skip_epilog = false;
> > > > -
> > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > >    auto_vec<profile_count> original_counts;
> > > >    basic_block *original_bbs = NULL;
> > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >    if (prolog_peeling)
> > > >      {
> > > >        e = loop_preheader_edge (loop);
> > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > -
> > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > > > +						       true);
> > > >        gcc_assert (prolog);
> > > >        prolog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > +
> > > >        first_loop = prolog;
> > > >        reset_original_copy_tables ();
> > > >
> > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >  	 as the transformations mentioned above make less or no sense when not
> > > >  	 vectorizing.  */
> > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > +      auto_vec<basic_block> doms;
> > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > > > +						       &doms);
> > > >        gcc_assert (epilog);
> > > >
> > > >        epilog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > >
> > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > >  					update_e);
> > > >
> > > > +      /* For early breaks we must create a guard to check how many iterations
> > > > +	 of the scalar loop are yet to be performed.  */
> 
> We have this check anyway, no?  In fact don't we know that we always enter
> the epilog (see above)?
> 
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	{
> > > > +	  tree ivtmp =
> > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > +					       *niters_vector, update_e);
> > > > +
> > > > +	  gcc_assert (ivtmp);
> > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > +					 fold_convert (TREE_TYPE (niters),
> > > > +						       ivtmp),
> > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > +
> > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > +	     and so we may need to find the actual final edge.  */
> > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > +	     doesn't update existing one in the current scheme.  */
> > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > > > +						guard_bb, prob_epilog.invert (),
> > > > +						irred_flag);
> > > > +	  doms.safe_push (guard_bb);
> > > > +
> > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +
> > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > +					      final_edge);
> > > > +
> > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > +	     the guard and the exit.  This intermediate block is required
> > > > +	     because in the current scheme of things the guard block phi
> > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > +	     case we just need to update the uses in this block as well.  */
> > > > +	  if (loop != scalar_loop)
> > > > +	    {
> > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > > > +	    }
> > > > +
> > > > +	  flush_pending_stmts (guard_e);
> > > > +	}
> > > > +
> > > >        if (skip_epilog)
> > > >  	{
> > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >  	    }
> > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > >  	}
> > > > -      else
> > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > >
> > > >        unsigned HOST_WIDE_INT bound;
> > > >        if (bound_scalar.is_constant (&bound))
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > > >      partial_load_store_bias (0),
> > > >      peeling_for_gaps (false),
> > > >      peeling_for_niter (false),
> > > > +    early_breaks (false),
> > > > +    non_break_control_flow (false),
> > > >      no_data_dependencies (false),
> > > >      has_mask_store (false),
> > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > > >  					  (loop_vinfo));
> > > >
> > > > +  /* When we have multiple exits and VF is unknown, we must require partial
> > > > +     vectors because the loop bound is not a minimum but a maximum.  That is to
> > > > +     say we cannot unpredicate the main loop unless we peel or use partial
> > > > +     vectors in the epilogue.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > +    return true;
> > > > +
> > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > >      {
> > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > >     Verify that certain CFG restrictions hold, including:
> > > >     - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit
> > > >     - the loop exit condition is simple enough
> > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > >       niter could be analyzed under some assumptions.  */
> > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >                             |
> > > >                          (exit-bb)  */
> > > >
> > > > -      if (loop->num_nodes != 2)
> > > > -	return opt_result::failure_at (vect_location,
> > > > -				       "not vectorized:"
> > > > -				       " control flow in loop.\n");
> > > > -
> > > >        if (empty_block_p (loop->header))
> > > >  	return opt_result::failure_at (vect_location,
> > > >  				       "not vectorized: empty loop.\n");
> > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > >  			 "Considering outer-loop vectorization.\n");
> > > >        info->inner_loop_cond = inner.loop_cond;
> > > > +
> > > > +      if (!single_exit (loop))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized: multiple exits.\n");
> > > > +
> > > >      }
> > > >
> > > > -  if (!single_exit (loop))
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized: multiple exits.\n");
> > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > >      return opt_result::failure_at (vect_location,
> > > >  				   "not vectorized:"
> > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >  				   "not vectorized: latch block not empty.\n");
> > > >
> > > >    /* Make sure the exit is not abnormal.  */
> > > > -  edge e = single_exit (loop);
> > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized:"
> > > > -				   " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  edge nexit = loop->vec_loop_iv;
> > > > +  for (edge e : exits)
> > > > +    {
> > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge.\n");
> > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > +	 be able to vectorize the inverse order, but the current flow in the
> > > > +	 vectorizer always assumes you update successor PHI nodes, not
> > > > +	 preds.  */
> > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge order.\n");
> 
> "unsupported loop exit order", but I don't understand the comment.
> 
> > > > +    }
> > > > +
> > > > +  /* We currently only support early exit loops with known bounds.   */
> 
> Btw, why's that?  Is that because we don't support the loop-around edge?
> IMHO this is the most serious limitation (and as said above it should be
> trivial to fix).
> 
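As context for the known-bounds restriction being questioned here: the cover letter limits the feature to statically sized buffers with a compile-time N, because a vector iteration may load lanes past the element that triggers the break, and without first-faulting loads that over-read must provably stay inside the object. A minimal sketch of the supported shape (buffer size and names are made up for illustration):

```c
#include <assert.h>

#define N 64            /* compile-time trip count: required for now */

static int buf[N];      /* statically allocated, known-size buffer */

/* Supported shape: N and the buffer size are both known at compile time,
   so even when the break would fire mid-vector, the full-width vector
   load that preceded it cannot read past the end of the object.  */
static int
first_negative (void)
{
  for (int i = 0; i < N; i++)
    if (buf[i] < 0)
      return i;         /* early exit */
  return -1;
}
```

With an unknown `n` or a heap pointer the same loop would be rejected, since the trailing lanes of the last vector load could cross into an unmapped page.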
> > > > +  if (exits.length () > 1)
> > > > +    {
> > > > +      class tree_niter_desc niter;
> > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " early breaks only supported on loops"
> > > > +				       " with known iteration bounds.\n");
> > > > +    }
> > > >
> > > >    info->conds
> > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > > >    if (info->inner_loop_cond)
> > > >      {
> > > >        stmt_vec_info inner_loop_cond_info
> > > > @@ -3070,7 +3106,8 @@ start_over:
> > > >
> > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > >    basic_block exit_bb;
> > > >    tree scalar_dest;
> > > >    tree scalar_type;
> > > > -  gimple *new_phi = NULL, *phi;
> > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > >    gimple_stmt_iterator exit_gsi;
> > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > >    gimple *epilog_stmt = NULL;
> > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > >  	  reduc_inputs.quick_push (new_def);
> > > >  	}
> > > > +
> > > > +	/* Update the other exits.  */
> > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	  {
> > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +	    gphi_iterator gsi, gsi1;
> > > > +	    for (edge exit : alt_exits)
> > > > +	      {
> > > > +		/* Find the phi node to propagate into the exit block for each
> > > > +		   exit edge.  */
> > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > +		     gsi1 = gsi_start_phis (exit->src);
> 
> exit->src == loop->header, right?  I think this won't work for multiple
> alternate exits.  It's probably easier to do this where we create the
> LC PHI node for the reduction result?
> 
> > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > +		  {
> > > > +		    /* There really should be a function to just get the number
> > > > +		       of phis inside a bb.  */
> > > > +		    if (phi && phi == gsi.phi ())
> > > > +		      {
> > > > +			gphi *phi1 = gsi1.phi ();
> > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > +					 PHI_RESULT (phi1));
> 
> I think we know the header PHI of a reduction perfectly well, there
> shouldn't be the need to "search" for it.
> 
> > > > +			break;
> > > > +		      }
> > > > +		  }
> > > > +	      }
> > > > +	  }
> > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > >      }
> > > >
> > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > >  	   lhs' = new_tree;  */
> > > >
> > > > +      /* When vectorizing an early break, any live statement that is used
> > > > +	 outside of the loop is dead.  The loop will never get to it.
> > > > +	 We could change the liveness value during analysis instead but since
> > > > +	 the below code is invalid anyway just ignore it during codegen.  */
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	return true;
> 
> But what about the value that's live across the main exit when the
> epilogue is not entered?
> 
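To make the live-value question concrete: a value defined in the loop and used after it takes a different value depending on which exit was taken, so treating it uniformly is not obviously right. A scalar example (names are illustrative):

```c
#include <assert.h>

/* Both `sum` and the final `i` are live after the loop via the early
   break AND via the normal exit, and the observed values differ per
   exit -- the early break leaves a[i] out of the sum, the normal exit
   leaves i == n.  Each exit edge therefore carries its own value of
   the live defs.  */
static int
sum_until_negative (const int *a, int n, int *last_i)
{
  int sum = 0;
  int i;
  for (i = 0; i < n; i++)
    {
      if (a[i] < 0)
        break;          /* early exit: sum excludes a[i] */
      sum += a[i];
    }
  *last_i = i;          /* live use outside the loop */
  return sum;
}
```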
> > > > +
> > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >        gcc_assert (single_pred_p (exit_bb));
> > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > >       versioning.   */
> > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> 
> e can be NULL here?  I think we should reject such loops earlier.
> 
> > > >      {
> > > >        split_loop_exit_edge (e, true);
> > > >        if (dump_enabled_p ())
> > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > >      {
> > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > -      if (! single_pred_p (e->dest))
> > > > +      if (e && ! single_pred_p (e->dest))
> > > >  	{
> > > >  	  split_loop_exit_edge (e, true);
> > > >  	  if (dump_enabled_p ())
> > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >
> > > >    /* Loops vectorized with a variable factor won't benefit from
> > > >       unrolling/peeling.  */
> 
> update the comment?  Why would we unroll a VLA loop with early breaks?
> Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> 
> > > > -  if (!vf.is_constant ())
> > > > +  if (!vf.is_constant ()
> > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        loop->unroll = 1;
> > > >        if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >    *live_p = false;
> > > >
> > > >    /* cond stmt other than loop exit cond.  */
> > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > -    *relevant = vect_used_in_scope;
> 
> how was that ever hit before?  For outer loop processing with outer loop
> vectorization?
> 
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    {
> > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > +	 the hard way.  */
> > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > +      bool exit_bb = false, early_exit = false;
> > > > +      edge_iterator ei;
> > > > +      edge e;
> > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > +	  {
> > > > +	    exit_bb = true;
> > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > +	    break;
> > > > +	  }
> > > > +
> > > > +      /* We should have processed any exit edge, so an edge that is not an
> > > > +	 early break must be the loop IV edge.  We need to distinguish between the
> > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > +      if (exit_bb)
> > > > +	{
> > > > +	  if (early_exit)
> > > > +	    *relevant = vect_used_in_scope;
> > > > +	}
> 
> I wonder why you can't simply do
> 
>          if (is_ctrl_stmt (stmt_info->stmt)
>              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> 
> ?
> 
> > > > +      else if (bb->loop_father == loop)
> > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> 
> so for control flow not exiting the loop you can check
> loop_exits_from_bb_p ().
> 
> > > > +    }
> > > >
> > > >    /* changing memory.  */
> > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >  	*relevant = vect_used_in_scope;
> > > >        }
> > > >
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  auto_bitmap exit_bbs;
> > > > +  for (edge exit : exits)
> > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > > >    /* uses outside the loop.  */
> > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > > >      {
> > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >  	      /* We expect all such uses to be in the loop exit phis
> > > >  		 (because of loop closed form)   */
> > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> 
> That now becomes quite expensive checking already covered by the LC SSA
> verifier so I suggest to simply drop this assert instead.
> 
> > > >                *live_p = true;
> > > >  	    }
> > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > >  	}
> > > >      }
> > > >
> > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
> > > > +     the conds yet at that point and there's no quick way to retrieve them.  */
> > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > +    return opt_result::failure_at (vect_location,
> > > > +				   "not vectorized:"
> > > > +				   " unsupported control flow in loop.\n");
> 
> so we didn't do this before?  But see above where I wondered.  So when
> does this hit with early exits and why can't we check for this in
> vect_verify_loop_form?
> 
> > > > +
> > > >    /* 2. Process_worklist */
> > > >    while (worklist.length () > 0)
> > > >      {
> > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > >  			return res;
> > > >  		    }
> > > >                   }
> > > > +	    }
> > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +	    {
> > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > +	      opt_result res
> > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +				loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > >              }
> > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > >  	    {
> > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  			     node_instance, cost_vec);
> > > >        if (!res)
> > > >  	return res;
> > > > -   }
> > > > +    }
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > >
> > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >      {
> > > >        case vect_internal_def:
> > > > +      case vect_early_exit_def:
> > > >          break;
> > > >
> > > >        case vect_reduction_def:
> > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >      {
> > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > >        *need_to_vectorize = true;
> > > >      }
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > >    vect_internal_def,
> > > >    vect_induction_def,
> > > >    vect_reduction_def,
> > > > +  vect_early_exit_def,
> 
> can you avoid putting this in between reduction and double reduction
> please?  Just put it before vect_unknown_def_type.  In fact the COND
> isn't a def ... maybe we should have pattern recognized
> 
>  if (a < b) exit;
> 
> as
> 
>  cond = a < b;
>  if (cond != 0) exit;
> 
> so the part that we need to vectorize is more clear.
> 
> > > >    vect_double_reduction_def,
> > > >    vect_nested_cycle,
> > > >    vect_first_order_recurrence,
> > > > @@ -876,6 +877,13 @@ public:
> > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > >    bool peeling_for_niter;
> > > >
> > > > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > > > +     the loop for the break finding loop.  */
> > > > +  bool early_breaks;
> > > > +
> > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > +  bool non_break_control_flow;
> > > > +
> > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > >    auto_vec<gcond *> conds;
> > > >
> > > > @@ -985,9 +993,11 @@ public:
> > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > @@ -1038,8 +1048,8 @@ public:
> > > >     stack.  */
> > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > >
> > > > -inline loop_vec_info
> > > > -loop_vec_info_for_loop (class loop *loop)
> > > > +static inline loop_vec_info
> > > > +loop_vec_info_for_loop (const class loop *loop)
> > > >  {
> > > >    return (loop_vec_info) loop->aux;
> > > >  }
> > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > >  {
> > > >    if (bb == (bb->loop_father)->header)
> > > >      return true;
> > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > +
> > > >    return false;
> > > >  }
> > > >
> > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > >     in tree-vect-loop-manip.cc.  */
> > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > >  				     tree, tree, tree, bool);
> > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > -						     class loop *, edge);
> > > > +						    class loop *, edge, bool,
> > > > +						    vec<basic_block> * = NULL);
> > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > > > --- a/gcc/tree-vectorizer.cc
> > > > +++ b/gcc/tree-vectorizer.cc
> > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > >  	 predicates that need to be shared for optimal predicate usage.
> > > >  	 However reassoc will re-order them and prevent CSE from working
> > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> 
> seeing this more and more I think we want a simple way to iterate over
> all exits without copying to a vector when we have them recorded.  My
> C++ fu is too limited to support
> 
>   for (auto exit : recorded_exits (loop))
>     ...
> 
> (maybe that's enough for somebody to jump onto this ;))
> 
> Don't treat all review comments as change orders, but it should be clear
> the code isn't 100% obvious.  Maybe the patch can be simplified by
> splitting out the LC SSA cleanup parts.
> 
> Thanks,
> Richard.
> 
> > > > +      for (edge exit : exits)
> > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > >
> > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg,
> > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > Moerman;
> > > HRB 36809 (AG Nuernberg)
> >

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-08-18 11:35         ` Tamar Christina
@ 2023-08-18 12:53           ` Richard Biener
  2023-08-18 13:12             ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-08-18 12:53 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 18 Aug 2023, Tamar Christina wrote:

> > > Yeah if you comment it out one of the testcases should fail.
> > 
> > using new_preheader instead of e->dest would make things clearer.
> > 
> > You are now adding the same arg to every exit (you've just queried the
> > main exit redirect_edge_var_map_vector).
> > 
> > OK, so I think I understand what you're doing.  If I understand
> > correctly we know that when we exit the main loop via one of the
> > early exits we are definitely going to enter the epilog but when
> > we take the main exit we might not.
> > 
> 
> Correct.. but..
> 
> > Looking at the CFG we create currently this isn't reflected and
> > this complicates this PHI node updating.  What I'd try to do
> > is leave redirecting the alternate exits until after
> 
> It is, in the case of the alternate exits this is reflected in copying
> the same values, as they are the values of the number of completed 
> iterations since the scalar code restarts the last iteration.
> 
> So all the PHI nodes of the alternate exits are correct.  The vector
> iteration doesn't handle the partial iteration.
> 
> > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > means leaving it almost unchanged besides the LC SSA maintaining
> > changes.  After that for the multi-exit case split the
> > epilog preheader edge and redirect all the alternate exits to the
> > new preheader.  So the CFG becomes
> > 
> >                  <original loop>
> >                 /      |
> >                /    <main exit w/ original LC PHI>
> >               /      if (epilog)
> >    alt exits /        /  \
> >             /        /    loop around
> >             |       /
> >            preheader with "header" PHIs
> >               |
> >           <epilog>
> > 
> > note you need the header PHIs also on the main exit path but you
> > only need the loop end PHIs there.
> > 
> > It seems so that at least currently the order of things makes
> > them more complicated than necessary.
> 
> I've been trying to, but this representation seems a lot harder to work with.
> In particular, at the moment once we exit slpeel_tree_duplicate_loop_to_edge_cfg
> the loop structure is exactly the same as one expects from any normal epilog vectorization.
> 
> But this new representation requires me to place the guard much earlier than the epilogue
> preheader, yet I still have to adjust the PHI nodes in the preheader.  So it seems that this split
> is there only to indicate that we always enter the epilog when taking an early exit.
> 
> Today this is reflected in the values of the PHI nodes rather than structurally.  Once we place
> the guard we update the nodes, and the alternate exits get their value for ivtmp updated to VF.
> 
> This representation also forces me to do the redirection in every call site of
> slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated in all use sites.
> 
> But I think this doesn't address the main reason why the slpeel_tree_duplicate_loop_to_edge_cfg
> code has a large block of code to deal with PHI node updates.
> 
> The reason, as you mentioned somewhere else, is that after we redirect the edges I have to reconstruct
> the phi nodes.  For most it's straightforward, but for live values or vuse chains it requires extra code.
> 
> You're right that before we redirect the edges they are all correct in the exit block; you mentioned that
> the API for the edge redirection is supposed to copy the values over if I create the phi nodes beforehand.
> 
> However this doesn't seem to work:
> 
>      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> 	{
> 	  gimple *from_phi = gsi_stmt (gsi_from);
> 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> 	  create_phi_node (new_res, new_preheader);
> 	}
> 
>       for (edge exit : loop_exits)
> 	redirect_edge_and_branch (exit, new_preheader);
> 
> Still leaves them empty.  Grepping around most code seems to pair redirect_edge_and_branch with
> copy_phi_arg_into_existing_phi.  The problem is that in all these cases after redirecting an edge they
> call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the phi nodes.

You need to call flush_pending_stmts on each edge you redirect.
copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.

> This is because redirect_edge_and_branch destroys the phi node entries and copy_phi_arg_into_existing_phi
> simply reads the gimple_phi_arg_def, which would be NULL.
> 
> You could point it to the src block of the exit, in which case it copies the wrong values in for the vuses.  At the end
> of vectorization the cfgcleanup code does the same thing to maintain LCSSA if you haven't.  This code always goes
> wrong for multiple exits because of the problem described above.  There's no node for it to copy the right value
> from.
> 
> As an alternate approach I can split the exit edges, copy the phi nodes into the split and after that redirect them.
> This however creates the awkwardness of having the exit edges no longer connect to the preheader.
> 
> All of this then begs the question if this is all easier than the current approach which is just to read the edge var
> map to figure out the nodes that were removed during the redirect.

But the edge map is supposed to be applied via flush_pending_stmts,
specifically it relies on PHI nodes having a 1:1 correspondence between
old and new destination and thus is really designed for the case
you copy the destination and redirect an edge to the copy.

That is, the main issue I have with the CFG manipulation is that it
isn't broken down to simple operations that in themselves leave
everything correct.  I think it should be possible to do this
and as 2nd step only do the special massaging for the early exit
LC PHIs that feed into the epilogue loop.

As you say the code is quite complicated even without early break
vectorization, which is why I originally suggested to try "fixing" it
as a prerequisite.  It does have the same fundamental issue when feeding
the epilogue - the "missing" LC PHIs; the difference is only that
without early break vectorization we take the exit values, while
for early break vectorization we take the latch values from the
previous iteration(?)

> Maybe I'm still misunderstanding the API, but reading the sources of the functions, they all copy values from *existing*
> phi nodes.  And any existing phi node after the redirect are not correct.
> 
> gimple_redirect_edge_and_branch has a chunk that indicates it should have updated the PHI nodes before calling
> ssa_redirect_edge to remove the old ones, but there's no code there. It's all empty.
> 
> Most of the other refactorings/changes were easy enough to do, but this one I seem to be struggling with.

I see.  If you are tired of trying feel free to send an updated series
with the other changes, if it looks awkward but correct we can see
someone doing the cleanup afterwards.

Richard.

> Thanks,
> Tamar
> > 
> > > >
> > > > >  	}
> > > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > > -      flush_pending_stmts (e);
> > > > > +
> > > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > > +
> > > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> > > > >
> > > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > >        delete_basic_block (preheader);
> > > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > > +
> > > > > +      /* Finally after wiring the new epilogue we need to update its main
> > exit
> > > > > +	 to the original function exit we recorded.  Other exits are already
> > > > > +	 correct.  */
> > > > > +      if (multiple_exits_p)
> > > > > +	{
> > > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > > +	    doms.safe_push (e->dest);
> > > > > +	  update_loop = new_loop;
> > > > > +	  doms.safe_push (exit_dest);
> > > > > +
> > > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > > +	  if (single_succ_p (exit_dest))
> > > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > > +	}
> > > > >      }
> > > > >    else /* Add the copy at entry.  */
> > > > >      {
> > > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > > +	 block and the new loop header.  This allows us to later split the
> > > > > +	 preheader block and still find the right LC nodes.  */
> > > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > > >
> > > > see above
> > > >
> > > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > > +	{
> > > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop);
> > > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > > +	}
> > > > > +
> > > > >        if (scalar_loop != loop)
> > > > >  	{
> > > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > >        delete_basic_block (new_preheader);
> > > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > > >  			       loop_preheader_edge (new_loop)->src);
> > > > > +
> > > > > +      if (multiple_exits_p)
> > > > > +	update_loop = loop;
> > > > >      }
> > > > >
> > > > > -  if (scalar_loop != loop)
> > > > > +  if (multiple_exits_p)
> > > > >      {
> > > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > > -
> > > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > > >  	{
> > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > > -	  location_t orig_locus
> > > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > > -
> > > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > > +	  edge ex;
> > > > > +	  edge_iterator ei;
> > > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > > +	    {
> > > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > > +		 dominate other blocks.  */
> > > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> > 
> > For the prologue peeling any early exit we take would skip all other
> > loops so we can simply leave them and their LC PHI nodes in place.
> > We need extra PHIs only on the path to the main vector loop.  I
> > think the comment isn't accurately reflecting what we do.  In
> > fact we do not add any LC PHI nodes here but simply adjust the
> > main loop header PHI arguments?
> > 
> > > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > > just using single_succ_p here?  A fallthru edge src dominates the
> > > > fallthru edge dest, so the sentence above doesn't make sense.
> > >
> > > I wanted to say that the immediate dominator of a block is never
> > > a fall-through block.  At least that's what I understood from how
> > > the dominators are calculated in the code, though I may have missed
> > > something.
> > 
> >  BB1
> >   |
> >  BB2
> >   |
> >  BB3
> > 
> > here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> > 
> > > >
> > > > > +		     && single_succ_p (ex->dest))
> > > > > +		{
> > > > > +		  doms.safe_push (ex->dest);
> > > > > +		  ex = single_succ_edge (ex->dest);
> > > > > +		}
> > > > > +	      doms.safe_push (ex->dest);
> > > > > +	    }
> > > > > +	  doms.safe_push (e->dest);
> > > > >  	}
> > > > > -    }
> > > > >
> > > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > +      if (updated_doms)
> > > > > +	updated_doms->safe_splice (doms);
> > > > > +    }
> > > > >    free (new_bbs);
> > > > >    free (bbs);
> > > > >
> > > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
> > > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > > >
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > > +
> > > >
> > > > I think checking the number of BBs is odd, I don't remember anything
> > > > in slpeel is specifically tied to that?  I think we can simply drop
> > > > this or do you remember anything that would depend on ->num_nodes
> > > > being only exactly 5 or 2?
> > >
> > > Nothing actually seemed to require it, but it's used as a check to
> > > see if there is unexpected control flow in the loop.
> > >
> > > i.e. this would say no if you have an if statement in the loop that wasn't
> > > converted.  The other part of this and the accompanying explanation is in
> > > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > > num_nodes == 2 check from there because the number of nodes restricted
> > > things too much.  If you have an empty fall-through block, which seems to
> > > happen often between the main exit and the latch block, then we'd not
> > > vectorize.
> > >
> > > So instead I now reject loops after analyzing the gcond.  So I think this check
> > > can go/needs to be different.
> > 
> > Lets remove it from this function then.
> > 
> > > >
> > > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > > >       the function itself.  */
> > > > >    if (!loop_outer (loop)
> > > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > >    basic_block update_bb = update_e->dest;
> > > > >
> > > > > +  /* For early exits we'll update the IVs in
> > > > > +     vect_update_ivs_after_early_break.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    return;
> > > > > +
> > > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > >
> > > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > >        /* Fix phi expressions in the successor bb.  */
> > > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > >      }
> > > > > +  return;
> > > >
> > > > we don't usually place a return at the end of void functions
> > > >
> > > > > +}
> > > > > +
> > > > > +/*   Function vect_update_ivs_after_early_break.
> > > > > +
> > > > > +     "Advance" the induction variables of LOOP to the value they should take
> > > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > > +     vectorizer does not handle induction variables that are used after the
> > > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > > +     peeled, because of the early exit.  With an early exit we always peel the
> > > > > +     loop.
> > > > > +
> > > > > +     Input:
> > > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > > +	      of LOOP were peeled.
> > > > > +     - VF - The loop vectorization factor.
> > > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > > +		     bumped.
> > > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> > > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > > > +		  coming out from LOOP on which there are uses of the LOOP ivs
> > > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > > +
> > > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > > +		  UPDATE_E->dest are updated accordingly.
> > > > > +
> > > > > +     Output:
> > > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > > +
> > > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > +     a single loop exit that has a single predecessor.
> > > > > +
> > > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > > > > +     organized in the same order.
> > > > > +
> > > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > > > > +
> > > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > > +     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
> > > > > +     needs to have its phis updated.
> > > > > + */
> > > > > +
> > > > > +static tree
> > > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *epilog,
> > > > > +				   poly_int64 vf, tree niters_orig,
> > > > > +				   tree niters_vector, edge update_e)
> > > > > +{
> > > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    return NULL;
> > > > > +
> > > > > +  gphi_iterator gsi, gsi1;
> > > > > +  tree ni_name, ivtmp = NULL;
> > > > > +  basic_block update_bb = update_e->dest;
> > > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > +  basic_block exit_bb = loop_iv->dest;
> > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > > +
> > > > > +  gcc_assert (cond);
> > > > > +
> > > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > > +    {
> > > > > +      tree init_expr, final_expr, step_expr;
> > > > > +      tree type;
> > > > > +      tree var, ni, off;
> > > > > +      gimple_stmt_iterator last_gsi;
> > > > > +
> > > > > +      gphi *phi = gsi1.phi ();
> > > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
> > > >
> > > > I'm confused about the setup.  update_bb looks like the block with the
> > > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > > loop_preheader_edge (epilog) come into play here?  That would feed into
> > > > epilog->header PHIs?!
> > >
> > > We can't query the type of the phis in the block with the LC PHI nodes, so the
> > > typical pattern seems to be that we iterate over a block that's part of the loop
> > > and that would have the PHIs in the same order, just so we can get to the
> > > stmt_vec_info.
> > >
> > > >
> > > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > > better way?  Is update_bb really epilog->header?!
> > > >
> > > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > > which "works" even for totally unrelated edges.
> > > >
> > > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > > +      if (!phi1)
> > > >
> > > > shouldn't that be an assert?
> > > >
> > > > > +	continue;
> > > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > > +      if (dump_enabled_p ())
> > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > > +			 (gimple *)phi);
> > > > > +
> > > > > +      /* Skip reduction and virtual phis.  */
> > > > > +      if (!iv_phi_p (phi_info))
> > > > > +	{
> > > > > +	  if (dump_enabled_p ())
> > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > +			     "reduc or virtual phi. skip.\n");
> > > > > +	  continue;
> > > > > +	}
> > > > > +
> > > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > > +	 early.  As such just grab the original IV.  */
> > > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));
> > > >
> > > > but this should be taken care of by LC SSA?
> > >
> > > It is, the comment is probably missing details; this part just scales the
> > > counter from VF to scalar counts.  It's just a reminder that this scaling
> > > is done differently from normal single exit vectorization.
> > >
> > > >
> > > > OK, have to continue tomorrow from here.
> > >
> > > Cheers, Thank you!
> > >
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> > 
> > so this is a way to avoid touching the main IV?  Looks a bit fragile to
> > me.  Hmm, we're iterating over the main loop header PHIs here?
> > Can't you check, say, the relevancy of the PHI node instead?  Though
> > it might also be used as induction.  Can't it be used as alternate
> > exit like
> > 
> >   for (i)
> >    {
> >      if (i & bit)
> >        break;
> >    }
> > 
> > and would we need to adjust 'i' then?
> > 
> > > > > +	{
> > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > +
> > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > +	  final_expr = gimple_phi_result (phi1);
> > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop));
> > > > > +
> > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > +	  /* For early break the final loop IV is:
> > > > > +	     init + (final - init) * vf which takes into account peeling
> > > > > +	     values and non-single steps.  */
> > > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > > +			     fold_convert (stype, final_expr),
> > > > > +			     fold_convert (stype, init_expr));
> > > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > > +
> > > > > +	  /* Adjust the value with the offset.  */
> > > > > +	  if (POINTER_TYPE_P (type))
> > > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > +	  else
> > > > > +	    ni = fold_convert (type,
> > > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > > +					    fold_convert (stype, init_expr),
> > > > > +					    off));
> > > > > +	  var = create_tmp_var (type, "tmp");
> > 
> > so how does the non-early break code deal with updating inductions?
> > And how do you avoid altering this when we flow in from the normal
> > exit?  That is, you are updating the value in the epilog loop
> > header but don't you need to instead update the value only on
> > the alternate exit edges from the main loop (and keep the not
> > updated value on the main exit edge)?
> > 
> > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > +	  gimple_seq new_stmts = NULL;
> > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +	  else
> > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +
> > > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > > +	}
> > > > > +      else
> > > > > +	{
> > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > +
> > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > +
> > > > > +	  if (vf.is_constant ())
> > > > > +	    {
> > > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > > +				fold_convert (stype,
> > > > > +					      niters_vector),
> > > > > +				build_int_cst (stype, vf));
> > > > > +
> > > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > > +				fold_convert (stype,
> > > > > +					      niters_orig),
> > > > > +				fold_convert (stype, ni));
> > > > > +	    }
> > > > > +	  else
> > > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > > +	       masked, so at the end of the loop we know we have finished
> > > > > +	       the entire loop and found nothing.  */
> > > > > +	    ni = build_zero_cst (stype);
> > > > > +
> > > > > +	  ni = fold_convert (type, ni);
> > > > > +	  /* We don't support variable n in this version yet.  */
> > > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > > +
> > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > +
> > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > +	  gimple_seq new_stmts = NULL;
> > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +	  else
> > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > +
> > > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > > +
> > > > > +	  for (edge exit : alt_exits)
> > > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > > +						       vf));
> > > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  return ivtmp;
> > > > >  }
> > > > >
> > > > >  /* Return a gimple value containing the misalignment (measured in vector
> > > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> > > > >
> > > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > > >     this function searches for the corresponding lcssa phi node in exit
> > > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > > -   NULL.  */
> > > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > > +   return the phi result; otherwise return NULL.  */
> > > > >
> > > > >  static tree
> > > > >  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > > > -		gphi *lcssa_phi)
> > > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > > >  {
> > > > >    gphi_iterator gsi;
> > > > >    edge e = loop->vec_loop_iv;
> > > > >
> > > > > -  gcc_assert (single_pred_p (e->dest));
> > > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > >      {
> > > > >        gphi *phi = gsi.phi ();
> > > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > -	return PHI_RESULT (phi);
> > > > > -    }
> > > > > -  return NULL_TREE;
> > > > > -}
> > > > > -
> > > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> > > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > > -   edge, the two loops are arranged as below:
> > > > > -
> > > > > -       preheader_a:
> > > > > -     first_loop:
> > > > > -       header_a:
> > > > > -	 i_1 = PHI<i_0, i_2>;
> > > > > -	 ...
> > > > > -	 i_2 = i_1 + 1;
> > > > > -	 if (cond_a)
> > > > > -	   goto latch_a;
> > > > > -	 else
> > > > > -	   goto between_bb;
> > > > > -       latch_a:
> > > > > -	 goto header_a;
> > > > > -
> > > > > -       between_bb:
> > > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > > -
> > > > > -     second_loop:
> > > > > -       header_b:
> > > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > > -				 or with i_2 if no LCSSA phi is created
> > > > > -				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
> > > > > -	 ...
> > > > > -	 i_4 = i_3 + 1;
> > > > > -	 if (cond_b)
> > > > > -	   goto latch_b;
> > > > > -	 else
> > > > > -	   goto exit_bb;
> > > > > -       latch_b:
> > > > > -	 goto header_b;
> > > > > -
> > > > > -       exit_bb:
> > > > > -
> > > > > -   This function creates loop closed SSA for the first loop; update the
> > > > > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > > > > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > > -   the first loop.
> > > > > -
> > > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > > -   the second loop will execute rest iterations of the first.  */
> > > > > -
> > > > > -static void
> > > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > -				   class loop *first, class loop *second,
> > > > > -				   bool create_lcssa_for_iv_phis)
> > > > > -{
> > > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > -
> > > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > > -
> > > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > > -  gcc_assert (loop == first || loop == second);
> > > > > -
> > > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > > -       gsi_update = gsi_start_phis (second->header);
> > > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > > -    {
> > > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > > -      gphi *update_phi = gsi_update.phi ();
> > > > > -
> > > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > > +      /* Nested loops with multiple exits can have a different number of
> > > > > +	 phi node arguments between the main loop and epilog as the epilog
> > > > > +	 falls through to the second loop.  */
> > > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > > >  	{
> > > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > -	  arg = new_res;
> > > > > -	}
> > > > > -
> > > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > > -	 incoming edge.  */
> > > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > > > -    }
> > > > > -
> > > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > > -     for correct vectorization of live stmts.  */
> > > > > -  if (loop == first)
> > > > > -    {
> > > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > > -	{
> > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > > > > -	    continue;
> > > > > -
> > > > > -	  /* Already created in the above loop.   */
> > > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > > >  	    continue;
> > > > >
> > > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > +	  if (operand_equal_p (get_current_def (var),
> > > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > > +	    return PHI_RESULT (phi);
> > > > >  	}
> > > > >      }
> > > > > +  return NULL_TREE;
> > > > >  }
> > > > >
> > > > >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > >    gcc_assert (single_succ_p (merge_bb));
> > > > >    edge e = single_succ_edge (merge_bb);
> > > > >    basic_block exit_bb = e->dest;
> > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > > >
> > > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > >      {
> > > > >        gphi *update_phi = gsi.phi ();
> > > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > > >
> > > > >        tree merge_arg = NULL_TREE;
> > > > >
> > > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > >        if (!merge_arg)
> > > > >  	merge_arg = old_arg;
> > > > >
> > > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> > > > >        /* If the var is live after loop but not a reduction, we simply
> > > > >  	 use the old arg.  */
> > > > >        if (!guard_arg)
> > > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > >      }
> > > > >  }
> > > > >
> > > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > > -
> > > > > -static void
> > > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > > -{
> > > > > -  gphi_iterator gsi;
> > > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > > -
> > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > > -}
> > > > > -
> > 
> > I wonder if we can still split these changes out to before early break
> > vect?
> > 
> > > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> > > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >      bound_epilog += vf - 1;
> > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > >      bound_epilog += 1;
> > > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > > +     to find the element that caused the break.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +    {
> > > > > +      bound_epilog = vf;
> > > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > > +      vect_epilogues = false;
> > > > > +    }
> > > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > > >    poly_uint64 bound_scalar = bound_epilog;
> > > > >
> > > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >  				  bound_prolog + bound_epilog)
> > > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > > >  			 || vect_epilogues));
> > > > > +
> > > > > +  /* We only support early break vectorization on known bounds at this
> > > > > +     time.  This means that if the vector loop can't be entered then we
> > > > > +     won't generate it at all.  So for now force skip_vector off because
> > > > > +     the additional control flow messes with the BB exits and we've
> > > > > +     already analyzed them.  */
> > > > > +  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > > +
> > 
> > I think it should be as "easy" as entering the epilog via the block taking
> > the regular exit?
> > 
> > > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > > >       loop is known at compile time, otherwise we need to add a check at
> > > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > > >    bool skip_epilog = (prolog_peeling < 0
> > > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > >  		      || !vf.is_constant ());
> > > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > > > > +     epilog loop must be executed.  */
> > > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      skip_epilog = false;
> > > > > -
> > > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > > >    auto_vec<profile_count> original_counts;
> > > > >    basic_block *original_bbs = NULL;
> > > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >    if (prolog_peeling)
> > > > >      {
> > > > >        e = loop_preheader_edge (loop);
> > > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > > -
> > > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > > > > +						       true);
> > > > >        gcc_assert (prolog);
> > > > >        prolog->force_vectorize = false;
> > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > > +
> > > > >        first_loop = prolog;
> > > > >        reset_original_copy_tables ();
> > > > >
> > > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >  	 as the transformations mentioned above make less or no sense when not
> > > > >  	 vectorizing.  */
> > > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > > +      auto_vec<basic_block> doms;
> > > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > > > > +						       &doms);
> > > > >        gcc_assert (epilog);
> > > > >
> > > > >        epilog->force_vectorize = false;
> > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > > >
> > > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > >  					update_e);
> > > > >
> > > > > +      /* For early breaks we must create a guard to check how many
> > > > > +	 iterations of the scalar loop are yet to be performed.  */
> > 
> > We have this check anyway, no?  In fact don't we know that we always enter
> > the epilog (see above)?
> > 
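To make the "at most VF scalar iterations" property concrete (cf. `bound_epilog = vf` earlier in the patch), here is a scalar model of the transformed loop shape; `VF` and the function name are illustrative, not the emitted code:

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* illustrative vectorization factor */

/* The "vector" main loop only detects that some lane in a block of VF
   elements triggered the break; the scalar epilogue then needs at most
   VF iterations (plus the scalar tail) to pinpoint the exact element.  */
static size_t
find_first (const int *a, size_t n, int key)
{
  size_t i = 0;
  for (; i + VF <= n; i += VF)		/* vector main loop, known bound */
    {
      int any = 0;
      for (size_t l = 0; l < VF; l++)	/* stands in for a vector compare */
	any |= (a[i + l] == key);
      if (any)
	break;				/* early break: exact lane unknown */
    }
  for (; i < n; i++)			/* scalar epilogue */
    if (a[i] == key)
      return i;
  return n;
}
```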
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	{
> > > > > +	  tree ivtmp =
> > > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > > +					       *niters_vector, update_e);
> > > > > +
> > > > > +	  gcc_assert (ivtmp);
> > > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > +					 fold_convert (TREE_TYPE (niters),
> > > > > +						       ivtmp),
> > > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > +
> > > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > > +	     and so we may need to find the actual final edge.  */
> > > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > > +	     doesn't update existing one in the current scheme.  */
> > > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > > > > +						guard_bb, prob_epilog.invert (),
> > > > > +						irred_flag);
> > > > > +	  doms.safe_push (guard_bb);
> > > > > +
> > > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > +
> > > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > > +					      final_edge);
> > > > > +
> > > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > > +	     the guard and the exit.  This intermediate block is required
> > > > > +	     because in the current scheme of things the guard block phi
> > > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > > +	     case we just need to update the uses in this block as well.  */
> > > > > +	  if (loop != scalar_loop)
> > > > > +	    {
> > > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > > > > +	    }
> > > > > +
> > > > > +	  flush_pending_stmts (guard_e);
> > > > > +	}
> > > > > +
> > > > >        if (skip_epilog)
> > > > >  	{
> > > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > >  	    }
> > > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > > >  	}
> > > > > -      else
> > > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > > >
> > > > >        unsigned HOST_WIDE_INT bound;
> > > > >        if (bound_scalar.is_constant (&bound))
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > > > >      partial_load_store_bias (0),
> > > > >      peeling_for_gaps (false),
> > > > >      peeling_for_niter (false),
> > > > > +    early_breaks (false),
> > > > > +    non_break_control_flow (false),
> > > > >      no_data_dependencies (false),
> > > > >      has_mask_store (false),
> > > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> > > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > > > >  					  (loop_vinfo));
> > > > >
> > > > > +  /* When we have multiple exits and VF is unknown, we must require
> > > > > +     partial vectors because the loop bound is not a minimum but a
> > > > > +     maximum.  That is to say we cannot unpredicate the main loop unless
> > > > > +     we peel or use partial vectors in the epilogue.  */
> > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > > +    return true;
> > > > > +
> > > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > > >      {
> > > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> > > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > > >  }
> > > > >
> > > > > -
> > > > >  /* Function vect_analyze_loop_form.
> > > > >
> > > > >     Verify that certain CFG restrictions hold, including:
> > > > >     - the loop has a pre-header
> > > > > -   - the loop has a single entry and exit
> > > > > +   - the loop has a single entry
> > > > > +   - nested loops can have only a single exit.
> > > > >     - the loop exit condition is simple enough
> > > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > > >       niter could be analyzed under some assumptions.  */
> > > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > >                             |
> > > > >                          (exit-bb)  */
> > > > >
> > > > > -      if (loop->num_nodes != 2)
> > > > > -	return opt_result::failure_at (vect_location,
> > > > > -				       "not vectorized:"
> > > > > -				       " control flow in loop.\n");
> > > > > -
> > > > >        if (empty_block_p (loop->header))
> > > > >  	return opt_result::failure_at (vect_location,
> > > > >  				       "not vectorized: empty loop.\n");
> > > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > > >  			 "Considering outer-loop vectorization.\n");
> > > > >        info->inner_loop_cond = inner.loop_cond;
> > > > > +
> > > > > +      if (!single_exit (loop))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized: multiple exits.\n");
> > > > > +
> > > > >      }
> > > > >
> > > > > -  if (!single_exit (loop))
> > > > > -    return opt_result::failure_at (vect_location,
> > > > > -				   "not vectorized: multiple exits.\n");
> > > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > > >      return opt_result::failure_at (vect_location,
> > > > >  				   "not vectorized:"
> > > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > >  				   "not vectorized: latch block not empty.\n");
> > > > >
> > > > >    /* Make sure the exit is not abnormal.  */
> > > > > -  edge e = single_exit (loop);
> > > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > > -    return opt_result::failure_at (vect_location,
> > > > > -				   "not vectorized:"
> > > > > -				   " abnormal loop exit edge.\n");
> > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > +  edge nexit = loop->vec_loop_iv;
> > > > > +  for (edge e : exits)
> > > > > +    {
> > > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " abnormal loop exit edge.\n");
> > > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > > +	 be able to vectorize the inverse order, but the current flow in the
> > > > > +	 vectorizer always assumes you update successor PHI nodes, not
> > > > > +	 preds.  */
> > > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " abnormal loop exit edge order.\n");
> > 
> > "unsupported loop exit order", but I don't understand the comment.
> > 
> > > > > +    }
> > > > > +
> > > > > +  /* We currently only support early exit loops with known bounds.   */
> > 
> > Btw, why's that?  Is that because we don't support the loop-around edge?
> > IMHO this is the most serious limitation (and as said above it should be
> > trivial to fix).
> > 
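To make the distinction the check draws concrete, two illustrative C loops (function names are mine, not from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Supported by this version: the main IV exit gives a known (countable)
   trip count n, so the early break can only shorten the loop.  */
static size_t
count_until (const int *a, size_t n, int stop)
{
  size_t c = 0;
  for (size_t i = 0; i < n; i++)	/* countable bound */
    {
      if (a[i] == stop)			/* early break */
	break;
      c++;
    }
  return c;
}

/* Rejected: the only exit is data-dependent, so niter analysis cannot
   bound the trip count; supporting this shape needs first-faulting loads
   on VLA targets, as noted in the cover letter.  */
static size_t
my_strlen (const char *s)
{
  size_t i = 0;
  while (s[i])				/* unknown bound */
    i++;
  return i;
}
```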
> > > > > +  if (exits.length () > 1)
> > > > > +    {
> > > > > +      class tree_niter_desc niter;
> > > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > > +	return opt_result::failure_at (vect_location,
> > > > > +				       "not vectorized:"
> > > > > +				       " early breaks only supported on loops"
> > > > > +				       " with known iteration bounds.\n");
> > > > > +    }
> > > > >
> > > > >    info->conds
> > > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > > >
> > > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > > +
> > > > >    if (info->inner_loop_cond)
> > > > >      {
> > > > >        stmt_vec_info inner_loop_cond_info
> > > > > @@ -3070,7 +3106,8 @@ start_over:
> > > > >
> > > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      {
> > > > >        if (dump_enabled_p ())
> > > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > >    basic_block exit_bb;
> > > > >    tree scalar_dest;
> > > > >    tree scalar_type;
> > > > > -  gimple *new_phi = NULL, *phi;
> > > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > > >    gimple_stmt_iterator exit_gsi;
> > > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > > >    gimple *epilog_stmt = NULL;
> > > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > > >  	  reduc_inputs.quick_push (new_def);
> > > > >  	}
> > > > > +
> > > > > +	/* Update the other exits.  */
> > > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	  {
> > > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > +	    gphi_iterator gsi, gsi1;
> > > > > +	    for (edge exit : alt_exits)
> > > > > +	      {
> > > > > +		/* Find the phi node to propagate into the exit block for each
> > > > > +		   exit edge.  */
> > > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > > +		     gsi1 = gsi_start_phis (exit->src);
> > 
> > exit->src == loop->header, right?  I think this won't work for multiple
> > alternate exits.  It's probably easier to do this where we create the
> > LC PHI node for the reduction result?
> > 
> > > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > > +		  {
> > > > > +		    /* There really should be a function to just get the number
> > > > > +		       of phis inside a bb.  */
> > > > > +		    if (phi && phi == gsi.phi ())
> > > > > +		      {
> > > > > +			gphi *phi1 = gsi1.phi ();
> > > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > > +					 PHI_RESULT (phi1));
> > 
> > I think we know the header PHI of a reduction perfectly well, there
> > shouldn't be the need to "search" for it.
> > 
> > > > > +			break;
> > > > > +		      }
> > > > > +		  }
> > > > > +	      }
> > > > > +	  }
> > > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > > >      }
> > > > >
> > > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > > >  	   lhs' = new_tree;  */
> > > > >
> > > > > +      /* When vectorizing an early break, any live statements that are used
> > > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > > +	 We could change the liveness value during analysis instead but since
> > > > > +	 the below code is invalid anyway just ignore it during codegen.  */
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	return true;
> > 
> > But what about the value that's live across the main exit when the
> > epilogue is not entered?
> > 
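A small illustrative example of the case being asked about (my own, not from the patch):

```c
#include <assert.h>

/* `last` is live after the loop.  On an early-break exit the scalar
   epilogue recomputes it, but on the main (normal) exit -- when the
   epilogue is skipped -- the value produced by the vector loop is the
   one consumed, which is the case the review comment questions.  */
static int
last_before_stop (const int *a, int n, int stop)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    {
      if (a[i] == stop)		/* early break */
	break;
      last = a[i];		/* live outside the loop */
    }
  return last;
}
```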
> > > > > +
> > > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > >        gcc_assert (single_pred_p (exit_bb));
> > > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > > >       versioning.   */
> > > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > -  if (! single_pred_p (e->dest))
> > > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > 
> > e can be NULL here?  I think we should reject such loops earlier.
> > 
> > > > >      {
> > > > >        split_loop_exit_edge (e, true);
> > > > >        if (dump_enabled_p ())
> > > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > > >      {
> > > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > > -      if (! single_pred_p (e->dest))
> > > > > +      if (e && ! single_pred_p (e->dest))
> > > > >  	{
> > > > >  	  split_loop_exit_edge (e, true);
> > > > >  	  if (dump_enabled_p ())
> > > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > >
> > > > >    /* Loops vectorized with a variable factor won't benefit from
> > > > >       unrolling/peeling.  */
> > 
> > update the comment?  Why would we unroll a VLA loop with early breaks?
> > Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> > 
> > > > > -  if (!vf.is_constant ())
> > > > > +  if (!vf.is_constant ()
> > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > >      {
> > > > >        loop->unroll = 1;
> > > > >        if (dump_enabled_p ())
> > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > >    *live_p = false;
> > > > >
> > > > >    /* cond stmt other than loop exit cond.  */
> > > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > > -    *relevant = vect_used_in_scope;
> > 
> > how was that ever hit before?  For outer loop processing with outer loop
> > vectorization?
> > 
> > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > +    {
> > > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge,
> > but
> > > > > +	 it looks like loop_manip doesn't do that..  So we have to do it
> > > > > +	 the hard way.  */
> > > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > > +      bool exit_bb = false, early_exit = false;
> > > > > +      edge_iterator ei;
> > > > > +      edge e;
> > > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > > +	  {
> > > > > +	    exit_bb = true;
> > > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > > +	    break;
> > > > > +	  }
> > > > > +
> > > > > +      /* We should have processed any exit edge, so an edge not an early
> > > > > +	 break must be a loop IV edge.  We need to distinguish between the
> > > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > > +      if (exit_bb)
> > > > > +	{
> > > > > +	  if (early_exit)
> > > > > +	    *relevant = vect_used_in_scope;
> > > > > +	}
> > 
> > I wonder why you can't simply do
> > 
> >          if (is_ctrl_stmt (stmt_info->stmt)
> >              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> > 
> > ?
> > 
> > > > > +      else if (bb->loop_father == loop)
> > > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> > 
> > so for control flow not exiting the loop you can check
> > loop_exits_from_bb_p ().
> > 
> > > > > +    }
> > > > >
> > > > >    /* changing memory.  */
> > > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info
> > stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > > >  	*relevant = vect_used_in_scope;
> > > > >        }
> > > > >
> > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > +  auto_bitmap exit_bbs;
> > > > > +  for (edge exit : exits)
> > > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > +
> > > > >    /* uses outside the loop.  */
> > > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > > SSA_OP_DEF)
> > > > >      {
> > > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > > >  	      /* We expect all such uses to be in the loop exit phis
> > > > >  		 (because of loop closed form)   */
> > > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > 
> > That now becomes quite expensive checking, already covered by the LC SSA
> > verifier, so I suggest simply dropping this assert instead.
> > 
> > > > >                *live_p = true;
> > > > >  	    }
> > > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized
> > > > (loop_vec_info loop_vinfo, bool *fatal)
> > > > >  	}
> > > > >      }
> > > > >
> > > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't
> > seen all
> > > > > +     the conds yet at that point and there's no quick way to retrieve them.
> > */
> > > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > > +    return opt_result::failure_at (vect_location,
> > > > > +				   "not vectorized:"
> > > > > +				   " unsupported control flow in loop.\n");
> > 
> > so we didn't do this before?  But see above where I wondered.  So when
> > does this hit with early exits and why can't we check for this in
> > vect_verify_loop_form?
> > 
> > > > > +
> > > > >    /* 2. Process_worklist */
> > > > >    while (worklist.length () > 0)
> > > > >      {
> > > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized
> > > > (loop_vec_info loop_vinfo, bool *fatal)
> > > > >  			return res;
> > > > >  		    }
> > > > >                   }
> > > > > +	    }
> > > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > > +	    {
> > > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > > +	      opt_result res
> > > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > > +	      if (!res)
> > > > > +		return res;
> > > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > > +				loop_vinfo, relevant, &worklist, false);
> > > > > +	      if (!res)
> > > > > +		return res;
> > > > >              }
> > > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > > >  	    {
> > > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > >  			     node_instance, cost_vec);
> > > > >        if (!res)
> > > > >  	return res;
> > > > > -   }
> > > > > +    }
> > > > > +
> > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > > >
> > > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > > >      {
> > > > >        case vect_internal_def:
> > > > > +      case vect_early_exit_def:
> > > > >          break;
> > > > >
> > > > >        case vect_reduction_def:
> > > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > >      {
> > > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > > >        *need_to_vectorize = true;
> > > > >      }
> > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > index
> > > >
> > ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b3
> > > > 40baff61d18da8e242dd 100644
> > > > > --- a/gcc/tree-vectorizer.h
> > > > > +++ b/gcc/tree-vectorizer.h
> > > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > > >    vect_internal_def,
> > > > >    vect_induction_def,
> > > > >    vect_reduction_def,
> > > > > +  vect_early_exit_def,
> > 
> > can you avoid putting this in between reduction and double reduction
> > please?  Just put it before vect_unknown_def_type.  In fact the COND
> > isn't a def ... maybe we should have pattern recognized
> > 
> >  if (a < b) exit;
> > 
> > as
> > 
> >  cond = a < b;
> >  if (cond != 0) exit;
> > 
> > so the part that we need to vectorize is more clear.
> > 
> > > > >    vect_double_reduction_def,
> > > > >    vect_nested_cycle,
> > > > >    vect_first_order_recurrence,
> > > > > @@ -876,6 +877,13 @@ public:
> > > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > > >    bool peeling_for_niter;
> > > > >
> > > > > +  /* When the loop has early breaks that we can vectorize we need to
> > peel
> > > > > +     the loop for the break finding loop.  */
> > > > > +  bool early_breaks;
> > > > > +
> > > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > > +  bool non_break_control_flow;
> > > > > +
> > > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > > >    auto_vec<gcond *> conds;
> > > > >
> > > > > @@ -985,9 +993,11 @@ public:
> > > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> > > > >early_break_conflict
> > > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)-
> > >early_break_dest_bb
> > > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)-
> > > > >non_break_control_flow
> > > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > > > >no_data_dependencies
> > > > > @@ -1038,8 +1048,8 @@ public:
> > > > >     stack.  */
> > > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > > >
> > > > > -inline loop_vec_info
> > > > > -loop_vec_info_for_loop (class loop *loop)
> > > > > +static inline loop_vec_info
> > > > > +loop_vec_info_for_loop (const class loop *loop)
> > > > >  {
> > > > >    return (loop_vec_info) loop->aux;
> > > > >  }
> > > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > > >  {
> > > > >    if (bb == (bb->loop_father)->header)
> > > > >      return true;
> > > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > > +
> > > > >    return false;
> > > > >  }
> > > > >
> > > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > > >     in tree-vect-loop-manip.cc.  */
> > > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > > >  				     tree, tree, tree, bool);
> > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *,
> > const_edge);
> > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info,
> > > > const_edge);
> > > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > > -						     class loop *, edge);
> > > > > +						    class loop *, edge, bool,
> > > > > +						    vec<basic_block> * = NULL);
> > > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > > index
> > > >
> > a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c
> > > > 67f29f5dc9a8d72235f4 100644
> > > > > --- a/gcc/tree-vectorizer.cc
> > > > > +++ b/gcc/tree-vectorizer.cc
> > > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > > >  	 predicates that need to be shared for optimal predicate usage.
> > > > >  	 However reassoc will re-order them and prevent CSE from working
> > > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > 
> > seeing this more and more I think we want a simple way to iterate over
> > all exits without copying to a vector when we have them recorded.  My
> > C++ fu is too limited to support
> > 
> >   for (auto exit : recorded_exits (loop))
> >     ...
> > 
> > (maybe that's enough for somebody to jump onto this ;))
> > 
> > Don't treat all review comments as change orders, but it should be clear
> > the code isn't 100% obvious.  Maybe the patch can be simplified by
> > splitting out the LC SSA cleanup parts.
> > 
> > Thanks,
> > Richard.
> > 
> > > > > +      for (edge exit : exits)
> > > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > >
> > > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg,
> > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > > Moerman;
> > > > HRB 36809 (AG Nuernberg)
> > >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-08-18 12:53           ` Richard Biener
@ 2023-08-18 13:12             ` Tamar Christina
  2023-08-18 13:15               ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-08-18 13:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, August 18, 2023 2:53 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Fri, 18 Aug 2023, Tamar Christina wrote:
> 
> > > > Yeah if you comment it out one of the testcases should fail.
> > >
> > > using new_preheader instead of e->dest would make things clearer.
> > >
> > > You are now adding the same arg to every exit (you've just queried the
> > > main exit redirect_edge_var_map_vector).
> > >
> > > OK, so I think I understand what you're doing.  If I understand
> > > correctly we know that when we exit the main loop via one of the
> > > early exits we are definitely going to enter the epilog but when
> > > we take the main exit we might not.
> > >
> >
> > Correct.. but..
> >
> > > Looking at the CFG we create currently this isn't reflected and
> > > this complicates this PHI node updating.  What I'd try to do
> > > is leave redirecting the alternate exits until after
> >
> > It is, in the case of the alternate exits this is reflected in copying
> > the same values, as they are the values of the number of completed
> > iterations since the scalar code restarts the last iteration.
> >
> > So all the PHI nodes of the alternate exits are correct.  The vector
> > iteration doesn't handle the partial iteration.
> >
> > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > means leaving it almost unchanged besides the LC SSA maintaining
> > > changes.  After that for the multi-exit case split the
> > > epilog preheader edge and redirect all the alternate exits to the
> > > new preheader.  So the CFG becomes
> > >
> > >                  <original loop>
> > >                 /      |
> > >                /    <main exit w/ original LC PHI>
> > >               /      if (epilog)
> > >    alt exits /        /  \
> > >             /        /    loop around
> > >             |       /
> > >            preheader with "header" PHIs
> > >               |
> > >           <epilog>
> > >
> > > note you need the header PHIs also on the main exit path but you
> > > only need the loop end PHIs there.
> > >
> > > It seems so that at least currently the order of things makes
> > > them more complicated than necessary.
> >
> > I've been trying to, but this representation seems a lot harder to work with.
> > In particular, at the moment once we exit
> slpeel_tree_duplicate_loop_to_edge_cfg
> > the loop structure is exactly the same as one expects from any normal epilog
> vectorization.
> >
> > But this new representation requires me to place the guard much earlier than
> the epilogue
> > preheader,  yet I still have to adjust the PHI nodes in the preheader.  So it
> seems that this split
> > is there to only indicate that we always enter the epilog when taking an early
> exit.
> >
> > Today this is reflected in the values of the PHI nodes rather than structurally.
> > Once we place the guard we update the nodes and the alternate exits get
> > their value for ivtmp updated to VF.
> >
> > This representation also forces me to do the redirection in every call site of
> > slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated
> in all use sites.
> >
> > But I think this doesn't address the main reason why the
> slpeel_tree_duplicate_loop_to_edge_cfg
> > code has a large block of code to deal with PHI node updates.
> >
> > The reason as you mentioned somewhere else is that after we redirect the
> edges I have to reconstruct
> > the phi nodes.  For most it's straightforward, but for live values or vuse
> chains it requires extra code.
> >
> > You're right in that before we redirect the edges they are all correct in the exit
> block, you mentioned that
> > the API for the edge redirection is supposed to copy the values over if I
> create the phi nodes before hand.
> >
> > However this doesn't seem to work:
> >
> >      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > 	{
> > 	  gimple *from_phi = gsi_stmt (gsi_from);
> > 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > 	  create_phi_node (new_res, new_preheader);
> > 	}
> >
> >       for (edge exit : loop_exits)
> > 	redirect_edge_and_branch (exit, new_preheader);
> >
> > Still leaves them empty.  Grepping around, most code seems to pair
> redirect_edge_and_branch with
> > copy_phi_arg_into_existing_phi.  The problem is that in all these cases after
> redirecting an edge they
> > call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the phi
> nodes.
> 
> You need to call flush_pending_stmts on each edge you redirect.
> copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.

Oh, I'll give that a try; that would make sense.  I didn't flush it in the current
approach because I needed the map, but since I want to get rid of the map, this makes sense.

> 
> > This is because redirect_edge_and_branch destroys the phi node entries
> > and copy_phi_arg_into_existing_phi simply reads the gimple_phi_arg_def,
> > which would be NULL.
> >
> > You could point it to the src block of the exit, in which case it copies the
> wrong values in for the vuses.  At the end
> > of vectorization the cfgcleanup code does the same thing to maintain LCSSA
> if you haven't.  This code always goes
> > wrong for multiple exits because of the problem described above.  There's no
> node for it to copy the right value
> > from.
> >
> > As an alternate approach I can split the exit edges, copy the phi nodes into
> the split and after that redirect them.
> > This however creates the awkwardness of having the exit edges no longer
> connect to the preheader.
> >
> > All of this then begs the question if this is all easier than the current approach
> which is just to read the edge var
> > map to figure out the nodes that were removed during the redirect.
> 
> But the edge map is supposed to be applied via flush_pending_stmts,
> specifically it relies on PHI nodes having a 1:1 correspondence between
> old and new destination and thus is really designed for the case
> you copy the destination and redirect an edge to the copy.

Ah, you were referring to flush_pending_stmts; right, that would make more sense.
Will give it a go, thanks!

> 
> That is, the main issue I have with the CFG manipulation is that it
> isn't broken down to simple operations that in themselves leave
> everything correct.  I think it should be possible to do this
> and as 2nd step only do the special massaging for the early exit
> LC PHIs that feed into the epilogue loop.
> 

I see.  So if I understand what you're saying correctly, instead of having one
big function you would like several smaller helper functions, each of which on
its own does something small but correct, and which when called in sequence
perform the complicated work?

I.e. you still have one top-level function, and at the end of that call
everything is in order?

> As you say the code is quite complicated even without early break
> vectorization which is why I originally suggested to try "fixing" it
> as prerequesite.  It does have the same fundamental issue when feeding
> the epilogue - the "missing" LC PHIs, the difference is only that
> without early break vectorization we take the exit values while
> for early break vectorization we take the latch values from the
> previous iteration(?)

OK.  If I understood correctly how you want the sequence split out, would this
work for you:

I pull out from the patch series:

1. the single_exit removal and using of our own IV in the vectorizer.
2. The refactoring of the current peeling code, without new functionality, but
     splitting it up into logical little helper functions.

And get those committed separately, then rebase the early break series on top,
which will add handling of the additional multiple-exit case in the peeling step?

I already did 1; I didn't do 2 because the function was reworked in one go to both
clean it up and support multiple exits.  But it would make sense to pull them out.

Is it OK to also pull out the vectorizable_comparison refactoring?  I haven't
committed it yet because without the early-break support it doesn't look useful,
but rebases are tricky when it changes.

> 
> > Maybe I'm still misunderstanding the API, but reading the sources of the
> functions, they all copy values from *existing*
> > phi nodes.  And any existing phi node after the redirect are not correct.
> >
> > gimple_redirect_edge_and_branch has a chunk that indicates it should have
> updated the PHI nodes before calling
> > ssa_redirect_edge to remove the old ones, but there's no code there. It's all
> empty.
> >
> > Most of the other refactorings/changes were easy enough to do, but this
> one I seem to be struggling with.
> 
> I see.  If you are tired of trying feel free to send an updated series
> with the other changes, if it looks awkward but correct we can see
> someone doing the cleanup afterwards.
> 

Thanks, I'm not tired yet 😊, just confused, and I think it's clearer now.  If you agree
with the above I can pull them out of the series.

Cheers,
Tamar

> Richard.
> 
> > Thanks,
> > Tamar
> > >
> > > > >
> > > > > >  	}
> > > > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > > > -      flush_pending_stmts (e);
> > > > > > +
> > > > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> e-
> > > >src);
> > > > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > > > +
> > > > > > +      if ((was_imm_dom || duplicate_outer_loop) &&
> !multiple_exits_p)
> > > > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest,
> new_exit-
> > > > > >src);
> > > > > >
> > > > > >        /* And remove the non-necessary forwarder again.  Keep the
> other
> > > > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> > > (class
> > > > > loop *loop,
> > > > > >        delete_basic_block (preheader);
> > > > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop-
> >header,
> > > > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > > > +
> > > > > > +      /* Finally after wiring the new epilogue we need to update its
> main
> > > exit
> > > > > > +	 to the original function exit we recorded.  Other exits are
> already
> > > > > > +	 correct.  */
> > > > > > +      if (multiple_exits_p)
> > > > > > +	{
> > > > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > > > +	    doms.safe_push (e->dest);
> > > > > > +	  update_loop = new_loop;
> > > > > > +	  doms.safe_push (exit_dest);
> > > > > > +
> > > > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > > > +	  if (single_succ_p (exit_dest))
> > > > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > > > +	}
> > > > > >      }
> > > > > >    else /* Add the copy at entry.  */
> > > > > >      {
> > > > > > +      /* Copy the current loop LC PHI nodes between the original loop
> exit
> > > > > > +	 block and the new loop header.  This allows us to later split
> the
> > > > > > +	 preheader block and still find the right LC nodes.  */
> > > > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > > > >
> > > > > see above
> > > > >
> > > > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p
> (gsi_to);
> > > > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > > > +	{
> > > > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > > > new_latch_loop);
> > > > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init,
> new_arg);
> > > > > > +	}
> > > > > > +
> > > > > >        if (scalar_loop != loop)
> > > > > >  	{
> > > > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.
> */
> > > > > > @@ -1677,31 +1819,36 @@
> slpeel_tree_duplicate_loop_to_edge_cfg
> > > (class
> > > > > loop *loop,
> > > > > >        delete_basic_block (new_preheader);
> > > > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop-
> >header,
> > > > > >  			       loop_preheader_edge (new_loop)->src);
> > > > > > +
> > > > > > +      if (multiple_exits_p)
> > > > > > +	update_loop = loop;
> > > > > >      }
> > > > > >
> > > > > > -  if (scalar_loop != loop)
> > > > > > +  if (multiple_exits_p)
> > > > > >      {
> > > > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > > > -
> > > > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > > > >  	{
> > > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > > > -	  location_t orig_locus
> > > > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > > > -
> > > > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > > > +	  edge ex;
> > > > > > +	  edge_iterator ei;
> > > > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > > > +	    {
> > > > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > > > +		 dominate other blocks.  */
> > > > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> > >
> > > For the prologue peeling any early exit we take would skip all other
> > > loops so we can simply leave them and their LC PHI nodes in place.
> > > We need extra PHIs only on the path to the main vector loop.  I
> > > think the comment isn't accurately reflecting what we do.  In
> > > fact we do not add any LC PHI nodes here but simply adjust the
> > > main loop header PHI arguments?
> > >
> > > > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > > > just using single_succ_p here?  A fallthru edge src dominates the
> > > > > fallthru edge dest, so the sentence above doesn't make sense.
> > > >
> > > > I wanted to say that the immediate dominator of a block is never
> > > > a fall-through block.  At least from what I understood from how
> > > > the dominators are calculated in the code, though may have missed
> > > > something.
> > >
> > >  BB1
> > >   |
> > >  BB2
> > >   |
> > >  BB3
> > >
> > > here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> > >
> > > > >
> > > > > > +		     && single_succ_p (ex->dest))
> > > > > > +		{
> > > > > > +		  doms.safe_push (ex->dest);
> > > > > > +		  ex = single_succ_edge (ex->dest);
> > > > > > +		}
> > > > > > +	      doms.safe_push (ex->dest);
> > > > > > +	    }
> > > > > > +	  doms.safe_push (e->dest);
> > > > > >  	}
> > > > > > -    }
> > > > > >
> > > > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > > +      if (updated_doms)
> > > > > > +	updated_doms->safe_splice (doms);
> > > > > > +    }
> > > > > >    free (new_bbs);
> > > > > >    free (bbs);
> > > > > >
> > > > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > > > loop_vec_info loop_vinfo, const_edge e)
> > > > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > > > >
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > > > +
> > > > >
> > > > > I think checking the number of BBs is odd, I don't remember anything
> > > > > in slpeel is specifically tied to that?  I think we can simply drop
> > > > > this or do you remember anything that would depend on ->num_nodes
> > > > > being only exactly 5 or 2?
> > > >
> > > > Never actually seemed to require it, but they're used as some check to
> > > > see if there is unexpected control flow in the loop.
> > > >
> > > > i.e. this would say no if you have an if statement in the loop that wasn't
> > > > converted.  The other part of this and the accompanying explanation is in
> > > > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > > > num_nodes == 2 check from there because number of nodes restricted
> > > > things too much.  If you have an empty fall-through block, which seems to
> > > > happen often between the main exit and the latch block, then we'd not
> > > > vectorize.
> > > >
> > > > So instead I now reject loops after analyzing the gcond.  So I think this
> > > > check can go/needs to be different.
> > >
> > > Lets remove it from this function then.
> > >
> > > > >
> > > > > >    /* All loops have an outer scope; the only case loop->outer is NULL is
> for
> > > > > >       the function itself.  */
> > > > > >    if (!loop_outer (loop)
> > > > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > > > (loop_vec_info loop_vinfo,
> > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > >    basic_block update_bb = update_e->dest;
> > > > > >
> > > > > > +  /* For early exits we'll update the IVs in
> > > > > > +     vect_update_ivs_after_early_break.  */
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    return;
> > > > > > +
> > > > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > >
> > > > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > > > (loop_vec_info loop_vinfo,
> > > > > >        /* Fix phi expressions in the successor bb.  */
> > > > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > > >      }
> > > > > > +  return;
> > > > >
> > > > > we don't usually place a return at the end of void functions
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +/*   Function vect_update_ivs_after_early_break.
> > > > > > +
> > > > > > +     "Advance" the induction variables of LOOP to the value they should take
> > > > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > > > +     vectorizer does not handle induction variables that are used after the
> > > > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > > > +     peeled, because of the early exit.  With an early exit we always peel
> > > > > > +     the loop.
> > > > > > +
> > > > > > +     Input:
> > > > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > > > +	      of LOOP were peeled.
> > > > > > +     - VF - The loop vectorization factor.
> > > > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > > > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > > > +		     bumped.
> > > > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> > > > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > > > > +		  coming out from LOOP on which there are uses of the LOOP ivs
> > > > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > > > +
> > > > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > > > +		  UPDATE_E->dest are updated accordingly.
> > > > > > +
> > > > > > +     Output:
> > > > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > > > +
> > > > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > > +     a single loop exit that has a single predecessor.
> > > > > > +
> > > > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > > > > > +     organized in the same order.
> > > > > > +
> > > > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > > > > > +
> > > > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > > > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > > > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > > > +     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
> > > > > > +     needs to have its phis updated.
> > > > > > + */
> > > > > > +
> > > > > > +static tree
> > > > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *epilog,
> > > > > > +				   poly_int64 vf, tree niters_orig,
> > > > > > +				   tree niters_vector, edge update_e)
> > > > > > +{
> > > > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    return NULL;
> > > > > > +
> > > > > > +  gphi_iterator gsi, gsi1;
> > > > > > +  tree ni_name, ivtmp = NULL;
> > > > > > +  basic_block update_bb = update_e->dest;
> > > > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > +  basic_block exit_bb = loop_iv->dest;
> > > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > > > +
> > > > > > +  gcc_assert (cond);
> > > > > > +
> > > > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > > > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > > > +    {
> > > > > > +      tree init_expr, final_expr, step_expr;
> > > > > > +      tree type;
> > > > > > +      tree var, ni, off;
> > > > > > +      gimple_stmt_iterator last_gsi;
> > > > > > +
> > > > > > +      gphi *phi = gsi1.phi ();
> > > > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
> > > > >
> > > > > I'm confused about the setup.  update_bb looks like the block with the
> > > > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > > > loop_preheader_edge (epilog) come into play here?  That would feed
> > > > > into epilog->header PHIs?!
> > > >
> > > > We can't query the type of the phis in the block with the LC PHI nodes,
> > > > so the typical pattern seems to be that we iterate over a block that's
> > > > part of the loop and that would have the PHIs in the same order, just so
> > > > we can get to the stmt_vec_info.
> > > >
> > > > >
> > > > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > > > better way?  Is update_bb really epilog->header?!
> > > > >
> > > > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > > > which "works" even for totally unrelated edges.
> > > > >
> > > > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > > > +      if (!phi1)
> > > > >
> > > > > shouldn't that be an assert?
> > > > >
> > > > > > +	continue;
> > > > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > > > +      if (dump_enabled_p ())
> > > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > > > +			 (gimple *)phi);
> > > > > > +
> > > > > > +      /* Skip reduction and virtual phis.  */
> > > > > > +      if (!iv_phi_p (phi_info))
> > > > > > +	{
> > > > > > +	  if (dump_enabled_p ())
> > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > +			     "reduc or virtual phi. skip.\n");
> > > > > > +	  continue;
> > > > > > +	}
> > > > > > +
> > > > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > > > +	 early.  As such just grab the original IV.  */
> > > > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop));
> > > > >
> > > > > but this should be taken care of by LC SSA?
> > > >
> > > > It is, the comment is probably missing details; this part just scales
> > > > the counter from VF to scalar counts.  It's just a reminder that this
> > > > scaling is done differently from normal single-exit vectorization.
> > > >
> > > > >
> > > > > OK, have to continue tomorrow from here.
> > > >
> > > > Cheers, Thank you!
> > > >
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> > >
> > > so this is a way to avoid touching the main IV?  Looks a bit fragile to
> > > me.  Hmm, we're iterating over the main loop header PHIs here?
> > > Can't you check, say, the relevancy of the PHI node instead?  Though
> > > it might also be used as induction.  Can't it be used as alternate
> > > exit like
> > >
> > >   for (i)
> > >    {
> > >      if (i & bit)
> > >        break;
> > >    }
> > >
> > > and would we need to adjust 'i' then?
> > >
> > > > > > +	{
> > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > +
> > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > > +	  final_expr = gimple_phi_result (phi1);
> > > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop));
> > > > > > +
> > > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > > +	  /* For early break the final loop IV is:
> > > > > > +	     init + (final - init) * vf which takes into account peeling
> > > > > > +	     values and non-single steps.  */
> > > > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > > > +			     fold_convert (stype, final_expr),
> > > > > > +			     fold_convert (stype, init_expr));
> > > > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > > > +
> > > > > > +	  /* Adjust the value with the offset.  */
> > > > > > +	  if (POINTER_TYPE_P (type))
> > > > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > +	  else
> > > > > > +	    ni = fold_convert (type,
> > > > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > > > +					    fold_convert (stype, init_expr),
> > > > > > +					    off));
> > > > > > +	  var = create_tmp_var (type, "tmp");
> > >
> > > so how does the non-early break code deal with updating inductions?
> > > And how do you avoid altering this when we flow in from the normal
> > > exit?  That is, you are updating the value in the epilog loop
> > > header but don't you need to instead update the value only on
> > > the alternate exit edges from the main loop (and keep the not
> > > updated value on the main exit edge)?
> > >
> > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +	  else
> > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +
> > > > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > > > +	}
> > > > > > +      else
> > > > > > +	{
> > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > +
> > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop));
> > > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > > +
> > > > > > +	  if (vf.is_constant ())
> > > > > > +	    {
> > > > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > > > +				fold_convert (stype,
> > > > > > +					      niters_vector),
> > > > > > +				build_int_cst (stype, vf));
> > > > > > +
> > > > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > > > +				fold_convert (stype,
> > > > > > +					      niters_orig),
> > > > > > +				fold_convert (stype, ni));
> > > > > > +	    }
> > > > > > +	  else
> > > > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > > > +	       masked, so at the end of the loop we know we have finished
> > > > > > +	       the entire loop and found nothing.  */
> > > > > > +	    ni = build_zero_cst (stype);
> > > > > > +
> > > > > > +	  ni = fold_convert (type, ni);
> > > > > > +	  /* We don't support variable n in this version yet.  */
> > > > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > > > +
> > > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > > +
> > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +	  else
> > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > > > +
> > > > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > > > +
> > > > > > +	  for (edge exit : alt_exits)
> > > > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > > > +						       vf));
> > > > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > > > +	}
> > > > > > +    }
> > > > > > +
> > > > > > +  return ivtmp;
> > > > > >  }
> > > > > >
> > > > > >  /* Return a gimple value containing the misalignment (measured in vector
> > > > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> > > > > >
> > > > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > > > >     this function searches for the corresponding lcssa phi node in exit
> > > > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > > > -   NULL.  */
> > > > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > > > +   return the phi result; otherwise return NULL.  */
> > > > > >
> > > > > >  static tree
> > > > > >  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > > > > -		gphi *lcssa_phi)
> > > > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > > > >  {
> > > > > >    gphi_iterator gsi;
> > > > > >    edge e = loop->vec_loop_iv;
> > > > > >
> > > > > > -  gcc_assert (single_pred_p (e->dest));
> > > > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > >      {
> > > > > >        gphi *phi = gsi.phi ();
> > > > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > > -	return PHI_RESULT (phi);
> > > > > > -    }
> > > > > > -  return NULL_TREE;
> > > > > > -}
> > > > > > -
> > > > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND
> > > > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > > > -   edge, the two loops are arranged as below:
> > > > > > -
> > > > > > -       preheader_a:
> > > > > > -     first_loop:
> > > > > > -       header_a:
> > > > > > -	 i_1 = PHI<i_0, i_2>;
> > > > > > -	 ...
> > > > > > -	 i_2 = i_1 + 1;
> > > > > > -	 if (cond_a)
> > > > > > -	   goto latch_a;
> > > > > > -	 else
> > > > > > -	   goto between_bb;
> > > > > > -       latch_a:
> > > > > > -	 goto header_a;
> > > > > > -
> > > > > > -       between_bb:
> > > > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > > > -
> > > > > > -     second_loop:
> > > > > > -       header_b:
> > > > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > > > -				 or with i_2 if no LCSSA phi is created
> > > > > > -				 under condition of CREATE_LCSSA_FOR_IV_PHIS.
> > > > > > -	 ...
> > > > > > -	 i_4 = i_3 + 1;
> > > > > > -	 if (cond_b)
> > > > > > -	   goto latch_b;
> > > > > > -	 else
> > > > > > -	   goto exit_bb;
> > > > > > -       latch_b:
> > > > > > -	 goto header_b;
> > > > > > -
> > > > > > -       exit_bb:
> > > > > > -
> > > > > > -   This function creates loop closed SSA for the first loop; update the
> > > > > > -   second loop's PHI nodes by replacing argument on incoming edge with the
> > > > > > -   result of newly created lcssa PHI nodes.  IF CREATE_LCSSA_FOR_IV_PHIS
> > > > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > > > -   the first loop.
> > > > > > -
> > > > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > > > -   the second loop will execute rest iterations of the first.  */
> > > > > > -
> > > > > > -static void
> > > > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > > -				   class loop *first, class loop *second,
> > > > > > -				   bool create_lcssa_for_iv_phis)
> > > > > > -{
> > > > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > -
> > > > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > > > -
> > > > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb));
> > > > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > > > -  gcc_assert (loop == first || loop == second);
> > > > > > -
> > > > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > > > -       gsi_update = gsi_start_phis (second->header);
> > > > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > > > -    {
> > > > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > > > -      gphi *update_phi = gsi_update.phi ();
> > > > > > -
> > > > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > > > +      /* Nested loops with multiple exits can have a different number of phi
> > > > > > +	 node arguments between the main loop and epilog as the epilog falls
> > > > > > +	 through to the second loop.  */
> > > > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > > > >  	{
> > > > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > > -	  arg = new_res;
> > > > > > -	}
> > > > > > -
> > > > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > > > -	 incoming edge.  */
> > > > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg);
> > > > > > -    }
> > > > > > -
> > > > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > > > -     for correct vectorization of live stmts.  */
> > > > > > -  if (loop == first)
> > > > > > -    {
> > > > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > > > -	{
> > > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg))
> > > > > > -	    continue;
> > > > > > -
> > > > > > -	  /* Already created in the above loop.   */
> > > > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > > > >  	    continue;
> > > > > >
> > > > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > > +	  if (operand_equal_p (get_current_def (var),
> > > > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > > > +	    return PHI_RESULT (phi);
> > > > > >  	}
> > > > > >      }
> > > > > > +  return NULL_TREE;
> > > > > >  }
> > > > > >
> > > > > >  /* Function slpeel_add_loop_guard adds guard skipping from the beginning
> > > > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > >    gcc_assert (single_succ_p (merge_bb));
> > > > > >    edge e = single_succ_edge (merge_bb);
> > > > > >    basic_block exit_bb = e->dest;
> > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > > > >
> > > > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > >      {
> > > > > >        gphi *update_phi = gsi.phi ();
> > > > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > > > >
> > > > > >        tree merge_arg = NULL_TREE;
> > > > > >
> > > > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > >        if (!merge_arg)
> > > > > >  	merge_arg = old_arg;
> > > > > >
> > > > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> > > > > >        /* If the var is live after loop but not a reduction, we simply
> > > > > >  	 use the old arg.  */
> > > > > >        if (!guard_arg)
> > > > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > > > -
> > > > > > -static void
> > > > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > > > -{
> > > > > > -  gphi_iterator gsi;
> > > > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > > > -
> > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > > > -}
> > > > > > -
> > >
> > > I wonder if we can still split these changes out to before early break
> > > vect?
> > >
> > > > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> > > > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >      bound_epilog += vf - 1;
> > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > >      bound_epilog += 1;
> > > > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > > > +     to find the element that caused the break.  */
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +    {
> > > > > > +      bound_epilog = vf;
> > > > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > > > +      vect_epilogues = false;
> > > > > > +    }
> > > > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > > > >    poly_uint64 bound_scalar = bound_epilog;
> > > > > >
> > > > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >  				  bound_prolog + bound_epilog)
> > > > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > > > >  			 || vect_epilogues));
> > > > > > +
> > > > > > +  /* We only support early break vectorization on known bounds at this time.
> > > > > > +     This means that if the vector loop can't be entered then we won't
> > > > > > +     generate it at all.  So for now force skip_vector off because the
> > > > > > +     additional control flow messes with the BB exits and we've already
> > > > > > +     analyzed them.  */
> > > > > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > > > +
> > >
> > > I think it should be as "easy" as entering the epilog via the block taking
> > > the regular exit?
> > >
> > > > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > > > >       loop is known at compile time, otherwise we need to add a check at
> > > > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > > > >    bool skip_epilog = (prolog_peeling < 0
> > > > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > > >  		      || !vf.is_constant ());
> > > > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> > > > > > +     loop must be executed.  */
> > > > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      skip_epilog = false;
> > > > > > -
> > > > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > > > >    auto_vec<profile_count> original_counts;
> > > > > >    basic_block *original_bbs = NULL;
> > > > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >    if (prolog_peeling)
> > > > > >      {
> > > > > >        e = loop_preheader_edge (loop);
> > > > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > > > -
> > > > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > > > > > +						       true);
> > > > > >        gcc_assert (prolog);
> > > > > >        prolog->force_vectorize = false;
> > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > > > +
> > > > > >        first_loop = prolog;
> > > > > >        reset_original_copy_tables ();
> > > > > >
> > > > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >  	 as the transformations mentioned above make less or no sense when not
> > > > > >  	 vectorizing.  */
> > > > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > > > +      auto_vec<basic_block> doms;
> > > > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > > > > > +						       &doms);
> > > > > >        gcc_assert (epilog);
> > > > > >
> > > > > >        epilog->force_vectorize = false;
> > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > > > >
> > > > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > >  					update_e);
> > > > > >
> > > > > > +      /* For early breaks we must create a guard to check how many iterations
> > > > > > +	 of the scalar loop are yet to be performed.  */
> > >
> > > We have this check anyway, no?  In fact don't we know that we always
> > > enter the epilog (see above)?
> > >
> > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	{
> > > > > > +	  tree ivtmp =
> > > > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > > > +					       *niters_vector, update_e);
> > > > > > +
> > > > > > +	  gcc_assert (ivtmp);
> > > > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > > +					 fold_convert (TREE_TYPE (niters),
> > > > > > +						       ivtmp),
> > > > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > > +
> > > > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > > > +	     and so we may need to find the actual final edge.  */
> > > > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > > > +	     doesn't update existing ones in the current scheme.  */
> > > > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > > > > > +						guard_bb, prob_epilog.invert (),
> > > > > > +						irred_flag);
> > > > > > +	  doms.safe_push (guard_bb);
> > > > > > +
> > > > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > > +
> > > > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > > > +					      final_edge);
> > > > > > +
> > > > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > > > +	     the guard and the exit.  This intermediate block is required
> > > > > > +	     because in the current scheme of things the guard block phi
> > > > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > > > +	     case we just need to update the uses in this block as well.  */
> > > > > > +	  if (loop != scalar_loop)
> > > > > > +	    {
> > > > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > > > > > +	    }
> > > > > > +
> > > > > > +	  flush_pending_stmts (guard_e);
> > > > > > +	}
> > > > > > +
> > > > > >        if (skip_epilog)
> > > > > >  	{
> > > > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > >  	    }
> > > > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > > > >  	}
> > > > > > -      else
> > > > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > > > >
> > > > > >        unsigned HOST_WIDE_INT bound;
> > > > > >        if (bound_scalar.is_constant (&bound))
> > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > > > > >      partial_load_store_bias (0),
> > > > > >      peeling_for_gaps (false),
> > > > > >      peeling_for_niter (false),
> > > > > > +    early_breaks (false),
> > > > > > +    non_break_control_flow (false),
> > > > > >      no_data_dependencies (false),
> > > > > >      has_mask_store (false),
> > > > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> > > > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > > > > >  					  (loop_vinfo));
> > > > > >
> > > > > > +  /* When we have multiple exits and VF is unknown, we must require partial
> > > > > > +     vectors because the loop bound is not a minimum but a maximum.  That is
> > > > > > +     to say we cannot unpredicate the main loop unless we peel or use partial
> > > > > > +     vectors in the epilogue.  */
> > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > > > +    return true;
> > > > > > +
> > > > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > > > >      {
> > > > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> > > > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > > > >  }
> > > > > >
> > > > > > -
> > > > > >  /* Function vect_analyze_loop_form.
> > > > > >
> > > > > >     Verify that certain CFG restrictions hold, including:
> > > > > >     - the loop has a pre-header
> > > > > > -   - the loop has a single entry and exit
> > > > > > +   - the loop has a single entry
> > > > > > +   - nested loops can have only a single exit.
> > > > > >     - the loop exit condition is simple enough
> > > > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > > > >       niter could be analyzed under some assumptions.  */
> > > > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > >                             |
> > > > > >                          (exit-bb)  */
> > > > > >
> > > > > > -      if (loop->num_nodes != 2)
> > > > > > -	return opt_result::failure_at (vect_location,
> > > > > > -				       "not vectorized:"
> > > > > > -				       " control flow in loop.\n");
> > > > > > -
> > > > > >        if (empty_block_p (loop->header))
> > > > > >  	return opt_result::failure_at (vect_location,
> > > > > >  				       "not vectorized: empty loop.\n");
> > > > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > > > >  			 "Considering outer-loop vectorization.\n");
> > > > > >        info->inner_loop_cond = inner.loop_cond;
> > > > > > +
> > > > > > +      if (!single_exit (loop))
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized: multiple exits.\n");
> > > > > > +
> > > > > >      }
> > > > > >
> > > > > > -  if (!single_exit (loop))
> > > > > > -    return opt_result::failure_at (vect_location,
> > > > > > -				   "not vectorized: multiple exits.\n");
> > > > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > > > >      return opt_result::failure_at (vect_location,
> > > > > >  				   "not vectorized:"
> > > > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > >  				   "not vectorized: latch block not empty.\n");
> > > > > >
> > > > > >    /* Make sure the exit is not abnormal.  */
> > > > > > -  edge e = single_exit (loop);
> > > > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > > > -    return opt_result::failure_at (vect_location,
> > > > > > -				   "not vectorized:"
> > > > > > -				   " abnormal loop exit edge.\n");
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  edge nexit = loop->vec_loop_iv;
> > > > > > +  for (edge e : exits)
> > > > > > +    {
> > > > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized:"
> > > > > > +				       " abnormal loop exit edge.\n");
> > > > > > +      /* Early break BB must be after the main exit BB.  In theory we
> > > > > > +	 should be able to vectorize the inverse order, but the current
> > > > > > +	 flow in the vectorizer always assumes you update successor PHI
> > > > > > +	 nodes, not preds.  */
> > > > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized:"
> > > > > > +				       " abnormal loop exit edge order.\n");
> > >
> > > "unsupported loop exit order", but I don't understand the comment.
> > >
> > > > > > +    }
> > > > > > +
> > > > > > +  /* We currently only support early exit loops with known bounds.  */
> > >
> > > Btw, why's that?  Is that because we don't support the loop-around edge?
> > > IMHO this is the most serious limitation (and as said above it should be
> > > trivial to fix).
> > >
> > > > > > +  if (exits.length () > 1)
> > > > > > +    {
> > > > > > +      class tree_niter_desc niter;
> > > > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > +				       "not vectorized:"
> > > > > > +				       " early breaks only supported on loops"
> > > > > > +				       " with known iteration bounds.\n");
> > > > > > +    }
> > > > > >
> > > > > >    info->conds
> > > > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > > > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > > > >
> > > > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > > > +
> > > > > >    if (info->inner_loop_cond)
> > > > > >      {
> > > > > >        stmt_vec_info inner_loop_cond_info
> > > > > > @@ -3070,7 +3106,8 @@ start_over:
> > > > > >
> > > > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      {
> > > > > >        if (dump_enabled_p ())
> > > > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > > >    basic_block exit_bb;
> > > > > >    tree scalar_dest;
> > > > > >    tree scalar_type;
> > > > > > -  gimple *new_phi = NULL, *phi;
> > > > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > > > >    gimple_stmt_iterator exit_gsi;
> > > > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > > > >    gimple *epilog_stmt = NULL;
> > > > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > > > >  	  reduc_inputs.quick_push (new_def);
> > > > > >  	}
> > > > > > +
> > > > > > +	/* Update the other exits.  */
> > > > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	  {
> > > > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > > +	    gphi_iterator gsi, gsi1;
> > > > > > +	    for (edge exit : alt_exits)
> > > > > > +	      {
> > > > > > +		/* Find the phi node to propagate into the exit block
> > > > > > +		   for each exit edge.  */
> > > > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > > > +		     gsi1 = gsi_start_phis (exit->src);
> > >
> > > exit->src == loop->header, right?  I think this won't work for multiple
> > > alternate exits.  It's probably easier to do this where we create the
> > > LC PHI node for the reduction result?
> > >
> > > > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > > > +		  {
> > > > > > +		    /* There really should be a function to just get
> > > > > > +		       the number of phis inside a bb.  */
> > > > > > +		    if (phi && phi == gsi.phi ())
> > > > > > +		      {
> > > > > > +			gphi *phi1 = gsi1.phi ();
> > > > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > > > +					 PHI_RESULT (phi1));
> > >
> > > I think we know the header PHI of a reduction perfectly well, there
> > > shouldn't be the need to "search" for it.
> > >
> > > > > > +			break;
> > > > > > +		      }
> > > > > > +		  }
> > > > > > +	      }
> > > > > > +	  }
> > > > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > > > >      }
> > > > > >
> > > > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > > > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > > > >  	   lhs' = new_tree;  */
> > > > > >
> > > > > > +      /* When vectorizing an early break, any live statements that are
> > > > > > +	 used outside of the loop are dead.  The loop will never get to
> > > > > > +	 them.  We could change the liveness value during analysis instead,
> > > > > > +	 but since the code below is invalid anyway just ignore it during
> > > > > > +	 codegen.  */
> > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	return true;
> > >
> > > But what about the value that's live across the main exit when the
> > > epilogue is not entered?
> > >
> > > > > > +
> > > > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > >        gcc_assert (single_pred_p (exit_bb));
> > > > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > > > >       versioning.   */
> > > > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > -  if (! single_pred_p (e->dest))
> > > > > > +  if (e && ! single_pred_p (e->dest) &&
> !LOOP_VINFO_EARLY_BREAKS
> > > > > (loop_vinfo))
> > >
> > > e can be NULL here?  I think we should reject such loops earlier.
> > >
> > > > > >      {
> > > > > >        split_loop_exit_edge (e, true);
> > > > > >        if (dump_enabled_p ())
> > > > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > > > >      {
> > > > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > > > -      if (! single_pred_p (e->dest))
> > > > > > +      if (e && ! single_pred_p (e->dest))
> > > > > >  	{
> > > > > >  	  split_loop_exit_edge (e, true);
> > > > > >  	  if (dump_enabled_p ())
> > > > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > >
> > > > > >    /* Loops vectorized with a variable factor won't benefit from
> > > > > >       unrolling/peeling.  */
> > >
> > > update the comment?  Why would we unroll a VLA loop with early breaks?
> > > Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> > >
> > > > > > -  if (!vf.is_constant ())
> > > > > > +  if (!vf.is_constant ()
> > > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > >      {
> > > > > >        loop->unroll = 1;
> > > > > >        if (dump_enabled_p ())
> > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >    *live_p = false;
> > > > > >
> > > > > >    /* cond stmt other than loop exit cond.  */
> > > > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > > > -    *relevant = vect_used_in_scope;
> > >
> > > how was that ever hit before?  For outer loop processing with outer loop
> > > vectorization?
> > >
> > > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > +    {
> > > > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > > > +	 the hard way.  */
> > > > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > > > +      bool exit_bb = false, early_exit = false;
> > > > > > +      edge_iterator ei;
> > > > > > +      edge e;
> > > > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > > > +	  {
> > > > > > +	    exit_bb = true;
> > > > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > > > +	    break;
> > > > > > +	  }
> > > > > > +
> > > > > > +      /* We should have processed any exit edge, so an edge that is not
> > > > > > +	 an early break must be a loop IV edge.  We need to distinguish
> > > > > > +	 between the two as we don't want to generate code for the main
> > > > > > +	 loop IV.  */
> > > > > > +      if (exit_bb)
> > > > > > +	{
> > > > > > +	  if (early_exit)
> > > > > > +	    *relevant = vect_used_in_scope;
> > > > > > +	}
> > >
> > > I wonder why you can't simply do
> > >
> > >          if (is_ctrl_stmt (stmt_info->stmt)
> > >              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> > >
> > > ?
> > >
> > > > > > +      else if (bb->loop_father == loop)
> > > > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> > >
> > > so for control flow not exiting the loop you can check
> > > loop_exits_from_bb_p ().
> > >
> > > > > > +    }
> > > > > >
> > > > > >    /* changing memory.  */
> > > > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >  	*relevant = vect_used_in_scope;
> > > > > >        }
> > > > > >
> > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > +  auto_bitmap exit_bbs;
> > > > > > +  for (edge exit : exits)
> > > > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > > +
> > > > > >    /* uses outside the loop.  */
> > > > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > > > > >      {
> > > > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > >  	      /* We expect all such uses to be in the loop exit phis
> > > > > >  		 (because of loop closed form)   */
> > > > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) ==
> GIMPLE_PHI);
> > > > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > >
> > > That now becomes quite expensive checking already covered by the LC SSA
> > > verifier so I suggest to simply drop this assert instead.
> > >
> > > > > >                *live_p = true;
> > > > > >  	    }
> > > > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > > > >  	}
> > > > > >      }
> > > > > >
> > > > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen
> > > > > > +     all the conds yet at that point and there's no quick way to retrieve
> > > > > > +     them.  */
> > > > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > > > +    return opt_result::failure_at (vect_location,
> > > > > > +				   "not vectorized:"
> > > > > > +				   " unsupported control flow in loop.\n");
> > >
> > > so we didn't do this before?  But see above where I wondered.  So when
> > > does this hit with early exits and why can't we check for this in
> > > vect_verify_loop_form?
> > >
> > > > > > +
> > > > > >    /* 2. Process_worklist */
> > > > > >    while (worklist.length () > 0)
> > > > > >      {
> > > > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > > > >  			return res;
> > > > > >  		    }
> > > > > >                   }
> > > > > > +	    }
> > > > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > > > +	    {
> > > > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > > > +	      opt_result res
> > > > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > > > +	      if (!res)
> > > > > > +		return res;
> > > > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > > > +				loop_vinfo, relevant, &worklist, false);
> > > > > > +	      if (!res)
> > > > > > +		return res;
> > > > > >              }
> > > > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > > > >  	    {
> > > > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >  			     node_instance, cost_vec);
> > > > > >        if (!res)
> > > > > >  	return res;
> > > > > > -   }
> > > > > > +    }
> > > > > > +
> > > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > > > >
> > > > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > > > >      {
> > > > > >        case vect_internal_def:
> > > > > > +      case vect_early_exit_def:
> > > > > >          break;
> > > > > >
> > > > > >        case vect_reduction_def:
> > > > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > >      {
> > > > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > > > >        *need_to_vectorize = true;
> > > > > >      }
> > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > > > >    vect_internal_def,
> > > > > >    vect_induction_def,
> > > > > >    vect_reduction_def,
> > > > > > +  vect_early_exit_def,
> > >
> > > can you avoid putting this in between reduction and double reduction
> > > please?  Just put it before vect_unknown_def_type.  In fact the COND
> > > isn't a def ... maybe we should have pattern recogized
> > >
> > >  if (a < b) exit;
> > >
> > > as
> > >
> > >  cond = a < b;
> > >  if (cond != 0) exit;
> > >
> > > so the part that we need to vectorize is more clear.
> > >
> > > > > >    vect_double_reduction_def,
> > > > > >    vect_nested_cycle,
> > > > > >    vect_first_order_recurrence,
> > > > > > @@ -876,6 +877,13 @@ public:
> > > > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > > > >    bool peeling_for_niter;
> > > > > >
> > > > > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > > > > +     the loop for the break finding loop.  */
> > > > > > +  bool early_breaks;
> > > > > > +
> > > > > > +  /* Whether the loop has non-early-break control flow inside.  */
> > > > > > +  bool non_break_control_flow;
> > > > > > +
> > > > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > > > >    auto_vec<gcond *> conds;
> > > > > >
> > > > > > @@ -985,9 +993,11 @@ public:
> > > > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > > > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > > > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > > > @@ -1038,8 +1048,8 @@ public:
> > > > > >     stack.  */
> > > > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > > > >
> > > > > > -inline loop_vec_info
> > > > > > -loop_vec_info_for_loop (class loop *loop)
> > > > > > +static inline loop_vec_info
> > > > > > +loop_vec_info_for_loop (const class loop *loop)
> > > > > >  {
> > > > > >    return (loop_vec_info) loop->aux;
> > > > > >  }
> > > > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > > > >  {
> > > > > >    if (bb == (bb->loop_father)->header)
> > > > > >      return true;
> > > > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > > > +
> > > > > >    return false;
> > > > > >  }
> > > > > >
> > > > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > > > >     in tree-vect-loop-manip.cc.  */
> > > > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > > > >  				     tree, tree, tree, bool);
> > > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > > > -						     class loop *, edge);
> > > > > > +						    class loop *, edge, bool,
> > > > > > +						    vec<basic_block> * = NULL);
> > > > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > > > > > --- a/gcc/tree-vectorizer.cc
> > > > > > +++ b/gcc/tree-vectorizer.cc
> > > > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > > > >  	 predicates that need to be shared for optimal predicate usage.
> > > > > >  	 However reassoc will re-order them and prevent CSE from working
> > > > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > >
> > > seeing this more and more I think we want a simple way to iterate over
> > > all exits without copying to a vector when we have them recorded.  My
> > > C++ fu is too limited to support
> > >
> > >   for (auto exit : recorded_exits (loop))
> > >     ...
> > >
> > > (maybe that's enough for somebody to jump onto this ;))
> > >
> > > Don't treat all review comments as change orders, but it should be clear
> > > the code isn't 100% obvious.  Maybe the patch can be simplified by
> > > splitting out the LC SSA cleanup parts.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > > > +      for (edge exit : exits)
> > > > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > >
> > > > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > Nuernberg,
> > > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > > > Moerman;
> > > > > HRB 36809 (AG Nuernberg)
> > > >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-08-18 13:12             ` Tamar Christina
@ 2023-08-18 13:15               ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-08-18 13:15 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 18 Aug 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, August 18, 2023 2:53 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Fri, 18 Aug 2023, Tamar Christina wrote:
> > 
> > > > > Yeah if you comment it out one of the testcases should fail.
> > > >
> > > > using new_preheader instead of e->dest would make things clearer.
> > > >
> > > > You are now adding the same arg to every exit (you've just queried the
> > > > main exit redirect_edge_var_map_vector).
> > > >
> > > > OK, so I think I understand what you're doing.  If I understand
> > > > correctly we know that when we exit the main loop via one of the
> > > > early exits we are definitely going to enter the epilog but when
> > > > we take the main exit we might not.
> > > >
> > >
> > > Correct.. but..
> > >
> > > > Looking at the CFG we create currently this isn't reflected and
> > > > this complicates this PHI node updating.  What I'd try to do
> > > > is leave redirecting the alternate exits until after
> > >
> > > It is, in the case of the alternate exits this is reflected in copying
> > > the same values, as they are the values of the number of completed
> > > iterations since the scalar code restarts the last iteration.
> > >
> > > So all the PHI nodes of the alternate exits are correct.  The vector
> > > iteration doesn't handle the partial iteration.
> > >
> > > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > > means leaving it almost unchanged besides the LC SSA maintaining
> > > > changes.  After that for the multi-exit case split the
> > > > epilog preheader edge and redirect all the alternate exits to the
> > > > new preheader.  So the CFG becomes
> > > >
> > > >                  <original loop>
> > > >                 /      |
> > > >                /    <main exit w/ original LC PHI>
> > > >               /      if (epilog)
> > > >    alt exits /        /  \
> > > >             /        /    loop around
> > > >             |       /
> > > >            preheader with "header" PHIs
> > > >               |
> > > >           <epilog>
> > > >
> > > > note you need the header PHIs also on the main exit path but you
> > > > only need the loop end PHIs there.
> > > >
> > > > It seems so that at least currently the order of things makes
> > > > them more complicated than necessary.
> > >
> > > I've been trying to, but this representation seems a lot harder to work
> > > with.  In particular, at the moment once we exit
> > > slpeel_tree_duplicate_loop_to_edge_cfg the loop structure is exactly the
> > > same as one expects from any normal epilog vectorization.
> > >
> > > But this new representation requires me to place the guard much earlier
> > > than the epilogue preheader, yet I still have to adjust the PHI nodes in
> > > the preheader.  So it seems that this split is there only to indicate
> > > that we always enter the epilog when taking an early exit.
> > >
> > > Today this is reflected in the values of the PHI nodes rather than
> > > structurally.  Once we place the guard we update the nodes and the
> > > alternate exits get their value for ivtmp updated to VF.
> > >
> > > This representation also forces me to do the redirection in every call
> > > site of slpeel_tree_duplicate_loop_to_edge_cfg, making the code more
> > > complicated in all use sites.
> > >
> > > But I think this doesn't address the main reason why the
> > > slpeel_tree_duplicate_loop_to_edge_cfg code has a large block of code
> > > to deal with PHI node updates.
> > >
> > > The reason, as you mentioned somewhere else, is that after we redirect
> > > the edges I have to reconstruct the phi nodes.  For most it's
> > > straightforward, but for live values or vuse chains it requires extra
> > > code.
> > >
> > > You're right in that before we redirect the edges they are all correct
> > > in the exit block; you mentioned that the API for the edge redirection
> > > is supposed to copy the values over if I create the phi nodes
> > > beforehand.
> > >
> > > However this doesn't seem to work:
> > >
> > >      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> > > 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > > 	{
> > > 	  gimple *from_phi = gsi_stmt (gsi_from);
> > > 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > 	  create_phi_node (new_res, new_preheader);
> > > 	}
> > >
> > >       for (edge exit : loop_exits)
> > > 	redirect_edge_and_branch (exit, new_preheader);
> > >
> > > Still leaves them empty.  Grepping around, most code seems to pair
> > > redirect_edge_and_branch with copy_phi_arg_into_existing_phi.  The
> > > problem is that in all these cases, after redirecting an edge they call
> > > copy_phi_arg_into_existing_phi from a predecessor edge to fill in the
> > > phi nodes.
> > 
> > You need to call flush_pending_stmts on each edge you redirect.
> > copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.
> 
> Oh, I'll give that a try, that would make sense.  I didn't flush it in the current approach
> because I needed the map, but since I want to get rid of the map, this makes sense.
> 
> > 
> > > This is because redirect_edge_and_branch destroys the phi node entries
> > > and copy_phi_arg_into_existing_phi simply reads the gimple_phi_arg_def,
> > > which would be NULL.
> > >
> > > You could point it to the src block of the exit, in which case it copies
> > > the wrong values in for the vuses.  At the end of vectorization the
> > > cfgcleanup code does the same thing to maintain LCSSA if you haven't.
> > > This code always goes wrong for multiple exits because of the problem
> > > described above.  There's no node for it to copy the right value from.
> > >
> > > As an alternate approach I can split the exit edges, copy the phi nodes
> > > into the split and after that redirect them.  This however creates the
> > > awkwardness of having the exit edges no longer connect to the preheader.
> > >
> > > All of this then begs the question whether this is all easier than the
> > > current approach, which is just to read the edge var map to figure out
> > > the nodes that were removed during the redirect.
> > 
> > But the edge map is supposed to be applied via flush_pending_stmts,
> > specifically it relies on PHI nodes having a 1:1 correspondence between
> > old and new destination and thus is really designed for the case
> > you copy the destination and redirect an edge to the copy.
> 
> Ah, you were referring to flush_pending_stmts; right, that would make more sense.
> Will give it a go, thanks!
> 
> > 
> > That is, the main issue I have with the CFG manipulation is that it
> > isn't broken down to simple operations that in themselves leave
> > everything correct.  I think it should be possible to do this
> > and as 2nd step only do the special massaging for the early exit
> > LC PHIs that feed into the epilogue loop.
> > 
> 
> I see.  So, if I understand what you're saying correctly, instead of having
> one big function you would like little smaller helper functions that each
> on their own do something small but correct, and when called in sequence
> perform the complicated work?
> 
> I.e. you still have one top-level function, and at the end of that call
> everything is in order?

Well, ideally we'd be able to use existing APIs to lay out the CFG
and only need to do vectorizer transform specific things for the
"missing" PHIs.

> > As you say the code is quite complicated even without early break
> > vectorization which is why I originally suggested to try "fixing" it
> > as a prerequisite.  It does have the same fundamental issue when feeding
> > the epilogue - the "missing" LC PHIs, the difference is only that
> > without early break vectorization we take the exit values while
> > for early break vectorization we take the latch values from the
> > previous iteration(?)
> 
> Ok, If I understood the above correctly with how you wanted the sequence
> split out, then would this work for you:
> 
> I pull out from the patch series:
> 
> 1. the single_exit removal and the use of our own IV in the vectorizer.
> 2. The refactoring of the current peeling code, without new functionality but
>      splitting it up into the logical little helper functions.
> 
> And get those committed separately and then rebase on the early break stuff
> which will add handling the additional case for multiple exits in the peeling step?

Yeah, that would be nice.

> I already did 1; I didn't do 2 because the function was reworked in one go to both
> clean it up and support multiple exits.  But it would make sense to pull them out.
> 
> Is it ok to pull out the vectorizable_comparison refactoring as well? I haven't
> committed it yet because without the early_break support it doesn't look useful
> but rebases are tricky when it changes.

I don't remember exactly what that was, but I think yes, that's fine.

> > 
> > > Maybe I'm still misunderstanding the API, but reading the sources of the
> > functions, they all copy values from *existing*
> > > phi nodes.  And any existing phi nodes after the redirect are not correct.
> > >
> > > gimple_redirect_edge_and_branch has a chunk that indicates it should have
> > updated the PHI nodes before calling
> > > ssa_redirect_edge to remove the old ones, but there's no code there. It's all
> > empty.
> > >
> > > Most of the other refactorings/changes were easy enough to do, but this
> > one I seem to be a struggling with.
> > 
> > I see.  If you are tired of trying feel free to send an updated series
> > with the other changes, if it looks awkward but correct we can see
> > someone doing the cleanup afterwards.
> > 
> 
> Thanks, I'm not tired yet :), just confused, and I think it's clearer now.  If you agree
> with the above I can pull them out of the series.
> 
> Cheers,
> Tamar
> 
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > > >
> > > > > >
> > > > > > >  	}
> > > > > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > > > > -      flush_pending_stmts (e);
> > > > > > > +
> > > > > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > > > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > > > > +
> > > > > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > > > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> > > > > > >
> > > > > > >        /* And remove the non-necessary forwarder again.  Keep the
> > other
> > > > > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > > >        delete_basic_block (preheader);
> > > > > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > > > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > > > > +
> > > > > > > +      /* Finally after wiring the new epilogue we need to update its main exit
> > > > > > > +	 to the original function exit we recorded.  Other exits are already
> > > > > > > +	 correct.  */
> > > > > > > +      if (multiple_exits_p)
> > > > > > > +	{
> > > > > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > > > > +	    doms.safe_push (e->dest);
> > > > > > > +	  update_loop = new_loop;
> > > > > > > +	  doms.safe_push (exit_dest);
> > > > > > > +
> > > > > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > > > > +	  if (single_succ_p (exit_dest))
> > > > > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > > > > +	}
> > > > > > >      }
> > > > > > >    else /* Add the copy at entry.  */
> > > > > > >      {
> > > > > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > > > > +	 block and the new loop header.  This allows us to later split the
> > > > > > > +	 preheader block and still find the right LC nodes.  */
> > > > > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > > > > >
> > > > > > see above
> > > > > >
> > > > > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p
> > (gsi_to);
> > > > > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > > > > +	{
> > > > > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > > > > new_latch_loop);
> > > > > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init,
> > new_arg);
> > > > > > > +	}
> > > > > > > +
> > > > > > >        if (scalar_loop != loop)
> > > > > > >  	{
> > > > > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.
> > */
> > > > > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > > > >        delete_basic_block (new_preheader);
> > > > > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > > > > >  			       loop_preheader_edge (new_loop)->src);
> > > > > > > +
> > > > > > > +      if (multiple_exits_p)
> > > > > > > +	update_loop = loop;
> > > > > > >      }
> > > > > > >
> > > > > > > -  if (scalar_loop != loop)
> > > > > > > +  if (multiple_exits_p)
> > > > > > >      {
> > > > > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > > > > -
> > > > > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > > > > >  	{
> > > > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > > > > -	  location_t orig_locus
> > > > > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > > > > -
> > > > > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > > > > +	  edge ex;
> > > > > > > +	  edge_iterator ei;
> > > > > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > > > > +	    {
> > > > > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > > > > +		 dominate other blocks.  */
> > > > > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> > > >
> > > > For the prologue peeling any early exit we take would skip all other
> > > > loops so we can simply leave them and their LC PHI nodes in place.
> > > > We need extra PHIs only on the path to the main vector loop.  I
> > > > think the comment isn't accurately reflecting what we do.  In
> > > > fact we do not add any LC PHI nodes here but simply adjust the
> > > > main loop header PHI arguments?
> > > >
> > > > > > I don't think EDGE_FALLTHRU is set correctly, what's wrong with
> > > > > > just using single_succ_p here?  A fallthru edge src dominates the
> > > > > > fallthru edge dest, so the sentence above doesn't make sense.
> > > > >
> > > > > I wanted to say that the immediate dominator of a block is never
> > > > > a fall-through block.  At least from what I understood from how
> > > > > the dominators are calculated in the code, though may have missed
> > > > > something.
> > > >
> > > >  BB1
> > > >   |
> > > >  BB2
> > > >   |
> > > >  BB3
> > > >
> > > > here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> > > >
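[Editorial note: for a straight-line chain like the one above, the immediate dominator of each block is simply its unique predecessor — a minimal sketch in plain Python with hypothetical block names, not GCC code:]

```python
# Immediate dominators for the chain BB1 -> BB2 -> BB3: when every block
# has a single predecessor, that predecessor is its immediate dominator.
preds = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB2"]}

def idom(block):
    p = preds[block]
    return p[0] if p else None  # the entry block has no dominator

assert idom("BB2") == "BB1"
assert idom("BB3") == "BB2"
```

This is also why a fall-through edge's source does dominate its destination, as pointed out in the review above.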
> > > > > >
> > > > > > > +		     && single_succ_p (ex->dest))
> > > > > > > +		{
> > > > > > > +		  doms.safe_push (ex->dest);
> > > > > > > +		  ex = single_succ_edge (ex->dest);
> > > > > > > +		}
> > > > > > > +	      doms.safe_push (ex->dest);
> > > > > > > +	    }
> > > > > > > +	  doms.safe_push (e->dest);
> > > > > > >  	}
> > > > > > > -    }
> > > > > > >
> > > > > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > > > +      if (updated_doms)
> > > > > > > +	updated_doms->safe_splice (doms);
> > > > > > > +    }
> > > > > > >    free (new_bbs);
> > > > > > >    free (bbs);
> > > > > > >
> > > > > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e)
> > > > > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > > > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > > > > >
> > > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > > > > +
> > > > > >
> > > > > > I think checking the number of BBs is odd; I don't remember anything
> > > > > > in slpeel that is specifically tied to that?  I think we can simply drop
> > > > > > this or do you remember anything that would depend on ->num_nodes
> > > > > > being only exactly 5 or 2?
> > > > >
> > > > > It never actually seemed to require it, but they're used as a check to
> > > > > see if there is unexpected control flow in the loop.
> > > > >
> > > > > i.e. this would say no if you have an if statement in the loop that wasn't
> > > > > converted.  The other part of this and the accompanying explanation is in
> > > > > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > > > > num_nodes == 2 check from there because number of nodes restricted
> > > > > things too much.  If you have an empty fall-through block, which seems
> > > > > to happen often between the main exit and the latch block, then we'd
> > > > > not vectorize.
> > > > >
> > > > > So instead I now reject loops after analyzing the gcond.  So I think
> > > > > this check can go/needs to be different.
> > > >
> > > > Lets remove it from this function then.
> > > >
> > > > > >
> > > > > > >    /* All loops have an outer scope; the only case loop->outer is NULL is
> > for
> > > > > > >       the function itself.  */
> > > > > > >    if (!loop_outer (loop)
> > > > > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > >    basic_block update_bb = update_e->dest;
> > > > > > >
> > > > > > > +  /* For early exits we'll update the IVs in
> > > > > > > +     vect_update_ivs_after_early_break.  */
> > > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +    return;
> > > > > > > +
> > > > > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > > >
> > > > > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > >        /* Fix phi expressions in the successor bb.  */
> > > > > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > > > > >      }
> > > > > > > +  return;
> > > > > >
> > > > > > we don't usually place a return at the end of void functions
> > > > > >
> > > > > > > +}
> > > > > > > +
> > > > > > > +/*   Function vect_update_ivs_after_early_break.
> > > > > > > +
> > > > > > > +     "Advance" the induction variables of LOOP to the value they should take
> > > > > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > > > > +     vectorizer does not handle induction variables that are used after the
> > > > > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > > > > +     peeled, because of the early exit.  With an early exit we always peel
> > > > > > > +     the loop.
> > > > > > > +
> > > > > > > +     Input:
> > > > > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > > > > +	      of LOOP were peeled.
> > > > > > > +     - VF - The loop vectorization factor.
> > > > > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before it is
> > > > > > > +		     vectorized). i.e., the number of times the ivs should be
> > > > > > > +		     bumped.
> > > > > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP executes.
> > > > > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > > > > > +		  coming out from LOOP on which there are uses of the LOOP ivs
> > > > > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > > > > +
> > > > > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > > > > +		  UPDATE_E->dest are updated accordingly.
> > > > > > > +
> > > > > > > +     Output:
> > > > > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > > > > +
> > > > > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > > > +     a single loop exit that has a single predecessor.
> > > > > > > +
> > > > > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb are
> > > > > > > +     organized in the same order.
> > > > > > > +
> > > > > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the future.
> > > > > > > +
> > > > > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path
> > > > > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the path
> > > > > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > > > > +     path starts with the edge UPDATE_E, and its destination (denoted update_bb)
> > > > > > > +     needs to have its phis updated.
> > > > > > > + */
> > > > > > > +
> > > > > > > +static tree
> > > > > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop *epilog,
> > > > > > > +				   poly_int64 vf, tree niters_orig,
> > > > > > > +				   tree niters_vector, edge update_e)
> > > > > > > +{
> > > > > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +    return NULL;
> > > > > > > +
> > > > > > > +  gphi_iterator gsi, gsi1;
> > > > > > > +  tree ni_name, ivtmp = NULL;
> > > > > > > +  basic_block update_bb = update_e->dest;
> > > > > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > > +  basic_block exit_bb = loop_iv->dest;
> > > > > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > > > > +
> > > > > > > +  gcc_assert (cond);
> > > > > > > +
> > > > > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> > > > (update_bb);
> > > > > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > > > > +    {
> > > > > > > +      tree init_expr, final_expr, step_expr;
> > > > > > > +      tree type;
> > > > > > > +      tree var, ni, off;
> > > > > > > +      gimple_stmt_iterator last_gsi;
> > > > > > > +
> > > > > > > +      gphi *phi = gsi1.phi ();
> > > > > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog));
> > > > > >
> > > > > > I'm confused about the setup.  update_bb looks like the block with the
> > > > > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > > > > loop_preheader_edge (epilog) come into play here?  That would feed
> > > > > > into epilog->header PHIs?!
> > > > >
> > > > > We can't query the type of the phis in the block with the LC PHI
> > > > > nodes, so the typical pattern seems to be that we iterate over a
> > > > > block that's part of the loop and that would have the PHIs in the
> > > > > same order, just so we can get to the stmt_vec_info.
> > > > >
> > > > > >
> > > > > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > > > > better way?  Is update_bb really epilog->header?!
> > > > > >
> > > > > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > > > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > > > > which "works" even for totally unrelated edges.
> > > > > >
> > > > > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT
> > (phi_ssa));
> > > > > > > +      if (!phi1)
> > > > > >
> > > > > > shouldn't that be an assert?
> > > > > >
> > > > > > > +	continue;
> > > > > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > > > > +      if (dump_enabled_p ())
> > > > > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > +			 "vect_update_ivs_after_early_break: phi:
> > %G",
> > > > > > > +			 (gimple *)phi);
> > > > > > > +
> > > > > > > +      /* Skip reduction and virtual phis.  */
> > > > > > > +      if (!iv_phi_p (phi_info))
> > > > > > > +	{
> > > > > > > +	  if (dump_enabled_p ())
> > > > > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > > +			     "reduc or virtual phi. skip.\n");
> > > > > > > +	  continue;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > > > > +	 early.  As such just grab the original IV.  */
> > > > > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > > > > > (loop));
> > > > > >
> > > > > > but this should be taken care of by LC SSA?
> > > > >
> > > > > It is; the comment is probably missing details.  This part just scales
> > > > > the counter from VF to scalar counts.  It's just a reminder that this
> > > > > scaling is done differently from normal single-exit vectorization.
> > > > >
> > > > > >
> > > > > > OK, have to continue tomorrow from here.
> > > > >
> > > > > Cheers, Thank you!
> > > > >
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> > > >
> > > > so this is a way to avoid touching the main IV?  Looks a bit fragile to
> > > > me.  Hmm, we're iterating over the main loop header PHIs here?
> > > > Can't you check, say, the relevancy of the PHI node instead?  Though
> > > > it might also be used as induction.  Can't it be used as alternate
> > > > exit like
> > > >
> > > >   for (i)
> > > >    {
> > > >      if (i & bit)
> > > >        break;
> > > >    }
> > > >
> > > > and would we need to adjust 'i' then?
> > > >
> > > > > > > +	{
> > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART
> > (phi_info);
> > > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > > +
> > > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > > > +	  final_expr = gimple_phi_result (phi1);
> > > > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > > > > loop_preheader_edge (loop));
> > > > > > > +
> > > > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > > > +	  /* For early break the final loop IV is:
> > > > > > > +	     init + (final - init) * vf which takes into account peeling
> > > > > > > +	     values and non-single steps.  */
> > > > > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > +			     fold_convert (stype, final_expr),
> > > > > > > +			     fold_convert (stype, init_expr));
> > > > > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst
> > (stype, vf));
> > > > > > > +
> > > > > > > +	  /* Adjust the value with the offset.  */
> > > > > > > +	  if (POINTER_TYPE_P (type))
> > > > > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > +	  else
> > > > > > > +	    ni = fold_convert (type,
> > > > > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > > > > +					    fold_convert (stype,
> > init_expr),
> > > > > > > +					    off));
> > > > > > > +	  var = create_tmp_var (type, "tmp");
> > > >
> > > > so how does the non-early break code deal with updating inductions?
> > > > And how do you avoid altering this when we flow in from the normal
> > > > exit?  That is, you are updating the value in the epilog loop
> > > > header but don't you need to instead update the value only on
> > > > the alternate exit edges from the main loop (and keep the not
> > > > updated value on the main exit edge)?
> > > >
> > > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false,
> > var);
> > > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts,
> > GSI_SAME_STMT);
> > > > > > > +	  else
> > > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts,
> > GSI_SAME_STMT);
> > > > > > > +
> > > > > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > > > > +	}
> > > > > > > +      else
> > > > > > > +	{
> > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART
> > (phi_info);
> > > > > > > +	  step_expr = unshare_expr (step_expr);
> > > > > > > +
> > > > > > > +	  /* We previously generated the new merged phi in the same BB as the
> > > > > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1,
> > loop_preheader_edge
> > > > > > (loop));
> > > > > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > > > > +
> > > > > > > +	  if (vf.is_constant ())
> > > > > > > +	    {
> > > > > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > > > > +				fold_convert (stype,
> > > > > > > +					      niters_vector),
> > > > > > > +				build_int_cst (stype, vf));
> > > > > > > +
> > > > > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > > > > +				fold_convert (stype,
> > > > > > > +					      niters_orig),
> > > > > > > +				fold_convert (stype, ni));
> > > > > > > +	    }
> > > > > > > +	  else
> > > > > > > +	    /* If the loop's VF isn't constant then the loop must have
> > been
> > > > > > > +	       masked, so at the end of the loop we know we have
> > finished
> > > > > > > +	       the entire loop and found nothing.  */
> > > > > > > +	    ni = build_zero_cst (stype);
> > > > > > > +
> > > > > > > +	  ni = fold_convert (type, ni);
> > > > > > > +	  /* We don't support variable n in this version yet.  */
> > > > > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > > > > +
> > > > > > > +	  var = create_tmp_var (type, "tmp");
> > > > > > > +
> > > > > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > > > > +	  gimple_seq new_stmts = NULL;
> > > > > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false,
> > var);
> > > > > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > > > > +	  if (!gsi_end_p (last_gsi))
> > > > > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts,
> > GSI_SAME_STMT);
> > > > > > > +	  else
> > > > > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts,
> > GSI_SAME_STMT);
> > > > > > > +
> > > > > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > > > > +
> > > > > > > +	  for (edge exit : alt_exits)
> > > > > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > > > > +					build_int_cst (TREE_TYPE
> > (step_expr),
> > > > > > > +						       vf));
> > > > > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > > > > +	}
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  return ivtmp;
> > > > > > >  }
> > > > > > >
> > > > > > >  /* Return a gimple value containing the misalignment (measured in
> > > > vector
> > > > > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
> > > > > > >
> > > > > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > > > > >     this function searches for the corresponding lcssa phi node in exit
> > > > > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > > > > -   NULL.  */
> > > > > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > > > > +   return the phi result; otherwise return NULL.  */
> > > > > > >
> > > > > > >  static tree
> > > > > > >  find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED,
> > > > > > > -		gphi *lcssa_phi)
> > > > > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > > > > >  {
> > > > > > >    gphi_iterator gsi;
> > > > > > >    edge e = loop->vec_loop_iv;
> > > > > > >
> > > > > > > -  gcc_assert (single_pred_p (e->dest));
> > > > > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > >      {
> > > > > > >        gphi *phi = gsi.phi ();
> > > > > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > > > > -	return PHI_RESULT (phi);
> > > > > > > -    }
> > > > > > > -  return NULL_TREE;
> > > > > > > -}
> > > > > > > -
> > > > > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > > > > FIRST/SECOND
> > > > > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > > > > -   edge, the two loops are arranged as below:
> > > > > > > -
> > > > > > > -       preheader_a:
> > > > > > > -     first_loop:
> > > > > > > -       header_a:
> > > > > > > -	 i_1 = PHI<i_0, i_2>;
> > > > > > > -	 ...
> > > > > > > -	 i_2 = i_1 + 1;
> > > > > > > -	 if (cond_a)
> > > > > > > -	   goto latch_a;
> > > > > > > -	 else
> > > > > > > -	   goto between_bb;
> > > > > > > -       latch_a:
> > > > > > > -	 goto header_a;
> > > > > > > -
> > > > > > > -       between_bb:
> > > > > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > > > > -
> > > > > > > -     second_loop:
> > > > > > > -       header_b:
> > > > > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > > > > -				 or with i_2 if no LCSSA phi is created
> > > > > > > -				 under condition of
> > > > > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > > > > -	 ...
> > > > > > > -	 i_4 = i_3 + 1;
> > > > > > > -	 if (cond_b)
> > > > > > > -	   goto latch_b;
> > > > > > > -	 else
> > > > > > > -	   goto exit_bb;
> > > > > > > -       latch_b:
> > > > > > > -	 goto header_b;
> > > > > > > -
> > > > > > > -       exit_bb:
> > > > > > > -
> > > > > > > -   This function creates loop closed SSA for the first loop; update the
> > > > > > > -   second loop's PHI nodes by replacing argument on incoming edge
> > with
> > > > the
> > > > > > > -   result of newly created lcssa PHI nodes.  IF
> > > > CREATE_LCSSA_FOR_IV_PHIS
> > > > > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > > > > -   the first loop.
> > > > > > > -
> > > > > > > -   This function assumes exit bb of the first loop is preheader bb of
> > the
> > > > > > > -   second loop, i.e, between_bb in the example code.  With PHIs
> > updated,
> > > > > > > -   the second loop will execute rest iterations of the first.  */
> > > > > > > -
> > > > > > > -static void
> > > > > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > > > > -				   class loop *first, class loop *second,
> > > > > > > -				   bool create_lcssa_for_iv_phis)
> > > > > > > -{
> > > > > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > > -
> > > > > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > > > > -
> > > > > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> > > > (between_bb));
> > > > > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > > > > -  gcc_assert (loop == first || loop == second);
> > > > > > > -
> > > > > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > > > > -       gsi_update = gsi_start_phis (second->header);
> > > > > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > > > > -    {
> > > > > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > > > > -      gphi *update_phi = gsi_update.phi ();
> > > > > > > -
> > > > > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt
> > (vect_phi);
> > > > > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > > > > +      /* Nested loops with multiple exits can have different no# phi node
> > > > > > > +	 arguments between the main loop and epilog as epilog falls to the
> > > > > > > +	 second loop.  */
> > > > > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > > > > >  	{
> > > > > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > > > > > UNKNOWN_LOCATION);
> > > > > > > -	  arg = new_res;
> > > > > > > -	}
> > > > > > > -
> > > > > > > -      /* Update PHI node in the second loop by replacing arg on the
> > loop's
> > > > > > > -	 incoming edge.  */
> > > > > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> > > > arg);
> > > > > > > -    }
> > > > > > > -
> > > > > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > > > > -     for correct vectorization of live stmts.  */
> > > > > > > -  if (loop == first)
> > > > > > > -    {
> > > > > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > > > > -	{
> > > > > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > > > > > (orig_arg))
> > > > > > > -	    continue;
> > > > > > > -
> > > > > > > -	  /* Already created in the above loop.   */
> > > > > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > > > > >  	    continue;
> > > > > > >
> > > > > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION);
> > > > > > > +	  if (operand_equal_p (get_current_def (var),
> > > > > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > > > > +	    return PHI_RESULT (phi);
> > > > > > >  	}
> > > > > > >      }
> > > > > > > +  return NULL_TREE;
> > > > > > >  }
> > > > > > >
> > > > > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> > > > beginning
> > > > > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > > >    gcc_assert (single_succ_p (merge_bb));
> > > > > > >    edge e = single_succ_edge (merge_bb);
> > > > > > >    basic_block exit_bb = e->dest;
> > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > > > > >
> > > > > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > >      {
> > > > > > >        gphi *update_phi = gsi.phi ();
> > > > > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > > > > >
> > > > > > >        tree merge_arg = NULL_TREE;
> > > > > > >
> > > > > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > > >        if (!merge_arg)
> > > > > > >  	merge_arg = old_arg;
> > > > > > >
> > > > > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx);
> > > > > > >        /* If the var is live after loop but not a reduction, we simply
> > > > > > >  	 use the old arg.  */
> > > > > > >        if (!guard_arg)
> > > > > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > > > > -
> > > > > > > -static void
> > > > > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > > > > -{
> > > > > > > -  gphi_iterator gsi;
> > > > > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > > > > -
> > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > > > > -}
> > > > > > > -
> > > >
> > > > I wonder if we can still split these changes out to before early break
> > > > vect?
> > > >
> > > > > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need to
> > > > > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > > > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >      bound_epilog += vf - 1;
> > > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > > >      bound_epilog += 1;
> > > > > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > > > > +     to find the element that caused the break.  */
> > > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +    {
> > > > > > > +      bound_epilog = vf;
> > > > > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > > > > +      vect_epilogues = false;
> > > > > > > +    }
> > > > > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > > > > >    poly_uint64 bound_scalar = bound_epilog;
> > > > > > >
> > > > > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >  				  bound_prolog + bound_epilog)
> > > > > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > > > > >  			 || vect_epilogues));
> > > > > > > +
> > > > > > > +  /* We only support early break vectorization on known bounds at this
> > > > > > > +     time.  This means that if the vector loop can't be entered then we
> > > > > > > +     won't generate it at all.  So for now force skip_vector off because
> > > > > > > +     the additional control flow messes with the BB exits and we've
> > > > > > > +     already analyzed them.  */
> > > > > > > +  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > > > > > +
> > > >
> > > > I think it should be as "easy" as entering the epilog via the block taking
> > > > the regular exit?
> > > >
> > > > > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > > > > >       loop is known at compile time, otherwise we need to add a check
> > at
> > > > > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > > > > >    bool skip_epilog = (prolog_peeling < 0
> > > > > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > > > >  		      || !vf.is_constant ());
> > > > > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > > > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > > > > > > +     epilog loop must be executed.  */
> > > > > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > >      skip_epilog = false;
> > > > > > > -
> > > > > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > > > > >    auto_vec<profile_count> original_counts;
> > > > > > >    basic_block *original_bbs = NULL;
> > > > > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >    if (prolog_peeling)
> > > > > > >      {
> > > > > > >        e = loop_preheader_edge (loop);
> > > > > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > > > > -
> > > > > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > > > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e);
> > > > > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
> > > > > > > +						       true);
> > > > > > >        gcc_assert (prolog);
> > > > > > >        prolog->force_vectorize = false;
> > > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > > > > +
> > > > > > >        first_loop = prolog;
> > > > > > >        reset_original_copy_tables ();
> > > > > > >
> > > > > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >  	 as the transformations mentioned above make less or no sense
> > > > > > >  	 when not vectorizing.  */
> > > > > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > > > > +      auto_vec<basic_block> doms;
> > > > > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true,
> > > > > > > +						       &doms);
> > > > > > >        gcc_assert (epilog);
> > > > > > >
> > > > > > >        epilog->force_vectorize = false;
> > > > > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > > > > >
> > > > > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > > > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > >  					update_e);
> > > > > > >
> > > > > > > +      /* For early breaks we must create a guard to check how many
> > > > > > > +	 iterations of the scalar loop are yet to be performed.  */
> > > >
> > > > We have this check anyway, no?  In fact don't we know that we always
> > > > enter the epilog (see above)?
> > > >
> > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +	{
> > > > > > > +	  tree ivtmp =
> > > > > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > > > > +					       *niters_vector, update_e);
> > > > > > > +
> > > > > > > +	  gcc_assert (ivtmp);
> > > > > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > > > +					 fold_convert (TREE_TYPE (niters),
> > > > > > > +						       ivtmp),
> > > > > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > > > +
> > > > > > > +	  /* If we had a fallthrough edge, the guard will be threaded
> > > > > > > +	     through and so we may need to find the actual final edge.  */
> > > > > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > > > > +	     between the guard and the exit edge.  It only adds new nodes
> > > > > > > +	     and doesn't update existing ones in the current scheme.  */
> > > > > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > > > > > > +						guard_bb, prob_epilog.invert (),
> > > > > > > +						irred_flag);
> > > > > > > +	  doms.safe_push (guard_bb);
> > > > > > > +
> > > > > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > > > > +
> > > > > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > > > > +					      final_edge);
> > > > > > > +
> > > > > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > > > > +	     the guard and the exit.  This intermediate block is required
> > > > > > > +	     because in the current scheme of things the guard block PHI
> > > > > > > +	     updating can only maintain LCSSA by creating new blocks.  In
> > > > > > > +	     this case we just need to update the uses in this block as well.  */
> > > > > > > +	  if (loop != scalar_loop)
> > > > > > > +	    {
> > > > > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > > > > > > +	    }
> > > > > > > +
> > > > > > > +	  flush_pending_stmts (guard_e);
> > > > > > > +	}
> > > > > > > +
> > > > > > >        if (skip_epilog)
> > > > > > >  	{
> > > > > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > >  	    }
> > > > > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > > > > >  	}
> > > > > > > -      else
> > > > > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > > > > >
> > > > > > >        unsigned HOST_WIDE_INT bound;
> > > > > > >        if (bound_scalar.is_constant (&bound))
> > > > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > > > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > > > > > > --- a/gcc/tree-vect-loop.cc
> > > > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > > > > > >      partial_load_store_bias (0),
> > > > > > >      peeling_for_gaps (false),
> > > > > > >      peeling_for_niter (false),
> > > > > > > +    early_breaks (false),
> > > > > > > +    non_break_control_flow (false),
> > > > > > >      no_data_dependencies (false),
> > > > > > >      has_mask_store (false),
> > > > > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> > > > > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > > > > > >  					  (loop_vinfo));
> > > > > > >
> > > > > > > +  /* When we have multiple exits and VF is unknown, we must require
> > > > > > > +     partial vectors because the loop bound is not a minimum but a
> > > > > > > +     maximum.  That is to say we cannot unpredicate the main loop unless
> > > > > > > +     we peel or use partial vectors in the epilogue.  */
> > > > > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > > > > +    return true;
> > > > > > > +
> > > > > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > > > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > > > > >      {
> > > > > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> > > > > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > > > > >  }
> > > > > > >
> > > > > > > -
> > > > > > >  /* Function vect_analyze_loop_form.
> > > > > > >
> > > > > > >     Verify that certain CFG restrictions hold, including:
> > > > > > >     - the loop has a pre-header
> > > > > > > -   - the loop has a single entry and exit
> > > > > > > +   - the loop has a single entry
> > > > > > > +   - nested loops can have only a single exit.
> > > > > > >     - the loop exit condition is simple enough
> > > > > > >     - the number of iterations can be analyzed, i.e, a countable loop.
> > The
> > > > > > >       niter could be analyzed under some assumptions.  */
> > > > > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > > >                             |
> > > > > > >                          (exit-bb)  */
> > > > > > >
> > > > > > > -      if (loop->num_nodes != 2)
> > > > > > > -	return opt_result::failure_at (vect_location,
> > > > > > > -				       "not vectorized:"
> > > > > > > -				       " control flow in loop.\n");
> > > > > > > -
> > > > > > >        if (empty_block_p (loop->header))
> > > > > > >  	return opt_result::failure_at (vect_location,
> > > > > > >  				       "not vectorized: empty loop.\n");
> > > > > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > > > > >  			 "Considering outer-loop vectorization.\n");
> > > > > > >        info->inner_loop_cond = inner.loop_cond;
> > > > > > > +
> > > > > > > +      if (!single_exit (loop))
> > > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > > +				       "not vectorized: multiple exits.\n");
> > > > > > > +
> > > > > > >      }
> > > > > > >
> > > > > > > -  if (!single_exit (loop))
> > > > > > > -    return opt_result::failure_at (vect_location,
> > > > > > > -				   "not vectorized: multiple exits.\n");
> > > > > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > > > > >      return opt_result::failure_at (vect_location,
> > > > > > >  				   "not vectorized:"
> > > > > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > > > > >  				   "not vectorized: latch block not empty.\n");
> > > > > > >
> > > > > > >    /* Make sure the exit is not abnormal.  */
> > > > > > > -  edge e = single_exit (loop);
> > > > > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > > > > -    return opt_result::failure_at (vect_location,
> > > > > > > -				   "not vectorized:"
> > > > > > > -				   " abnormal loop exit edge.\n");
> > > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > > +  edge nexit = loop->vec_loop_iv;
> > > > > > > +  for (edge e : exits)
> > > > > > > +    {
> > > > > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > > +				       "not vectorized:"
> > > > > > > +				       " abnormal loop exit edge.\n");
> > > > > > > +      /* Early break BB must be after the main exit BB.  In theory we
> > > > > > > +	 should be able to vectorize the inverse order, but the current
> > > > > > > +	 flow in the vectorizer always assumes you update successor PHI
> > > > > > > +	 nodes, not preds.  */
> > > > > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > > +				       "not vectorized:"
> > > > > > > +				       " abnormal loop exit edge order.\n");
> > > >
> > > > "unsupported loop exit order", but I don't understand the comment.
> > > >
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  /* We currently only support early exit loops with known bounds.  */
> > > >
> > > > Btw, why's that?  Is that because we don't support the loop-around edge?
> > > > IMHO this is the most serious limitation (and as said above it should be
> > > > trivial to fix).
> > > >
> > > > > > > +  if (exits.length () > 1)
> > > > > > > +    {
> > > > > > > +      class tree_niter_desc niter;
> > > > > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > > > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > > > > +	return opt_result::failure_at (vect_location,
> > > > > > > +				       "not vectorized:"
> > > > > > > +				       " early breaks only supported on loops"
> > > > > > > +				       " with known iteration bounds.\n");
> > > > > > > +    }
> > > > > > >
> > > > > > >    info->conds
> > > > > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > > > > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > > > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > > > > >
> > > > > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > > > > +
> > > > > > >    if (info->inner_loop_cond)
> > > > > > >      {
> > > > > > >        stmt_vec_info inner_loop_cond_info
> > > > > > > @@ -3070,7 +3106,8 @@ start_over:
> > > > > > >
> > > > > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > > > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > >      {
> > > > > > >        if (dump_enabled_p ())
> > > > > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > > > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > > > >    basic_block exit_bb;
> > > > > > >    tree scalar_dest;
> > > > > > >    tree scalar_type;
> > > > > > > -  gimple *new_phi = NULL, *phi;
> > > > > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > > > > >    gimple_stmt_iterator exit_gsi;
> > > > > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > > > > >    gimple *epilog_stmt = NULL;
> > > > > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > > > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > > > > >  	  reduc_inputs.quick_push (new_def);
> > > > > > >  	}
> > > > > > > +
> > > > > > > +	/* Update the other exits.  */
> > > > > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +	  {
> > > > > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > > > > +	    gphi_iterator gsi, gsi1;
> > > > > > > +	    for (edge exit : alt_exits)
> > > > > > > +	      {
> > > > > > > +		/* Find the PHI node to propagate into the exit block for
> > > > > > > +		   each exit edge.  */
> > > > > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > > > > +		     gsi1 = gsi_start_phis (exit->src);
> > > >
> > > > exit->src == loop->header, right?  I think this won't work for multiple
> > > > alternate exits.  It's probably easier to do this where we create the
> > > > LC PHI node for the reduction result?
> > > >
> > > > > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > > > > +		  {
> > > > > > > +		    /* There really should be a function to just get the
> > > > > > > +		       number of PHIs inside a bb.  */
> > > > > > > +		    if (phi && phi == gsi.phi ())
> > > > > > > +		      {
> > > > > > > +			gphi *phi1 = gsi1.phi ();
> > > > > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > > > > +					 PHI_RESULT (phi1));
> > > >
> > > > I think we know the header PHI of a reduction perfectly well, there
> > > > shouldn't be the need to "search" for it.
> > > >
> > > > > > > +			break;
> > > > > > > +		      }
> > > > > > > +		  }
> > > > > > > +	      }
> > > > > > > +	  }
> > > > > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > > > > >      }
> > > > > > >
> > > > > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > > > > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > > > > >  	   lhs' = new_tree;  */
> > > > > > >
> > > > > > > +      /* When vectorizing an early break, any live statements that are
> > > > > > > +	 used outside of the loop are dead.  The loop will never get to
> > > > > > > +	 them.  We could change the liveness value during analysis instead
> > > > > > > +	 but since the below code is invalid anyway just ignore it during
> > > > > > > +	 codegen.  */
> > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +	return true;
> > > >
> > > > But what about the value that's live across the main exit when the
> > > > epilogue is not entered?
> > > >
> > > > > > > +
> > > > > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > > >        gcc_assert (single_pred_p (exit_bb));
> > > > > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > > > > >       versioning.   */
> > > > > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > > -  if (! single_pred_p (e->dest))
> > > > > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >
> > > > e can be NULL here?  I think we should reject such loops earlier.
> > > >
> > > > > > >      {
> > > > > > >        split_loop_exit_edge (e, true);
> > > > > > >        if (dump_enabled_p ())
> > > > > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > > > > >      {
> > > > > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > > > > -      if (! single_pred_p (e->dest))
> > > > > > > +      if (e && ! single_pred_p (e->dest))
> > > > > > >  	{
> > > > > > >  	  split_loop_exit_edge (e, true);
> > > > > > >  	  if (dump_enabled_p ())
> > > > > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > > > > >
> > > > > > >    /* Loops vectorized with a variable factor won't benefit from
> > > > > > >       unrolling/peeling.  */
> > > >
> > > > update the comment?  Why would we unroll a VLA loop with early breaks?
> > > > Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> > > >
> > > > > > > -  if (!vf.is_constant ())
> > > > > > > +  if (!vf.is_constant ()
> > > > > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > >      {
> > > > > > >        loop->unroll = 1;
> > > > > > >        if (dump_enabled_p ())
> > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > > >    *live_p = false;
> > > > > > >
> > > > > > >    /* cond stmt other than loop exit cond.  */
> > > > > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > > > > -    *relevant = vect_used_in_scope;
> > > >
> > > > how was that ever hit before?  For outer loop processing with outer loop
> > > > vectorization?
> > > >
> > > > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > > +    {
> > > > > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > > > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > > > > +	 the hard way.  */
> > > > > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > > > > +      bool exit_bb = false, early_exit = false;
> > > > > > > +      edge_iterator ei;
> > > > > > > +      edge e;
> > > > > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > > > > +	  {
> > > > > > > +	    exit_bb = true;
> > > > > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > > > > +	    break;
> > > > > > > +	  }
> > > > > > > +
> > > > > > > +      /* We should have processed any exit edge, so an edge that is not
> > > > > > > +	 an early break must be a loop IV edge.  We need to distinguish
> > > > > > > +	 between the two as we don't want to generate code for the main
> > > > > > > +	 loop IV.  */
> > > > > > > +      if (exit_bb)
> > > > > > > +	{
> > > > > > > +	  if (early_exit)
> > > > > > > +	    *relevant = vect_used_in_scope;
> > > > > > > +	}
> > > >
> > > > I wonder why you can't simply do
> > > >
> > > >          if (is_ctrl_stmt (stmt_info->stmt)
> > > >              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> > > >
> > > > ?
> > > >
> > > > > > > +      else if (bb->loop_father == loop)
> > > > > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> > > >
> > > > so for control flow not exiting the loop you can check
> > > > loop_exits_from_bb_p ().
> > > >
> > > > > > > +    }
> > > > > > >
> > > > > > >    /* changing memory.  */
> > > > > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > > >  	*relevant = vect_used_in_scope;
> > > > > > >        }
> > > > > > >
> > > > > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > > > > +  auto_bitmap exit_bbs;
> > > > > > > +  for (edge exit : exits)
> > > > > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > > > +
> > > > > > >    /* uses outside the loop.  */
> > > > > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > > > > > >      {
> > > > > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > > > > >  	      /* We expect all such uses to be in the loop exit phis
> > > > > > >  		 (because of loop closed form)   */
> > > > > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> > > >
> > > > That now becomes quite expensive checking already covered by the LC SSA
> > > > verifier so I suggest to simply drop this assert instead.
> > > >
> > > > > > >                *live_p = true;
> > > > > > >  	    }
> > > > > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > > > > >  	}
> > > > > > >      }
> > > > > > >
> > > > > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't
> > > > > > > +     seen all the conds yet at that point and there's no quick way to
> > > > > > > +     retrieve them.  */
> > > > > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > > > > +    return opt_result::failure_at (vect_location,
> > > > > > > +				   "not vectorized:"
> > > > > > > +				   " unsupported control flow in loop.\n");
> > > >
> > > > so we didn't do this before?  But see above where I wondered.  So when
> > > > does this hit with early exits and why can't we check for this in
> > > > vect_verify_loop_form?
> > > >
> > > > > > > +
> > > > > > >    /* 2. Process_worklist */
> > > > > > >    while (worklist.length () > 0)
> > > > > > >      {
> > > > > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > > > > >  			return res;
> > > > > > >  		    }
> > > > > > >                   }
> > > > > > > +	    }
> > > > > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > > > > +	    {
> > > > > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > > > > +	      opt_result res
> > > > > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > > > > +	      if (!res)
> > > > > > > +		return res;
> > > > > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > > > > +				loop_vinfo, relevant, &worklist, false);
> > > > > > > +	      if (!res)
> > > > > > > +		return res;
> > > > > > >              }
> > > > > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > > > > >  	    {
> > > > > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > > >  			     node_instance, cost_vec);
> > > > > > >        if (!res)
> > > > > > >  	return res;
> > > > > > > -   }
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > > > > >
> > > > > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > > > > >      {
> > > > > > >        case vect_internal_def:
> > > > > > > +      case vect_early_exit_def:
> > > > > > >          break;
> > > > > > >
> > > > > > >        case vect_reduction_def:
> > > > > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > > > > >      {
> > > > > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > > > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > > > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > > > > >        *need_to_vectorize = true;
> > > > > > >      }
> > > > > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > > > > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > > > > > > --- a/gcc/tree-vectorizer.h
> > > > > > > +++ b/gcc/tree-vectorizer.h
> > > > > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > > > > >    vect_internal_def,
> > > > > > >    vect_induction_def,
> > > > > > >    vect_reduction_def,
> > > > > > > +  vect_early_exit_def,
> > > >
> > > > can you avoid putting this inbetween reduction and double reduction
> > > > please?  Just put it before vect_unknown_def_type.  In fact the COND
> > > > isn't a def ... maybe we should have pattern recogized
> > > >
> > > >  if (a < b) exit;
> > > >
> > > > as
> > > >
> > > >  cond = a < b;
> > > >  if (cond != 0) exit;
> > > >
> > > > so the part that we need to vectorize is more clear.
> > > >
> > > > > > >    vect_double_reduction_def,
> > > > > > >    vect_nested_cycle,
> > > > > > >    vect_first_order_recurrence,
> > > > > > > @@ -876,6 +877,13 @@ public:
> > > > > > >       we need to peel off iterations at the end to form an epilogue loop.
> > */
> > > > > > >    bool peeling_for_niter;
> > > > > > >
> > > > > > > +  /* When the loop has early breaks that we can vectorize we need to
> > > > peel
> > > > > > > +     the loop for the break finding loop.  */
> > > > > > > +  bool early_breaks;
> > > > > > > +
> > > > > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > > > > +  bool non_break_control_flow;
> > > > > > > +
> > > > > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > > > > >    auto_vec<gcond *> conds;
> > > > > > >
> > > > > > > @@ -985,9 +993,11 @@ public:
> > > > > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > > > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > > > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > > > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > > > > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > > > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > > > > > +#define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > > > > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > > > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > > > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > > > > @@ -1038,8 +1048,8 @@ public:
> > > > > > >     stack.  */
> > > > > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > > > > >
> > > > > > > -inline loop_vec_info
> > > > > > > -loop_vec_info_for_loop (class loop *loop)
> > > > > > > +static inline loop_vec_info
> > > > > > > +loop_vec_info_for_loop (const class loop *loop)
> > > > > > >  {
> > > > > > >    return (loop_vec_info) loop->aux;
> > > > > > >  }
> > > > > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > > > > >  {
> > > > > > >    if (bb == (bb->loop_father)->header)
> > > > > > >      return true;
> > > > > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > > > > +
> > > > > > >    return false;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > > > > >     in tree-vect-loop-manip.cc.  */
> > > > > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > > > > >  				     tree, tree, tree, bool);
> > > > > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > > > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > > > > -						     class loop *, edge);
> > > > > > > +						    class loop *, edge, bool,
> > > > > > > +						    vec<basic_block> * = NULL);
> > > > > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > > > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > > > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > > > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > > > > > > --- a/gcc/tree-vectorizer.cc
> > > > > > > +++ b/gcc/tree-vectorizer.cc
> > > > > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > > > > >  	 predicates that need to be shared for optimal predicate usage.
> > > > > > >  	 However reassoc will re-order them and prevent CSE from working
> > > > > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > >
> > > > seeing this more and more I think we want a simple way to iterate over
> > > > all exits without copying to a vector when we have them recorded.  My
> > > > C++ fu is too limited to support
> > > >
> > > >   for (auto exit : recorded_exits (loop))
> > > >     ...
> > > >
> > > > (maybe that's enough for somebody to jump onto this ;))
> > > >
> > > > Don't treat all review comments as change orders, but it should be clear
> > > > the code isn't 100% obvious.  Maybe the patch can be simplified by
> > > > splitting out the LC SSA cleanup parts.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > > > +      for (edge exit : exits)
> > > > > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > > > >
> > > > > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > > > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > Nuernberg,
> > > > > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > > > > Moerman;
> > > > > > HRB 36809 (AG Nuernberg)

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
  2023-07-14 13:34       ` Richard Biener
  2023-07-17 10:56         ` Tamar Christina
  2023-08-18 11:35         ` Tamar Christina
@ 2023-10-23 20:21         ` Tamar Christina
  2 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-10-23 20:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that the
> > > > secondary exits are all wired to the start of the new loop preheader when
> > > doing
> > > > epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept in tact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the
> two
> > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> be
> > > used
> > > > to easily find the previous definitions.  However live statements that are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > moment
> > > > we add the guard in between the two loops the value across the bypass
> edge
> > > can
> > > > be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> tree,
> > > the
> > > > moment we add the guard block we will change the dominators again.  To
> > > deal with
> > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> blocks
> > > to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> have
> > > an
> > > > incorrect VUSES chain for the secondary exits as it anticipates the final
> result
> > > > after the VDEFs have been moved.  This will thus be corrected once the
> code
> > > > motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continueing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> > > false.
> > > > 	* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> > > > 	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > additional
> > > > 	assert.
> > > > 	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > > > 	exits.
> > > > 	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > peeling.
> > > > 	(slpeel_can_duplicate_loop_p): Likewise.
> > > > 	(vect_update_ivs_after_vectorizer): Don't enter this...
> > > > 	(vect_update_ivs_after_early_break): ...but instead enter here.
> > > > 	(find_guard_arg): Update for new peeling code.
> > > > 	(slpeel_update_phi_nodes_for_loops): Remove.
> > > > 	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > checks.
> > > > 	(slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > 	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > 	non_break_control_flow and early_breaks.
> > > > 	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > > > 	multiple exits and VLA.
> > > > 	(vect_analyze_loop_form): Support inner loop multiple exits.
> > > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > > 	(vect_create_epilog_for_reduction):  Update live phi nodes.
> > > > 	(vectorizable_live_operation): Ignore live operations in vector loop
> > > > 	when multiple exits.
> > > > 	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > > > 	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > > > 	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow
> > > and
> > > > 	analyze gcond params.
> > > > 	(vect_analyze_stmt): Support gcond.
> > > > 	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > > > 	in RPO pass.
> > > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > > 	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW):
> > > New.
> > > > 	(loop_vec_info_for_loop): Change to const and static.
> > > > 	(is_loop_header_bb_p): Drop assert.
> > > > 	(slpeel_can_duplicate_loop_p): Update prototype.
> > > > 	(class loop): Add early_breaks and non_break_control_flow.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> > > > index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
> > > > --- a/gcc/tree-loop-distribution.cc
> > > > +++ b/gcc/tree-loop-distribution.cc
> > > > @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool
> > > redirect_lc_phi_defs)
> > > >    edge preheader = loop_preheader_edge (loop);
> > > >
> > > >    initialize_original_copy_tables ();
> > > > -  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
> > > > +  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader,
> > > false);
> > > >    gcc_assert (res != NULL);
> > > >
> > > >    /* When a not last partition is supposed to keep the LC PHIs computed
> > > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > > > index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644
> > > > --- a/gcc/tree-ssa-loop-niter.cc
> > > > +++ b/gcc/tree-ssa-loop-niter.cc
> > > > @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop,
> > > basic_block *body, const_edge exit)
> > > >    gimple_stmt_iterator bsi;
> > > >    unsigned i;
> > > >
> > > > -  if (exit != single_exit (loop))
> > > > +  /* We need to check for alternative exits since exit can be NULL.  */
> > >
> > > You mean we pass in exit == NULL in some cases?  I'm not sure what
> > > the desired behavior in that case is - can you point out the
> > > callers you are fixing here?
> > >
> > > I think we should add gcc_assert (exit != nullptr)
> > >
> > > >    for (i = 0; i < loop->num_nodes; i++)
> > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def)
> > > >  {
> > > >    tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e);
> > > >
> > > > +  gcc_assert (TREE_CODE (orig_def) != SSA_NAME
> > > > +	      || orig_def != new_def);
> > > > +
> > > >    SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def);
> > > >
> > > >    if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > > >
> > > >    /* Record the number of latch iterations.  */
> > > > -  if (limit == niters)
> > > > +  if (limit == niters
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > > >         latch count.  */
> > > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > > @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > >      loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type,
> > > >  				       limit, step);
> > > >
> > > > -  if (final_iv)
> > > > +  /* For multiple exits we've already maintained LCSSA form and handled
> > > > +     the scalar iteration update in the code that deals with the merge
> > > > +     block and its updated guard.  I could move that code here instead
> > > > +     of in vect_update_ivs_after_early_break but I have to still deal
> > > > +     with the updates to the counter `i`.  So for now I'll keep them
> > > > +     together.  */
> > > > +  if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        gassign *assign;
> > > >        edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop)
> > > >     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > > >     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> > > >     basic blocks from SCALAR_LOOP instead of LOOP, but to either the
> > > > -   entry or exit of LOOP.  */
> > > > +   entry or exit of LOOP.  If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a
> > > > +   continuation.  This is correct for cases where one loop continues from the
> > > > +   other like in the vectorizer, but not true for uses in e.g. loop distribution
> > > > +   where the loop is duplicated and then modified.
> > > > +
> > >
> > > but for loop distribution the flow also continues?  I'm not sure what you
> > > are refering to here.  Do you by chance have a branch with the patches
> > > installed?
> >
> > Yup, they're at refs/users/tnfchris/heads/gcc-14-early-break in the repo.
> >
> > >
> > > > +   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks whose
> > > > +   dominators were updated during the peeling.  */
> > > >
> > > >  class loop *
> > > >  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > > -					class loop *scalar_loop, edge e)
> > > > +					class loop *scalar_loop, edge e,
> > > > +					bool flow_loops,
> > > > +					vec<basic_block> *updated_doms)
> > > >  {
> > > >    class loop *new_loop;
> > > >    basic_block *new_bbs, *bbs, *pbbs;
> > > > @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > >    for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++)
> > > >      rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
> > > >
> > > > +  /* Rename the exit uses.  */
> > > > +  for (edge exit : get_loop_exit_edges (new_loop))
> > > > +    for (auto gsi = gsi_start_phis (exit->dest);
> > > > +	 !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +      {
> > > > +	tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit);
> > > > +	rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit));
> > > > +	if (MAY_HAVE_DEBUG_BIND_STMTS)
> > > > +	  adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest);
> > > > +      }
> > > > +
> > > > +  /* This condition happens when the loop has been versioned. e.g. due to ifcvt
> > > > +     versioning the loop.  */
> > > >    if (scalar_loop != loop)
> > > >      {
> > > >        /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from
> > > > @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > >  						EDGE_SUCC (loop->latch, 0));
> > > >      }
> > > >
> > > > +  vec<edge> alt_exits = loop->vec_loop_alt_exits;
> > >
> > > So 'e' is not one of alt_exits, right?  I wonder if we can simply
> > > compute the vector from all exits of 'loop' and removing 'e'?
> > >
> > > > +  bool multiple_exits_p = !alt_exits.is_empty ();
> > > > +  auto_vec<basic_block> doms;
> > > > +  class loop *update_loop = NULL;
> > > > +
> > > >    if (at_exit) /* Add the loop copy at exit.  */
> > > >      {
> > > > -      if (scalar_loop != loop)
> > > > +      if (scalar_loop != loop && new_exit->dest != exit_dest)
> > > >  	{
> > > > -	  gphi_iterator gsi;
> > > >  	  new_exit = redirect_edge_and_branch (new_exit, exit_dest);
> > > > +	  flush_pending_stmts (new_exit);
> > > > +	}
> > > >
> > > > -	  for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi);
> > > > -	       gsi_next (&gsi))
> > > > +      auto loop_exits = get_loop_exit_edges (loop);
> > > > +      for (edge exit : loop_exits)
> > > > +	redirect_edge_and_branch (exit, new_preheader);
> > > > +
> > > > +
> > >
> > > one line vertical space too much
> > >
> > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > +	 block and the new loop header.  This allows us to later split the
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge latch_new = single_succ_edge (new_preheader);
> > > > +      edge latch_old = loop_latch_edge (loop);
> > > > +      hash_set <tree> lcssa_vars;
> > > > +      for (auto gsi_from = gsi_start_phis (latch_old->dest),
> > >
> > > so that's loop->header (and makes it more clear which PHI nodes you are
> > > looking at)
> 
> 
> So I'm now in a debug session - I think that conceptually it would
> make more sense to create the LC PHI nodes that are present at the
> old exit destination in the new preheader _before_ you redirect them
> above and then flush_pending_stmts after redirecting, that should deal
> with the copying.
> 
> Now, your copying actually iterates over all PHIs in the loop _header_,
> so it doesn't actually copy LC PHI nodes but possibly creates additional
> ones.  The intent does seem to do this since you want a different value
> on those edges for all but the main loop exit.  But then the
> overall comments should better reflect that and maybe you should
> do what I suggested anyway and have this loop alter only the alternate
> exit LC PHIs?
> 
> If you don't flush_pending_stmts on an edge after redirecting you
> should call redirect_edge_var_map_clear (edge), otherwise the stale
> info might break things later.
> 
> > > > +	   gsi_to = gsi_start_phis (latch_new->dest);
> > >
> > > likewise new_loop->header
> > >
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old);
> > > > +	  /* In all cases, even in early break situations we're only
> > > > +	     interested in the number of fully executed loop iters.  As such
> > > > +	     we discard any partially done iteration.  So we simply propagate
> > > > +	     the phi nodes from the latch to the merge block.  */
> > > > +	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > +
> > > > +	  lcssa_vars.add (new_arg);
> > > > +
> > > > +	  /* Main loop exit should use the final iter value.  */
> > > > +	  add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv, UNKNOWN_LOCATION);
> > >
> > > above you are creating the PHI node at e->dest but here add the PHI arg to
> > > loop->vec_loop_iv - that's 'e' here, no?  Consistency makes it easier
> > > to follow.  I _think_ this code doesn't need to know about the "special"
> > > edge.
> > >
> > > > +
> > > > +	  /* All other exits use the previous iters.  */
> > > > +	  for (edge e : alt_exits)
> > > > +	    add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e,
> > > > +			 UNKNOWN_LOCATION);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (to_phi, latch_new, new_res);
> > > > +	}
> > > > +
> > > > +      /* Copy over any live SSA vars that may not have been materialized in the
> > > > +	 loops themselves but would be in the exit block.  However when the live
> > > > +	 value is not used inside the loop then we don't need to do this,  if we do
> > > > +	 then when we split the guard block the branch edge can end up containing the
> > > > +	 wrong reference,  particularly if it shares an edge with something that has
> > > > +	 bypassed the loop.  This is not something peeling can check so we need to
> > > > +	 anticipate the usage of the live variable here.  */
> > > > +      auto exit_map = redirect_edge_var_map_vector (exit);
> > >
> > > Hmm, did I use that in my attemt to refactor things? ...
> >
> > Indeed, I didn't always use it, but found it was the best way to deal with the
> > variables being live in various BB after the loop.
> 
> As said this whole piece of code is possibly more complicated than
> necessary.  First copying/creating the PHI nodes that are present
> at the exit (the old LC PHI nodes), then redirecting edges and flushing
> stmts should deal with half of this.
> 
> > >
> > > > +      if (exit_map)
> > > > +        for (auto vm : exit_map)
> > > > +	{
> > > > +	  if (lcssa_vars.contains (vm.def)
> > > > +	      || TREE_CODE (vm.def) != SSA_NAME)
> > >
> > > the latter check is cheaper so it should come first
> > >
> > > > +	    continue;
> > > > +
> > > > +	  imm_use_iterator imm_iter;
> > > > +	  use_operand_p use_p;
> > > > +	  bool use_in_loop = false;
> > > > +
> > > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
> > > >  	    {
> > > > -	      gphi *phi = gsi.phi ();
> > > > -	      tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > > > -	      location_t orig_locus
> > > > -		= gimple_phi_arg_location_from_edge (phi, e);
> > > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > > +	      if (flow_bb_inside_loop_p (loop, bb)
> > > > +		  && !gimple_vuse (USE_STMT (use_p)))
> 
> what's this gimple_vuse check?  I see now for vect-early-break_17.c this
> code triggers and ignores
> 
>   vect_b[i_18] = _2;
> 
> > > > +		{
> > > > +		  use_in_loop = true;
> > > > +		  break;
> > > > +		}
> > > > +	    }
> > > >
> > > > -	      add_phi_arg (phi, orig_arg, new_exit, orig_locus);
> > > > +	  if (!use_in_loop)
> > > > +	    {
> > > > +	       /* Do a final check to see if it's perhaps defined in the loop.  This
> > > > +		  mirrors the relevancy analysis's used_outside_scope.  */
> > > > +	      gimple *stmt = SSA_NAME_DEF_STMT (vm.def);
> > > > +	      if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > > > +		continue;
> > > >  	    }
> 
> since the def was on a LC PHI the def should always be defined inside the
> loop.
> 
> > > > +
> > > > +	  tree new_res = copy_ssa_name (vm.result);
> > > > +	  gphi *lcssa_phi = create_phi_node (new_res, e->dest);
> > > > +	  for (edge exit : loop_exits)
> > > > +	     add_phi_arg (lcssa_phi, vm.def, exit, vm.locus);
> > >
> > > not sure what you are doing above - I guess I have to play with it
> > > in a debug session.
> >
> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
> 
> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after
> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>                  <original loop>
>                 /      |
>                /    <main exit w/ original LC PHI>
>               /      if (epilog)
>    alt exits /        /  \
>             /        /    loop around
>             |       /
>            preheader with "header" PHIs
>               |
>           <epilog>
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
> 
> It seems so that at least currently the order of things makes
> them more complicated than necessary.
> 

Hi,

I'm re-spinning this particular change and would like some clarification.

I assume the reason you prefer this flow is that the updates on the "normal" preheader remain
the same as today, and the alternate exits are easy since the iteration count is just VF.

I think that should be easy enough to do, and it also makes things simpler since I can re-use
vect_update_ivs_after_vectorizer as is, or at the very least with only a small update.

What I need advice on is how to handle the alt_exits in order to create the `preheader with "header" PHIs`.

With the last refactoring we've started using redirect_edge_and_branch to redirect the exits.  But for this to work
all the exits need to have their PHI nodes in the same order; in the above scheme it's the alt exits that need to match.

However, every exit can have a varying number of PHI nodes due to live values.

Consider:

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x, unsigned y)
{
 unsigned ret = 0;
 unsigned sum = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     return vect_a[i];

   vect_b[i] += x + i;
   if (vect_a[i] > y)
     return sum;
   sum += vect_a[i];
   vect_a[i] = x;
 }
 return ret;
}

The CFG for this looks like https://tinyurl.com/2p9p2he8

The first exit contains: _10, .MEM
The second exit contains: .MEM, sum
The main exit contains: .MEM

This goes wrong when using redirect_edge_and_branch because alt_exit1 and alt_exit2 have their PHI nodes in different orders.

This patch deals with it by using the rename map to recreate the PHI nodes, but I'm not sure how to make that work with the
redirect_edge_and_branch approach.

Cheers,
Tamar
> > >
> > > >  	}
> > > > -      redirect_edge_and_branch_force (e, new_preheader);
> > > > -      flush_pending_stmts (e);
> > > > +
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> > > > -      if (was_imm_dom || duplicate_outer_loop)
> > > > +
> > > > +      if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p)
> > > >  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> > > >
> > > >        /* And remove the non-necessary forwarder again.  Keep the other
> > > > @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop,
> > > >        delete_basic_block (preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
> > > >  			       loop_preheader_edge (scalar_loop)->src);
> > > > +
> > > > +      /* Finally after wiring the new epilogue we need to update its main exit
> > > > +	 to the original function exit we recorded.  Other exits are already
> > > > +	 correct.  */
> > > > +      if (multiple_exits_p)
> > > > +	{
> > > > +	  for (edge e : get_loop_exit_edges (loop))
> > > > +	    doms.safe_push (e->dest);
> > > > +	  update_loop = new_loop;
> > > > +	  doms.safe_push (exit_dest);
> > > > +
> > > > +	  /* Likely a fall-through edge, so update if needed.  */
> > > > +	  if (single_succ_p (exit_dest))
> > > > +	    doms.safe_push (single_succ (exit_dest));
> > > > +	}
> > > >      }
> > > >    else /* Add the copy at entry.  */
> > > >      {
> > > > +      /* Copy the current loop LC PHI nodes between the original loop exit
> > > > +	 block and the new loop header.  This allows us to later split the
> > > > +	 preheader block and still find the right LC nodes.  */
> > > > +      edge old_latch_loop = loop_latch_edge (loop);
> > > > +      edge old_latch_init = loop_preheader_edge (loop);
> > > > +      edge new_latch_loop = loop_latch_edge (new_loop);
> > > > +      edge new_latch_init = loop_preheader_edge (new_loop);
> > > > +      for (auto gsi_from = gsi_start_phis (new_latch_init->dest),
> > >
> > > see above
> > >
> > > > +	   gsi_to = gsi_start_phis (old_latch_loop->dest);
> > > > +	   flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> > > > +	   gsi_next (&gsi_from), gsi_next (&gsi_to))
> > > > +	{
> > > > +	  gimple *from_phi = gsi_stmt (gsi_from);
> > > > +	  gimple *to_phi = gsi_stmt (gsi_to);
> > > > +	  tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> > > new_latch_loop);
> > > > +	  adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg);
> > > > +	}
> > > > +
> > > >        if (scalar_loop != loop)
> > > >  	{
> > > >  	  /* Remove the non-necessary forwarder of scalar_loop again.  */
> > > > @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg
> (class
> > > loop *loop,
> > > >        delete_basic_block (new_preheader);
> > > >        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
> > > >  			       loop_preheader_edge (new_loop)->src);
> > > > +
> > > > +      if (multiple_exits_p)
> > > > +	update_loop = loop;
> > > >      }
> > > >
> > > > -  if (scalar_loop != loop)
> > > > +  if (multiple_exits_p)
> > > >      {
> > > > -      /* Update new_loop->header PHIs, so that on the preheader
> > > > -	 edge they are the ones from loop rather than scalar_loop.  */
> > > > -      gphi_iterator gsi_orig, gsi_new;
> > > > -      edge orig_e = loop_preheader_edge (loop);
> > > > -      edge new_e = loop_preheader_edge (new_loop);
> > > > -
> > > > -      for (gsi_orig = gsi_start_phis (loop->header),
> > > > -	   gsi_new = gsi_start_phis (new_loop->header);
> > > > -	   !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new);
> > > > -	   gsi_next (&gsi_orig), gsi_next (&gsi_new))
> > > > +      for (edge e : get_loop_exit_edges (update_loop))
> > > >  	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  gphi *new_phi = gsi_new.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e);
> > > > -	  location_t orig_locus
> > > > -	    = gimple_phi_arg_location_from_edge (orig_phi, orig_e);
> > > > -
> > > > -	  add_phi_arg (new_phi, orig_arg, new_e, orig_locus);
> > > > +	  edge ex;
> > > > +	  edge_iterator ei;
> > > > +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> > > > +	    {
> > > > +	      /* Find the first non-fallthrough block as fall-throughs can't
> > > > +		 dominate other blocks.  */
> > > > +	      while ((ex->flags & EDGE_FALLTHRU)
> 
> For the prologue peeling any early exit we take would skip all other
> loops so we can simply leave them and their LC PHI nodes in place.
> We need extra PHIs only on the path to the main vector loop.  I
> think the comment isn't accurately reflecting what we do.  In
> fact we do not add any LC PHI nodes here but simply adjust the
> main loop header PHI arguments?
> 
> > > I don't think EDGE_FALLTHRU is set correctly; what's wrong with
> > > just using single_succ_p here?  A fallthru edge src dominates the
> > > fallthru edge dest, so the sentence above doesn't make sense.
> >
> > I wanted to say that the immediate dominator of a block is never
> > a fall-through block.  At least that's what I understood from how
> > the dominators are calculated in the code, though I may have missed
> > something.
> 
>  BB1
>   |
>  BB2
>   |
>  BB3
> 
> here the immediate dominator of BB3 is BB2 and that of BB2 is BB1.
> 
> > >
> > > > +		     && single_succ_p (ex->dest))
> > > > +		{
> > > > +		  doms.safe_push (ex->dest);
> > > > +		  ex = single_succ_edge (ex->dest);
> > > > +		}
> > > > +	      doms.safe_push (ex->dest);
> > > > +	    }
> > > > +	  doms.safe_push (e->dest);
> > > >  	}
> > > > -    }
> > > >
> > > > +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +      if (updated_doms)
> > > > +	updated_doms->safe_splice (doms);
> > > > +    }
> > > >    free (new_bbs);
> > > >    free (bbs);
> > > >
> > > > @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const
> > > loop_vec_info loop_vinfo, const_edge e)
> > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > >    unsigned int num_bb = loop->inner? 5 : 2;
> > > >
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length ();
> > > > +
> > >
> > > I think checking the number of BBs is odd, I don't remember anything
> > > in slpeel is specifically tied to that?  I think we can simply drop
> > > this or do you remember anything that would depend on ->num_nodes
> > > being only exactly 5 or 2?
> >
> > It never actually seemed to require it, but it's used as a check to see
> > whether there is unexpected control flow in the loop.
> >
> > I.e. this would say no if you have an if statement in the loop that wasn't
> > converted.  The other part of this and the accompanying explanation is in
> > vect_analyze_loop_form.  In the patch series I had to remove the hard
> > num_nodes == 2 check from there because the number of nodes restricted
> > things too much.  If you have an empty fall-through block, which seems to
> > happen often between the main exit and the latch block, then we'd not
> > vectorize.
> >
> > So instead I now reject loops after analyzing the gcond, so I think this
> > check can go/needs to be different.
> 
> Lets remove it from this function then.
> 
> > >
> > > >    /* All loops have an outer scope; the only case loop->outer is NULL is for
> > > >       the function itself.  */
> > > >    if (!loop_outer (loop)
> > > > @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >    basic_block update_bb = update_e->dest;
> > > >
> > > > +  /* For early exits we'll update the IVs in
> > > > +     vect_update_ivs_after_early_break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return;
> > > > +
> > > >    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >
> > > >    /* Make sure there exists a single-predecessor exit bb:  */
> > > > @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >        /* Fix phi expressions in the successor bb.  */
> > > >        adjust_phi_and_debug_stmts (phi1, update_e, ni_name);
> > > >      }
> > > > +  return;
> > >
> > > we don't usually place a return at the end of void functions
> > >
> > > > +}
> > > > +
> > > > +/*   Function vect_update_ivs_after_early_break.
> > > > +
> > > > +     "Advance" the induction variables of LOOP to the value they should
> take
> > > > +     after the execution of LOOP.  This is currently necessary because the
> > > > +     vectorizer does not handle induction variables that are used after the
> > > > +     loop.  Such a situation occurs when the last iterations of LOOP are
> > > > +     peeled, because of the early exit.  With an early exit we always peel
> the
> > > > +     loop.
> > > > +
> > > > +     Input:
> > > > +     - LOOP_VINFO - a loop info structure for the loop that is going to be
> > > > +		    vectorized. The last few iterations of LOOP were peeled.
> > > > +     - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > +	      of LOOP were peeled.
> > > > +     - VF - The loop vectorization factor.
> > > > +     - NITERS_ORIG - the number of iterations that LOOP executes (before
> it is
> > > > +		     vectorized). i.e, the number of times the ivs should be
> > > > +		     bumped.
> > > > +     - NITERS_VECTOR - The number of iterations that the vector LOOP
> > > executes.
> > > > +     - UPDATE_E - a successor edge of LOOP->exit that is on the (only)
> path
> > > > +		  coming out from LOOP on which there are uses of the LOOP
> > > ivs
> > > > +		  (this is the path from LOOP->exit to epilog_loop->preheader).
> > > > +
> > > > +		  The new definitions of the ivs are placed in LOOP->exit.
> > > > +		  The phi args associated with the edge UPDATE_E in the bb
> > > > +		  UPDATE_E->dest are updated accordingly.
> > > > +
> > > > +     Output:
> > > > +       - If available, the LCSSA phi node for the loop IV temp.
> > > > +
> > > > +     Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > +     a single loop exit that has a single predecessor.
> > > > +
> > > > +     Assumption 2: The phi nodes in the LOOP header and in update_bb
> are
> > > > +     organized in the same order.
> > > > +
> > > > +     Assumption 3: The access function of the ivs is simple enough (see
> > > > +     vect_can_advance_ivs_p).  This assumption will be relaxed in the
> future.
> > > > +
> > > > +     Assumption 4: Exactly one of the successors of LOOP exit-bb is on a
> path
> > > > +     coming out of LOOP on which the ivs of LOOP are used (this is the
> path
> > > > +     that leads to the epilog loop; other paths skip the epilog loop).  This
> > > > +     path starts with the edge UPDATE_E, and its destination (denoted
> > > update_bb)
> > > > +     needs to have its phis updated.
> > > > + */
> > > > +
> > > > +static tree
> > > > +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class
> loop *
> > > epilog,
> > > > +				   poly_int64 vf, tree niters_orig,
> > > > +				   tree niters_vector, edge update_e)
> > > > +{
> > > > +  if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return NULL;
> > > > +
> > > > +  gphi_iterator gsi, gsi1;
> > > > +  tree ni_name, ivtmp = NULL;
> > > > +  basic_block update_bb = update_e->dest;
> > > > +  vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +  edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > +  basic_block exit_bb = loop_iv->dest;
> > > > +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > +  gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo);
> > > > +
> > > > +  gcc_assert (cond);
> > > > +
> > > > +  for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> (update_bb);
> > > > +       !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +       gsi_next (&gsi), gsi_next (&gsi1))
> > > > +    {
> > > > +      tree init_expr, final_expr, step_expr;
> > > > +      tree type;
> > > > +      tree var, ni, off;
> > > > +      gimple_stmt_iterator last_gsi;
> > > > +
> > > > +      gphi *phi = gsi1.phi ();
> > > > +      tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi,
> loop_preheader_edge
> > > (epilog));
> > >
> > > I'm confused about the setup.  update_bb looks like the block with the
> > > loop-closed PHI nodes of 'loop' and the exit (update_e)?  How does
> > > loop_preheader_edge (epilog) come into play here?  That would feed into
> > > epilog->header PHIs?!
> >
> > We can't query the type of the phis in the block with the LC PHI nodes,
> > so the typical pattern seems to be that we iterate over a block that's
> > part of the loop and that would have the PHIs in the same order, just so
> > we can get to the stmt_vec_info.
> >
> > >
> > > It would be nice to name 'gsi[1]', 'update_e' and 'update_bb' in a
> > > better way?  Is update_bb really epilog->header?!
> > >
> > > We're missing checking in PHI_ARG_DEF_FROM_EDGE, namely that
> > > E->dest == gimple_bb (PHI) - we're just using E->dest_idx there
> > > which "works" even for totally unrelated edges.
> > >
> > > > +      gphi *phi1 = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (phi_ssa));
> > > > +      if (!phi1)
> > >
> > > shouldn't that be an assert?
> > >
> > > > +	continue;
> > > > +      stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ());
> > > > +      if (dump_enabled_p ())
> > > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			 "vect_update_ivs_after_early_break: phi: %G",
> > > > +			 (gimple *)phi);
> > > > +
> > > > +      /* Skip reduction and virtual phis.  */
> > > > +      if (!iv_phi_p (phi_info))
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > > +			     "reduc or virtual phi. skip.\n");
> > > > +	  continue;
> > > > +	}
> > > > +
> > > > +      /* For multiple exits where we handle early exits we need to carry on
> > > > +	 with the previous IV as loop iteration was not done because we exited
> > > > +	 early.  As such just grab the original IV.  */
> > > > +      phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge
> > > (loop));
> > >
> > > but this should be taken care of by LC SSA?
> >
> > It is; the comment is probably missing details.  This part just scales
> > the counter from VF to scalar counts.  It's just a reminder that this
> > scaling is done differently from normal single-exit vectorization.
> >
> > >
> > > OK, have to continue tomorrow from here.
> >
> > Cheers, Thank you!
> >
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > +      if (gimple_cond_lhs (cond) != phi_ssa
> > > > +	  && gimple_cond_rhs (cond) != phi_ssa)
> 
> so this is a way to avoid touching the main IV?  Looks a bit fragile to
> me.  Hmm, we're iterating over the main loop header PHIs here?
> Can't you check, say, the relevancy of the PHI node instead?  Though
> it might also be used as induction.  Can't it be used as alternate
> exit like
> 
>   for (i)
>    {
>      if (i & bit)
>        break;
>    }
> 
> and would we need to adjust 'i' then?
> 
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  final_expr = gimple_phi_result (phi1);
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (),
> > > loop_preheader_edge (loop));
> > > > +
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +	  /* For early break the final loop IV is:
> > > > +	     init + (final - init) * vf which takes into account peeling
> > > > +	     values and non-single steps.  */
> > > > +	  off = fold_build2 (MINUS_EXPR, stype,
> > > > +			     fold_convert (stype, final_expr),
> > > > +			     fold_convert (stype, init_expr));
> > > > +	  /* Now adjust for VF to get the final iteration value.  */
> > > > +	  off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf));
> > > > +
> > > > +	  /* Adjust the value with the offset.  */
> > > > +	  if (POINTER_TYPE_P (type))
> > > > +	    ni = fold_build_pointer_plus (init_expr, off);
> > > > +	  else
> > > > +	    ni = fold_convert (type,
> > > > +			       fold_build2 (PLUS_EXPR, stype,
> > > > +					    fold_convert (stype, init_expr),
> > > > +					    off));
> > > > +	  var = create_tmp_var (type, "tmp");
> 
> so how does the non-early break code deal with updating inductions?
> And how do you avoid altering this when we flow in from the normal
> exit?  That is, you are updating the value in the epilog loop
> header but don't you need to instead update the value only on
> the alternate exit edges from the main loop (and keep the not
> updated value on the main exit edge)?
> 
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  /* Fix phi expressions in the successor bb.  */
> > > > +	  adjust_phi_and_debug_stmts (phi, update_e, ni_name);
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
> > > > +	  step_expr = unshare_expr (step_expr);
> > > > +
> > > > +	  /* We previously generated the new merged phi in the same BB as
> > > the
> > > > +	     guard.  So use that to perform the scaling on rather than the
> > > > +	     normal loop phi which don't take the early breaks into account.  */
> > > > +	  init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge
> > > (loop));
> > > > +	  tree stype = TREE_TYPE (step_expr);
> > > > +
> > > > +	  if (vf.is_constant ())
> > > > +	    {
> > > > +	      ni = fold_build2 (MULT_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_vector),
> > > > +				build_int_cst (stype, vf));
> > > > +
> > > > +	      ni = fold_build2 (MINUS_EXPR, stype,
> > > > +				fold_convert (stype,
> > > > +					      niters_orig),
> > > > +				fold_convert (stype, ni));
> > > > +	    }
> > > > +	  else
> > > > +	    /* If the loop's VF isn't constant then the loop must have been
> > > > +	       masked, so at the end of the loop we know we have finished
> > > > +	       the entire loop and found nothing.  */
> > > > +	    ni = build_zero_cst (stype);
> > > > +
> > > > +	  ni = fold_convert (type, ni);
> > > > +	  /* We don't support variable n in this version yet.  */
> > > > +	  gcc_assert (TREE_CODE (ni) == INTEGER_CST);
> > > > +
> > > > +	  var = create_tmp_var (type, "tmp");
> > > > +
> > > > +	  last_gsi = gsi_last_bb (exit_bb);
> > > > +	  gimple_seq new_stmts = NULL;
> > > > +	  ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > +	  /* Exit_bb shouldn't be empty.  */
> > > > +	  if (!gsi_end_p (last_gsi))
> > > > +	    gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +	  else
> > > > +	    gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT);
> > > > +
> > > > +	  adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name);
> > > > +
> > > > +	  for (edge exit : alt_exits)
> > > > +	    adjust_phi_and_debug_stmts (phi1, exit,
> > > > +					build_int_cst (TREE_TYPE (step_expr),
> > > > +						       vf));
> > > > +	  ivtmp = gimple_phi_result (phi1);
> > > > +	}
> > > > +    }
> > > > +
> > > > +  return ivtmp;
> > > >  }
> > > >
> > > >  /* Return a gimple value containing the misalignment (measured in
> vector
> > > > @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf
> > > (loop_vec_info loop_vinfo,
> > > >
> > > >  /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP,
> > > >     this function searches for the corresponding lcssa phi node in exit
> > > > -   bb of LOOP.  If it is found, return the phi result; otherwise return
> > > > -   NULL.  */
> > > > +   bb of LOOP following the LCSSA_EDGE to the exit node.  If it is found,
> > > > +   return the phi result; otherwise return NULL.  */
> > > >
> > > >  static tree
> > > >  find_guard_arg (class loop *loop, class loop *epilog
> ATTRIBUTE_UNUSED,
> > > > -		gphi *lcssa_phi)
> > > > +		gphi *lcssa_phi, int lcssa_edge = 0)
> > > >  {
> > > >    gphi_iterator gsi;
> > > >    edge e = loop->vec_loop_iv;
> > > >
> > > > -  gcc_assert (single_pred_p (e->dest));
> > > >    for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *phi = gsi.phi ();
> > > > -      if (operand_equal_p (PHI_ARG_DEF (phi, 0),
> > > > -			   PHI_ARG_DEF (lcssa_phi, 0), 0))
> > > > -	return PHI_RESULT (phi);
> > > > -    }
> > > > -  return NULL_TREE;
> > > > -}
> > > > -
> > > > -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates
> > > FIRST/SECOND
> > > > -   from SECOND/FIRST and puts it at the original loop's preheader/exit
> > > > -   edge, the two loops are arranged as below:
> > > > -
> > > > -       preheader_a:
> > > > -     first_loop:
> > > > -       header_a:
> > > > -	 i_1 = PHI<i_0, i_2>;
> > > > -	 ...
> > > > -	 i_2 = i_1 + 1;
> > > > -	 if (cond_a)
> > > > -	   goto latch_a;
> > > > -	 else
> > > > -	   goto between_bb;
> > > > -       latch_a:
> > > > -	 goto header_a;
> > > > -
> > > > -       between_bb:
> > > > -	 ;; i_x = PHI<i_2>;   ;; LCSSA phi node to be created for FIRST,
> > > > -
> > > > -     second_loop:
> > > > -       header_b:
> > > > -	 i_3 = PHI<i_0, i_4>; ;; Use of i_0 to be replaced with i_x,
> > > > -				 or with i_2 if no LCSSA phi is created
> > > > -				 under condition of
> > > CREATE_LCSSA_FOR_IV_PHIS.
> > > > -	 ...
> > > > -	 i_4 = i_3 + 1;
> > > > -	 if (cond_b)
> > > > -	   goto latch_b;
> > > > -	 else
> > > > -	   goto exit_bb;
> > > > -       latch_b:
> > > > -	 goto header_b;
> > > > -
> > > > -       exit_bb:
> > > > -
> > > > -   This function creates loop closed SSA for the first loop; update the
> > > > -   second loop's PHI nodes by replacing argument on incoming edge with
> the
> > > > -   result of newly created lcssa PHI nodes.  IF
> CREATE_LCSSA_FOR_IV_PHIS
> > > > -   is false, Loop closed ssa phis will only be created for non-iv phis for
> > > > -   the first loop.
> > > > -
> > > > -   This function assumes exit bb of the first loop is preheader bb of the
> > > > -   second loop, i.e, between_bb in the example code.  With PHIs updated,
> > > > -   the second loop will execute rest iterations of the first.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo,
> > > > -				   class loop *first, class loop *second,
> > > > -				   bool create_lcssa_for_iv_phis)
> > > > -{
> > > > -  gphi_iterator gsi_update, gsi_orig;
> > > > -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > -
> > > > -  edge first_latch_e = EDGE_SUCC (first->latch, 0);
> > > > -  edge second_preheader_e = loop_preheader_edge (second);
> > > > -  basic_block between_bb = single_exit (first)->dest;
> > > > -
> > > > -  gcc_assert (between_bb == second_preheader_e->src);
> > > > -  gcc_assert (single_pred_p (between_bb) && single_succ_p
> (between_bb));
> > > > -  /* Either the first loop or the second is the loop to be vectorized.  */
> > > > -  gcc_assert (loop == first || loop == second);
> > > > -
> > > > -  for (gsi_orig = gsi_start_phis (first->header),
> > > > -       gsi_update = gsi_start_phis (second->header);
> > > > -       !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
> > > > -       gsi_next (&gsi_orig), gsi_next (&gsi_update))
> > > > -    {
> > > > -      gphi *orig_phi = gsi_orig.phi ();
> > > > -      gphi *update_phi = gsi_update.phi ();
> > > > -
> > > > -      tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e);
> > > > -      /* Generate lcssa PHI node for the first loop.  */
> > > > -      gphi *vect_phi = (loop == first) ? orig_phi : update_phi;
> > > > -      stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi);
> > > > -      if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info))
> > > > +      /* Nested loops with multiple exits can have different no# phi node
> > > > +	 arguments between the main loop and epilog as epilog falls to the
> > > > +	 second loop.  */
> > > > +      if (gimple_phi_num_args (phi) > e->dest_idx)
> > > >  	{
> > > > -	  tree new_res = copy_ssa_name (PHI_RESULT (orig_phi));
> > > > -	  gphi *lcssa_phi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcssa_phi, arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > -	  arg = new_res;
> > > > -	}
> > > > -
> > > > -      /* Update PHI node in the second loop by replacing arg on the loop's
> > > > -	 incoming edge.  */
> > > > -      adjust_phi_and_debug_stmts (update_phi, second_preheader_e,
> arg);
> > > > -    }
> > > > -
> > > > -  /* For epilogue peeling we have to make sure to copy all LC PHIs
> > > > -     for correct vectorization of live stmts.  */
> > > > -  if (loop == first)
> > > > -    {
> > > > -      basic_block orig_exit = single_exit (second)->dest;
> > > > -      for (gsi_orig = gsi_start_phis (orig_exit);
> > > > -	   !gsi_end_p (gsi_orig); gsi_next (&gsi_orig))
> > > > -	{
> > > > -	  gphi *orig_phi = gsi_orig.phi ();
> > > > -	  tree orig_arg = PHI_ARG_DEF (orig_phi, 0);
> > > > -	  if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p
> > > (orig_arg))
> > > > -	    continue;
> > > > -
> > > > -	  /* Already created in the above loop.   */
> > > > -	  if (find_guard_arg (first, second, orig_phi))
> > > > +	  tree var = PHI_ARG_DEF (phi, e->dest_idx);
> > > > +	  if (TREE_CODE (var) != SSA_NAME)
> > > >  	    continue;
> > > >
> > > > -	  tree new_res = copy_ssa_name (orig_arg);
> > > > -	  gphi *lcphi = create_phi_node (new_res, between_bb);
> > > > -	  add_phi_arg (lcphi, orig_arg, single_exit (first),
> > > UNKNOWN_LOCATION);
> > > > +	  if (operand_equal_p (get_current_def (var),
> > > > +			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
> > > > +	    return PHI_RESULT (phi);
> > > >  	}
> > > >      }
> > > > +  return NULL_TREE;
> > > >  }
> > > >
> > > >  /* Function slpeel_add_loop_guard adds guard skipping from the
> beginning
> > > > @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2
> (class
> > > loop *loop, class loop *epilog,
> > > >    gcc_assert (single_succ_p (merge_bb));
> > > >    edge e = single_succ_edge (merge_bb);
> > > >    basic_block exit_bb = e->dest;
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest);
> > > >
> > > >    for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > >      {
> > > >        gphi *update_phi = gsi.phi ();
> > > > -      tree old_arg = PHI_ARG_DEF (update_phi, 0);
> > > > +      tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx);
> > > >
> > > >        tree merge_arg = NULL_TREE;
> > > >
> > > > @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class
> loop
> > > *loop, class loop *epilog,
> > > >        if (!merge_arg)
> > > >  	merge_arg = old_arg;
> > > >
> > > > -      tree guard_arg = find_guard_arg (loop, epilog, update_phi);
> > > > +      tree guard_arg = find_guard_arg (loop, epilog, update_phi, e-
> >dest_idx);
> > > >        /* If the var is live after loop but not a reduction, we simply
> > > >  	 use the old arg.  */
> > > >        if (!guard_arg)
> > > > @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class
> > > loop *loop, class loop *epilog,
> > > >      }
> > > >  }
> > > >
> > > > -/* EPILOG loop is duplicated from the original loop for vectorizing,
> > > > -   the arg of its loop closed ssa PHI needs to be updated.  */
> > > > -
> > > > -static void
> > > > -slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
> > > > -{
> > > > -  gphi_iterator gsi;
> > > > -  basic_block exit_bb = single_exit (epilog)->dest;
> > > > -
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  edge e = EDGE_PRED (exit_bb, 0);
> > > > -  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > > -    rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
> > > > -}
> > > > -
> 
> I wonder if we can still split these changes out to before early break
> vect?
> 
> > > >  /* EPILOGUE_VINFO is an epilogue loop that we now know would need
> to
> > > >     iterate exactly CONST_NITERS times.  Make a final decision about
> > > >     whether the epilogue loop should be used, returning true if so.  */
> > > > @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >      bound_epilog += vf - 1;
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > >      bound_epilog += 1;
> > > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > > +     to find the element that caused the break.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      bound_epilog = vf;
> > > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > > +      vect_epilogues = false;
> > > > +    }
> > > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > > >    poly_uint64 bound_scalar = bound_epilog;
> > > >
> > > > @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  				  bound_prolog + bound_epilog)
> > > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > > >  			 || vect_epilogues));
> > > > +
> > > > +  /* We only support early break vectorization on known bounds at this
> > > time.
> > > > +     This means that if the vector loop can't be entered then we won't
> > > generate
> > > > +     it at all.  So for now force skip_vector off because the additional
> control
> > > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo);
> > > > +
> 
> I think it should be as "easy" as entering the epilog via the block taking
> the regular exit?
> 
> > > >    /* Epilog loop must be executed if the number of iterations for epilog
> > > >       loop is known at compile time, otherwise we need to add a check at
> > > >       the end of vector loop and skip to the end of epilog loop.  */
> > > >    bool skip_epilog = (prolog_peeling < 0
> > > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >  		      || !vf.is_constant ());
> > > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.
> */
> > > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> > > epilog
> > > > +     loop must be executed.  */
> > > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      skip_epilog = false;
> > > > -
> > > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > >    auto_vec<profile_count> original_counts;
> > > >    basic_block *original_bbs = NULL;
> > > > @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >    if (prolog_peeling)
> > > >      {
> > > >        e = loop_preheader_edge (loop);
> > > > -      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e));
> > > > -
> > > > +      gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e));
> > > >        /* Peel prolog and put it on preheader edge of loop.  */
> > > > -      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> e);
> > > > +      prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop,
> e,
> > > > +						       true);
> > > >        gcc_assert (prolog);
> > > >        prolog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true);
> > > > +
> > > >        first_loop = prolog;
> > > >        reset_original_copy_tables ();
> > > >
> > > > @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info
> loop_vinfo,
> > > tree niters, tree nitersm1,
> > > >  	 as the transformations mentioned above make less or no sense when
> > > not
> > > >  	 vectorizing.  */
> > > >        epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
> > > > -      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e);
> > > > +      auto_vec<basic_block> doms;
> > > > +      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e,
> true,
> > > > +						       &doms);
> > > >        gcc_assert (epilog);
> > > >
> > > >        epilog->force_vectorize = false;
> > > > -      slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false);
> > > >
> > > >        /* Scalar version loop may be preferred.  In this case, add guard
> > > >  	 and skip to epilog.  Note this only happens when the number of
> > > > @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> tree
> > > niters, tree nitersm1,
> > > >        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > >  					update_e);
> > > >
> > > > +      /* For early breaks we must create a guard to check how many
> iterations
> > > > +	 of the scalar loop are yet to be performed.  */
> 
> We have this check anyway, no?  In fact don't we know that we always enter
> the epilog (see above)?
> 
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	{
> > > > +	  tree ivtmp =
> > > > +	    vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters,
> > > > +					       *niters_vector, update_e);
> > > > +
> > > > +	  gcc_assert (ivtmp);
> > > > +	  tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > +					 fold_convert (TREE_TYPE (niters),
> > > > +						       ivtmp),
> > > > +					 build_zero_cst (TREE_TYPE (niters)));
> > > > +	  basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > +
> > > > +	  /* If we had a fallthrough edge, the guard will be threaded through
> > > > +	     and so we may need to find the actual final edge.  */
> > > > +	  edge final_edge = epilog->vec_loop_iv;
> > > > +	  /* slpeel_update_phi_nodes_for_guard2 expects an empty block in
> > > > +	     between the guard and the exit edge.  It only adds new nodes and
> > > > +	     doesn't update existing one in the current scheme.  */
> > > > +	  basic_block guard_to = split_edge (final_edge);
> > > > +	  edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to,
> > > > +						guard_bb, prob_epilog.invert (),
> > > > +						irred_flag);
> > > > +	  doms.safe_push (guard_bb);
> > > > +
> > > > +	  iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> > > > +
> > > > +	  /* We must update all the edges from the new guard_bb.  */
> > > > +	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e,
> > > > +					      final_edge);
> > > > +
> > > > +	  /* If the loop was versioned we'll have an intermediate BB between
> > > > +	     the guard and the exit.  This intermediate block is required
> > > > +	     because in the current scheme of things the guard block phi
> > > > +	     updating can only maintain LCSSA by creating new blocks.  In this
> > > > +	     case we just need to update the uses in this block as well.  */
> > > > +	  if (loop != scalar_loop)
> > > > +	    {
> > > > +	      for (gphi_iterator gsi = gsi_start_phis (guard_to);
> > > > +		   !gsi_end_p (gsi); gsi_next (&gsi))
> > > > +		rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e));
> > > > +	    }
> > > > +
> > > > +	  flush_pending_stmts (guard_e);
> > > > +	}
> > > > +
> > > >        if (skip_epilog)
> > > >  	{
> > > >  	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
> > > > @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > >  	    }
> > > >  	  scale_loop_profile (epilog, prob_epilog, 0);
> > > >  	}
> > > > -      else
> > > > -	slpeel_update_phi_nodes_for_lcssa (epilog);
> > > >
> > > >        unsigned HOST_WIDE_INT bound;
> > > >        if (bound_scalar.is_constant (&bound))
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> > > >      partial_load_store_bias (0),
> > > >      peeling_for_gaps (false),
> > > >      peeling_for_niter (false),
> > > > +    early_breaks (false),
> > > > +    non_break_control_flow (false),
> > > >      no_data_dependencies (false),
> > > >      has_mask_store (false),
> > > >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > > > @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
> > > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
> > > >  					  (loop_vinfo));
> > > > +  /* When we have multiple exits and VF is unknown, we must require partial
> > > > +     vectors because the loop bound is not a minimum but a maximum.  That is
> > > > +     to say we cannot unpredicate the main loop unless we peel or use partial
> > > > +     vectors in the epilogue.  */
> > > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > > +    return true;
> > > > +
> > > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > > >      {
> > > > @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > >     Verify that certain CFG restrictions hold, including:
> > > >     - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit.
> > > >     - the loop exit condition is simple enough
> > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > >       niter could be analyzed under some assumptions.  */
> > > > @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >                             |
> > > >                          (exit-bb)  */
> > > >
> > > > -      if (loop->num_nodes != 2)
> > > > -	return opt_result::failure_at (vect_location,
> > > > -				       "not vectorized:"
> > > > -				       " control flow in loop.\n");
> > > > -
> > > >        if (empty_block_p (loop->header))
> > > >  	return opt_result::failure_at (vect_location,
> > > >  				       "not vectorized: empty loop.\n");
> > > > @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >          dump_printf_loc (MSG_NOTE, vect_location,
> > > >  			 "Considering outer-loop vectorization.\n");
> > > >        info->inner_loop_cond = inner.loop_cond;
> > > > +
> > > > +      if (!single_exit (loop))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized: multiple exits.\n");
> > > > +
> > > >      }
> > > >
> > > > -  if (!single_exit (loop))
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized: multiple exits.\n");
> > > >    if (EDGE_COUNT (loop->header->preds) != 2)
> > > >      return opt_result::failure_at (vect_location,
> > > >  				   "not vectorized:"
> > > > @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> > > >  				   "not vectorized: latch block not empty.\n");
> > > >
> > > >    /* Make sure the exit is not abnormal.  */
> > > > -  edge e = single_exit (loop);
> > > > -  if (e->flags & EDGE_ABNORMAL)
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized:"
> > > > -				   " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  edge nexit = loop->vec_loop_iv;
> > > > +  for (edge e : exits)
> > > > +    {
> > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge.\n");
> > > > +      /* Early break BB must be after the main exit BB.  In theory we should
> > > > +	 be able to vectorize the inverse order, but the current flow in
> > > > +	 the vectorizer always assumes you update successor PHI nodes, not
> > > > +	 preds.  */
> > > > +      if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge order.\n");
> 
> "unsupported loop exit order", but I don't understand the comment.
> 
> > > > +    }
> > > > +
> > > > +  /* We currently only support early exit loops with known bounds.   */
> 
> Btw, why's that?  Is that because we don't support the loop-around edge?
> IMHO this is the most serious limitation (and as said above it should be
> trivial to fix).
> 
> > > > +  if (exits.length () > 1)
> > > > +    {
> > > > +      class tree_niter_desc niter;
> > > > +      if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL)
> > > > +	  || chrec_contains_undetermined (niter.niter)
> > > > +	  || !evolution_function_is_constant_p (niter.niter))
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " early breaks only supported on loops"
> > > > +				       " with known iteration bounds.\n");
> > > > +    }
> > > >
> > > >    info->conds
> > > >      = vect_get_loop_niters (loop, &info->assumptions,
> > > > @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> > > >    LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds);
> > > >    LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > > >    if (info->inner_loop_cond)
> > > >      {
> > > >        stmt_vec_info inner_loop_cond_info
> > > > @@ -3070,7 +3106,8 @@ start_over:
> > > >
> > > >    /* If an epilogue loop is required make sure we can create one.  */
> > > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        if (dump_enabled_p ())
> > > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > > @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > >    basic_block exit_bb;
> > > >    tree scalar_dest;
> > > >    tree scalar_type;
> > > > -  gimple *new_phi = NULL, *phi;
> > > > +  gimple *new_phi = NULL, *phi = NULL;
> > > >    gimple_stmt_iterator exit_gsi;
> > > >    tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > > >    gimple *epilog_stmt = NULL;
> > > > @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > > >  	  reduc_inputs.quick_push (new_def);
> > > >  	}
> > > > +
> > > > +	/* Update the other exits.  */
> > > > +	if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	  {
> > > > +	    vec<edge> alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo);
> > > > +	    gphi_iterator gsi, gsi1;
> > > > +	    for (edge exit : alt_exits)
> > > > +	      {
> > > > +		/* Find the phi node to propagate into the exit block for each
> > > > +		   exit edge.  */
> > > > +		for (gsi = gsi_start_phis (exit_bb),
> > > > +		     gsi1 = gsi_start_phis (exit->src);
> 
> exit->src == loop->header, right?  I think this won't work for multiple
> alternate exits.  It's probably easier to do this where we create the
> LC PHI node for the reduction result?
> 
> > > > +		     !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > +		     gsi_next (&gsi), gsi_next (&gsi1))
> > > > +		  {
> > > > +		    /* There really should be a function to just get the number
> > > > +		       of phis inside a bb.  */
> > > > +		    if (phi && phi == gsi.phi ())
> > > > +		      {
> > > > +			gphi *phi1 = gsi1.phi ();
> > > > +			SET_PHI_ARG_DEF (phi, exit->dest_idx,
> > > > +					 PHI_RESULT (phi1));
> 
> I think we know the header PHI of a reduction perfectly well, there
> shouldn't be the need to "search" for it.
> 
> > > > +			break;
> > > > +		      }
> > > > +		  }
> > > > +	      }
> > > > +	  }
> > > >        gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > > >      }
> > > >
> > > > @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo,
> > > >  	   new_tree = lane_extract <vec_lhs', ...>;
> > > >  	   lhs' = new_tree;  */
> > > >
> > > > +      /* When vectorizing an early break, any live statements that are used
> > > > +	 outside of the loop are dead.  The loop will never get to them.
> > > > +	 We could change the liveness value during analysis instead but since
> > > > +	 the below code is invalid anyway just ignore it during codegen.  */
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	return true;
> 
> But what about the value that's live across the main exit when the
> epilogue is not entered?
> 
> > > > +
> > > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >        basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > >        gcc_assert (single_pred_p (exit_bb));
> > > > @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > >       versioning.   */
> > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> 
> e can be NULL here?  I think we should reject such loops earlier.
> 
> > > >      {
> > > >        split_loop_exit_edge (e, true);
> > > >        if (dump_enabled_p ())
> > > > @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >    if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))
> > > >      {
> > > >        e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
> > > > -      if (! single_pred_p (e->dest))
> > > > +      if (e && ! single_pred_p (e->dest))
> > > >  	{
> > > >  	  split_loop_exit_edge (e, true);
> > > >  	  if (dump_enabled_p ())
> > > > @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> > > >
> > > >    /* Loops vectorized with a variable factor won't benefit from
> > > >       unrolling/peeling.  */
> 
> update the comment?  Why would we unroll a VLA loop with early breaks?
> Or did you mean to use || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)?
> 
> > > > -  if (!vf.is_constant ())
> > > > +  if (!vf.is_constant ()
> > > > +      && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > >      {
> > > >        loop->unroll = 1;
> > > >        if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >    *live_p = false;
> > > >
> > > >    /* cond stmt other than loop exit cond.  */
> > > > -  if (is_ctrl_stmt (stmt_info->stmt)
> > > > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > > > -    *relevant = vect_used_in_scope;
> 
> how was that ever hit before?  For outer loop processing with outer loop
> vectorization?
> 
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    {
> > > > +      /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but
> > > > +	 it looks like loop_manip doesn't do that.  So we have to do it
> > > > +	 the hard way.  */
> > > > +      basic_block bb = gimple_bb (stmt_info->stmt);
> > > > +      bool exit_bb = false, early_exit = false;
> > > > +      edge_iterator ei;
> > > > +      edge e;
> > > > +      FOR_EACH_EDGE (e, ei, bb->succs)
> > > > +        if (!flow_bb_inside_loop_p (loop, e->dest))
> > > > +	  {
> > > > +	    exit_bb = true;
> > > > +	    early_exit = loop->vec_loop_iv->src != bb;
> > > > +	    break;
> > > > +	  }
> > > > +
> > > > +      /* We should have processed any exit edge, so an edge that is not an
> > > > +	 early break must be a loop IV edge.  We need to distinguish between the
> > > > +	 two as we don't want to generate code for the main loop IV.  */
> > > > +      if (exit_bb)
> > > > +	{
> > > > +	  if (early_exit)
> > > > +	    *relevant = vect_used_in_scope;
> > > > +	}
> 
> I wonder why you can't simply do
> 
>          if (is_ctrl_stmt (stmt_info->stmt)
>              && stmt_info->stmt != LOOP_VINFO_COND (loop_info))
> 
> ?
> 
> > > > +      else if (bb->loop_father == loop)
> > > > +	LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true;
> 
> so for control flow not exiting the loop you can check
> loop_exits_from_bb_p ().
> 
> > > > +    }
> > > >
> > > >    /* changing memory.  */
> > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > > @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >  	*relevant = vect_used_in_scope;
> > > >        }
> > > >
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  auto_bitmap exit_bbs;
> > > > +  for (edge exit : exits)
> > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > > >    /* uses outside the loop.  */
> > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
> > > >      {
> > > > @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >  	      /* We expect all such uses to be in the loop exit phis
> > > >  		 (because of loop closed form)   */
> > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > > +	      gcc_assert (bitmap_bit_p (exit_bbs, bb->index));
> 
> That now becomes quite expensive checking already covered by the LC SSA
> verifier so I suggest to simply drop this assert instead.
> 
> > > >                *live_p = true;
> > > >  	    }
> > > > @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > >  	}
> > > >      }
> > > >
> > > > +  /* Ideally this should be in vect_analyze_loop_form but we haven't seen all
> > > > +     the conds yet at that point and there's no quick way to retrieve them.  */
> > > > +  if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo))
> > > > +    return opt_result::failure_at (vect_location,
> > > > +				   "not vectorized:"
> > > > +				   " unsupported control flow in loop.\n");
> 
> so we didn't do this before?  But see above where I wondered.  So when
> does this hit with early exits and why can't we check for this in
> vect_verify_loop_form?
> 
> > > > +
> > > >    /* 2. Process_worklist */
> > > >    while (worklist.length () > 0)
> > > >      {
> > > > @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > > >  			return res;
> > > >  		    }
> > > >                   }
> > > > +	    }
> > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +	    {
> > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > +	      opt_result res
> > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +				loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > >              }
> > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > >  	    {
> > > > @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  			     node_instance, cost_vec);
> > > >        if (!res)
> > > >  	return res;
> > > > -   }
> > > > +    }
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > > >
> > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >      {
> > > >        case vect_internal_def:
> > > > +      case vect_early_exit_def:
> > > >          break;
> > > >
> > > >        case vect_reduction_def:
> > > > @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >      {
> > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > >        *need_to_vectorize = true;
> > > >      }
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -63,6 +63,7 @@ enum vect_def_type {
> > > >    vect_internal_def,
> > > >    vect_induction_def,
> > > >    vect_reduction_def,
> > > > +  vect_early_exit_def,
> 
> can you avoid putting this inbetween reduction and double reduction
> please?  Just put it before vect_unknown_def_type.  In fact the COND
> isn't a def ... maybe we should have pattern recogized
> 
>  if (a < b) exit;
> 
> as
> 
>  cond = a < b;
>  if (cond != 0) exit;
> 
> so the part that we need to vectorize is more clear.
> 
> > > >    vect_double_reduction_def,
> > > >    vect_nested_cycle,
> > > >    vect_first_order_recurrence,
> > > > @@ -876,6 +877,13 @@ public:
> > > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > > >    bool peeling_for_niter;
> > > >
> > > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > > +     the loop for the break finding loop.  */
> > > > +  bool early_breaks;
> > > > +
> > > > +  /* When the loop has a non-early break control flow inside.  */
> > > > +  bool non_break_control_flow;
> > > > +
> > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > >    auto_vec<gcond *> conds;
> > > >
> > > > @@ -985,9 +993,11 @@ public:
> > > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > >  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > > >  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > >  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > > >  #define LOOP_VINFO_GENERAL_CTR_FLOW(L)     (L)->non_break_control_flow
> > > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> > > > @@ -1038,8 +1048,8 @@ public:
> > > >     stack.  */
> > > >  typedef opt_pointer_wrapper <loop_vec_info> opt_loop_vec_info;
> > > >
> > > > -inline loop_vec_info
> > > > -loop_vec_info_for_loop (class loop *loop)
> > > > +static inline loop_vec_info
> > > > +loop_vec_info_for_loop (const class loop *loop)
> > > >  {
> > > >    return (loop_vec_info) loop->aux;
> > > >  }
> > > > @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb)
> > > >  {
> > > >    if (bb == (bb->loop_father)->header)
> > > >      return true;
> > > > -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> > > > +
> > > >    return false;
> > > >  }
> > > >
> > > > @@ -2176,9 +2186,10 @@ class auto_purge_vect_location
> > > >     in tree-vect-loop-manip.cc.  */
> > > >  extern void vect_set_loop_condition (class loop *, loop_vec_info,
> > > >  				     tree, tree, tree, bool);
> > > > -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
> > > > +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge);
> > > >  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
> > > > -						     class loop *, edge);
> > > > +						    class loop *, edge, bool,
> > > > +						    vec<basic_block> * = NULL);
> > > >  class loop *vect_loop_versioning (loop_vec_info, gimple *);
> > > >  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
> > > >  				    tree *, tree *, tree *, int, bool, bool,
> > > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > > index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644
> > > > --- a/gcc/tree-vectorizer.cc
> > > > +++ b/gcc/tree-vectorizer.cc
> > > > @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun)
> > > >  	 predicates that need to be shared for optimal predicate usage.
> > > >  	 However reassoc will re-order them and prevent CSE from working
> > > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> 
> seeing this more and more I think we want a simple way to iterate over
> all exits without copying to a vector when we have them recorded.  My
> C++ fu is too limited to support
> 
>   for (auto exit : recorded_exits (loop))
>     ...
> 
> (maybe that's enough for somebody to jump onto this ;))
> 
> Don't treat all review comments as change orders, but it should be clear
> the code isn't 100% obvious.  Maybe the patch can be simplified by
> splitting out the LC SSA cleanup parts.
> 
> Thanks,
> Richard.
> 
> > > > +      for (edge exit : exits)
> > > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > > >
> > > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > > >        do_rpo_vn (fun, entry, exit_bbs);
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg,
> > > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien
> > > Moerman;
> > > HRB 36809 (AG Nuernberg)
> >

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (19 preceding siblings ...)
       [not found] ` <MW5PR11MB5908414D8B2AB0580A888ECAA924A@MW5PR11MB5908.namprd11.prod.outlook.com>
@ 2023-11-06  7:36 ` Tamar Christina
  2023-11-06  7:37 ` [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests Tamar Christina
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:36 UTC (permalink / raw)
  To: rguenther, gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 8293 bytes --]

Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is, however, only checked when the
comparison would produce multiple statements.

Note: I am currently struggling to get patch 7 correct in all cases and could use
      some feedback there.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross-page
  iteration checks.  The result is also unlikely to be beneficial.  For that
  reason we punt support for variable buffers until we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported.
- Any number of loop early exits are supported.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Peeling code supports it but the code motion code cannot
  find instructions to make the move in the epilogue.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these test early exit vectorization support.

This implementation does not support completely handling the early break inside
the vector loop itself.  Instead it adds checks such that if we know we have to
exit in the current iteration, we branch to scalar code that performs the final
VF iterations and handles all the code in <action>.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.

That is to say, the first vector loop executes so long as the early exit isn't
needed.  Once the exit is taken, the scalar code will perform at most VF extra
iterations.  The exact number depends on peeling, the iteration start, and which
exit was taken (natural or early).   For this scalar loop, all early exits are
treated the same.

When we vectorize, we move any statement that is not related to the early break
itself and that would be incorrect to execute before the break (i.e. that has
side effects) to after the break.  If this is not possible we decline to
vectorize.

This means that we check at the start of each iteration whether we are going to
exit or not.  During the analysis phase we check whether we are allowed to do
this moving of statements.  Note that we move only the scalar statements, and
do so after peeling but just before we start transforming statements.

Codegen:

for e.g.

#define N 803
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;

 }
 return ret;
}

We generate for Adv. SIMD:

test4:
        adrp    x2, .LC0
        adrp    x3, .LANCHOR0
        dup     v2.4s, w0
        add     x3, x3, :lo12:.LANCHOR0
        movi    v4.4s, 0x4
        add     x4, x3, 3216
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x1, 0
        mov     w2, 0
        .p2align 3,,7
.L3:
        ldr     q0, [x3, x1]
        add     v3.4s, v1.4s, v2.4s
        add     v1.4s, v1.4s, v4.4s
        cmhi    v0.4s, v0.4s, v2.4s
        umaxp   v0.4s, v0.4s, v0.4s
        fmov    x5, d0
        cbnz    x5, .L6
        add     w2, w2, 1
        str     q3, [x1, x4]
        str     q2, [x3, x1]
        add     x1, x1, 16
        cmp     w2, 200
        bne     .L3
        mov     w7, 3
.L2:
        lsl     w2, w2, 2
        add     x5, x3, 3216
        add     w6, w2, w0
        sxtw    x4, w2
        ldr     w1, [x3, x4, lsl 2]
        str     w6, [x5, x4, lsl 2]
        cmp     w0, w1
        bcc     .L4
        add     w1, w2, 1
        str     w0, [x3, x4, lsl 2]
        add     w6, w1, w0
        sxtw    x1, w1
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        add     w4, w2, 2
        str     w0, [x3, x1, lsl 2]
        sxtw    x1, w4
        add     w6, w1, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
        add     w2, w2, 3
        cmp     w7, 3
        beq     .L4
        sxtw    x1, w2
        add     w2, w2, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w2, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
.L4:
        mov     w0, 0
        ret
        .p2align 2,,3
.L6:
        mov     w7, 4
        b       .L2

and for SVE:

test4:
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        add     x5, x2, 3216
        mov     x3, 0
        mov     w1, 0
        cntw    x4
        mov     z1.s, w0
        index   z0.s, #0, #1
        ptrue   p1.b, all
        ptrue   p0.s, all
        .p2align 3,,7
.L3:
        ld1w    z2.s, p1/z, [x2, x3, lsl 2]
        add     z3.s, z0.s, z1.s
        cmplo   p2.s, p0/z, z1.s, z2.s
        b.any   .L2
        st1w    z3.s, p1, [x5, x3, lsl 2]
        add     w1, w1, 1
        st1w    z1.s, p1, [x2, x3, lsl 2]
        add     x3, x3, x4
        incw    z0.s
        cmp     w3, 803
        bls     .L3
.L5:
        mov     w0, 0
        ret
        .p2align 2,,3
.L2:
        cntw    x5
        mul     w1, w1, w5
        cbz     w5, .L5
        sxtw    x1, w1
        sub     w5, w5, #1
        add     x5, x5, x1
        add     x6, x2, 3216
        b       .L6
        .p2align 2,,3
.L14:
        str     w0, [x2, x1, lsl 2]
        cmp     x1, x5
        beq     .L5
        mov     x1, x4
.L6:
        ldr     w3, [x2, x1, lsl 2]
        add     w4, w0, w1
        str     w4, [x6, x1, lsl 2]
        add     x4, x1, 1
        cmp     w0, w3
        bcs     .L14
        mov     w0, 0
        ret

On the workloads this work is based on we see a 2-3x performance uplift with
this patch.

Follow up plan:
 - Boolean vectorization has several shortcomings.  I've filed PR110223 with the
   bigger ones that cause vectorization to fail with this patch.
 - SLP support.  This is planned for GCC 15 since for the majority of cases
   building SLP itself fails.  This means I'll need to spend time making this
   more robust first.  Additionally it requires:
     * Adding support for vectorizing CFG (gconds)
     * Support for CFG to differ between vector and scalar loops.
   Both of these would be disruptive to the tree, and I suspect I'll be
   handling fallout from this patch for a while, so I plan to work on the
   surrounding building blocks for the remainder of the year.

Bootstrapped and regtested on aarch64-none-linux-gnu with some issues
remaining, so I'm looking for feedback.  Also ran across various workloads
with no issues.

When closer to acceptance I will run the testsuite on other targets as well
and clean up any related fallout there.

--- inline copy of patch -- 

-- 

[-- Attachment #2: rb17494.patch --]
[-- Type: text/plain, Size: 0 bytes --]




* [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (20 preceding siblings ...)
  2023-11-06  7:36 ` [PATCH v6 0/21]middle-end: " Tamar Christina
@ 2023-11-06  7:37 ` Tamar Christina
  2023-11-07  9:46   ` Richard Biener
  2023-11-06  7:37 ` [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization Tamar Christina
                   ` (20 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 3238 bytes --]

Hi All,

This adds #pragma GCC novector to testcases that have shown up since the last
regression run and that this series now detects as vectorizable.

Is it OK if, when it comes time to commit, I just update any new cases before
committing, since this seems to be a cat-and-mouse game?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/no-scevccp-slp-30.c: Add pragma novector.
	* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
	* gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
	* gcc.target/aarch64/vect-xorsign_exec.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index 00d0eca56eeca6aee6f11567629dc955c0924c74..534bee4a1669a7cbd95cf6007f28dafd23bab8da 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,9 +24,9 @@ main1 ()
    }
 
   /* check results:  */
-#pragma GCC novector
    for (j = 0; j < N; j++)
    {
+#pragma GCC novector
     for (i = 0; i < N; i++)
       {
         if (out[i*4] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index 48b6a9b0681cf1fe410755c3e639b825b27895b0..22817a57ef81398cc018a78597755397d20e0eb9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -27,6 +27,7 @@ main1 ()
 #pragma GCC novector
  for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++) 
       {
         if (a[i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index a0e53d5fef91868dfdbd542dd0a98dff92bd265b..0861d488e134d3f01a2fa83c56eff7174f36ddfb 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -83,9 +83,9 @@ int main1 ()
     }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
 	{
           if (tmp1[2].e.n[1][i][j] != 8)
@@ -103,9 +103,9 @@ int main1 ()
     }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N - NINTS; j++)
 	{
           if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
index cfa22115831272cb1d4e1a38512f10c3a1c6ad77..84f33d3f6cce9b0017fd12ab961019041245ffae 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
@@ -33,6 +33,7 @@ main (void)
     r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
       abort ();
@@ -41,6 +42,7 @@ main (void)
     rd[i] = ad[i] * __builtin_copysign (1.0d, bd[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (rd[i] != ad[i] * __builtin_copysign (1.0d, bd[i]))
       abort ();




-- 

[-- Attachment #2: rb17961.patch --]
[-- Type: text/plain, Size: 2597 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
index 00d0eca56eeca6aee6f11567629dc955c0924c74..534bee4a1669a7cbd95cf6007f28dafd23bab8da 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
@@ -24,9 +24,9 @@ main1 ()
    }
 
   /* check results:  */
-#pragma GCC novector
    for (j = 0; j < N; j++)
    {
+#pragma GCC novector
     for (i = 0; i < N; i++)
       {
         if (out[i*4] != 8
diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
index 48b6a9b0681cf1fe410755c3e639b825b27895b0..22817a57ef81398cc018a78597755397d20e0eb9 100644
--- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
+++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
@@ -27,6 +27,7 @@ main1 ()
 #pragma GCC novector
  for (i = 0; i < N; i++)
    {
+#pragma GCC novector
     for (j = 0; j < N; j++) 
       {
         if (a[i][j] != 8)
diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
index a0e53d5fef91868dfdbd542dd0a98dff92bd265b..0861d488e134d3f01a2fa83c56eff7174f36ddfb 100644
--- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
+++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
@@ -83,9 +83,9 @@ int main1 ()
     }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N; j++)
 	{
           if (tmp1[2].e.n[1][i][j] != 8)
@@ -103,9 +103,9 @@ int main1 ()
     }
 
   /* check results:  */
-#pragma GCC novector
   for (i = 0; i < N - NINTS; i++)
     {
+#pragma GCC novector
       for (j = 0; j < N - NINTS; j++)
 	{
           if (tmp2[2].e.n[1][i][j] != 8)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
index cfa22115831272cb1d4e1a38512f10c3a1c6ad77..84f33d3f6cce9b0017fd12ab961019041245ffae 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
@@ -33,6 +33,7 @@ main (void)
     r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
       abort ();
@@ -41,6 +42,7 @@ main (void)
     rd[i] = ad[i] * __builtin_copysign (1.0d, bd[i]);
 
   /* check results:  */
+#pragma GCC novector
   for (i = 0; i < N; i++)
     if (rd[i] != ad[i] * __builtin_copysign (1.0d, bd[i]))
       abort ();





* [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (21 preceding siblings ...)
  2023-11-06  7:37 ` [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests Tamar Christina
@ 2023-11-06  7:37 ` Tamar Christina
  2023-11-07  9:52   ` Richard Biener
  2023-11-06  7:37 ` [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Tamar Christina
                   ` (19 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 112764 bytes --]

Hi All,

This adds new tests to check all the early break functionality.
It includes a number of codegen and runtime tests checking the values at
different needle positions in the array.

They also check the values for different array sizes, peeling positions,
datatypes, VL, ncopies and every other variant I could think of.

Additionally, it contains reduced cases from issues found while running over
various codebases.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Also regtested with:
 -march=armv8.3-a+sve
 -march=armv8.3-a+nosve
 -march=armv9-a

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/sourcebuild.texi: Document it.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp:
	* g++.dg/vect/vect-early-break_1.cc: New test.
	* g++.dg/vect/vect-early-break_2.cc: New test.
	* g++.dg/vect/vect-early-break_3.cc: New test.
	* gcc.dg/vect/vect-early-break-run_1.c: New test.
	* gcc.dg/vect/vect-early-break-run_10.c: New test.
	* gcc.dg/vect/vect-early-break-run_2.c: New test.
	* gcc.dg/vect/vect-early-break-run_3.c: New test.
	* gcc.dg/vect/vect-early-break-run_4.c: New test.
	* gcc.dg/vect/vect-early-break-run_5.c: New test.
	* gcc.dg/vect/vect-early-break-run_6.c: New test.
	* gcc.dg/vect/vect-early-break-run_7.c: New test.
	* gcc.dg/vect/vect-early-break-run_8.c: New test.
	* gcc.dg/vect/vect-early-break-run_9.c: New test.
	* gcc.dg/vect/vect-early-break-template_1.c: New test.
	* gcc.dg/vect/vect-early-break-template_2.c: New test.
	* gcc.dg/vect/vect-early-break_1.c: New test.
	* gcc.dg/vect/vect-early-break_10.c: New test.
	* gcc.dg/vect/vect-early-break_11.c: New test.
	* gcc.dg/vect/vect-early-break_12.c: New test.
	* gcc.dg/vect/vect-early-break_13.c: New test.
	* gcc.dg/vect/vect-early-break_14.c: New test.
	* gcc.dg/vect/vect-early-break_15.c: New test.
	* gcc.dg/vect/vect-early-break_16.c: New test.
	* gcc.dg/vect/vect-early-break_17.c: New test.
	* gcc.dg/vect/vect-early-break_18.c: New test.
	* gcc.dg/vect/vect-early-break_19.c: New test.
	* gcc.dg/vect/vect-early-break_2.c: New test.
	* gcc.dg/vect/vect-early-break_20.c: New test.
	* gcc.dg/vect/vect-early-break_21.c: New test.
	* gcc.dg/vect/vect-early-break_22.c: New test.
	* gcc.dg/vect/vect-early-break_23.c: New test.
	* gcc.dg/vect/vect-early-break_24.c: New test.
	* gcc.dg/vect/vect-early-break_25.c: New test.
	* gcc.dg/vect/vect-early-break_26.c: New test.
	* gcc.dg/vect/vect-early-break_27.c: New test.
	* gcc.dg/vect/vect-early-break_28.c: New test.
	* gcc.dg/vect/vect-early-break_29.c: New test.
	* gcc.dg/vect/vect-early-break_3.c: New test.
	* gcc.dg/vect/vect-early-break_30.c: New test.
	* gcc.dg/vect/vect-early-break_31.c: New test.
	* gcc.dg/vect/vect-early-break_32.c: New test.
	* gcc.dg/vect/vect-early-break_33.c: New test.
	* gcc.dg/vect/vect-early-break_34.c: New test.
	* gcc.dg/vect/vect-early-break_35.c: New test.
	* gcc.dg/vect/vect-early-break_36.c: New test.
	* gcc.dg/vect/vect-early-break_37.c: New test.
	* gcc.dg/vect/vect-early-break_38.c: New test.
	* gcc.dg/vect/vect-early-break_39.c: New test.
	* gcc.dg/vect/vect-early-break_4.c: New test.
	* gcc.dg/vect/vect-early-break_40.c: New test.
	* gcc.dg/vect/vect-early-break_41.c: New test.
	* gcc.dg/vect/vect-early-break_42.c: New test.
	* gcc.dg/vect/vect-early-break_43.c: New test.
	* gcc.dg/vect/vect-early-break_44.c: New test.
	* gcc.dg/vect/vect-early-break_45.c: New test.
	* gcc.dg/vect/vect-early-break_46.c: New test.
	* gcc.dg/vect/vect-early-break_47.c: New test.
	* gcc.dg/vect/vect-early-break_48.c: New test.
	* gcc.dg/vect/vect-early-break_49.c: New test.
	* gcc.dg/vect/vect-early-break_5.c: New test.
	* gcc.dg/vect/vect-early-break_50.c: New test.
	* gcc.dg/vect/vect-early-break_51.c: New test.
	* gcc.dg/vect/vect-early-break_52.c: New test.
	* gcc.dg/vect/vect-early-break_53.c: New test.
	* gcc.dg/vect/vect-early-break_54.c: New test.
	* gcc.dg/vect/vect-early-break_55.c: New test.
	* gcc.dg/vect/vect-early-break_56.c: New test.
	* gcc.dg/vect/vect-early-break_57.c: New test.
	* gcc.dg/vect/vect-early-break_58.c: New test.
	* gcc.dg/vect/vect-early-break_59.c: New test.
	* gcc.dg/vect/vect-early-break_6.c: New test.
	* gcc.dg/vect/vect-early-break_60.c: New test.
	* gcc.dg/vect/vect-early-break_61.c: New test.
	* gcc.dg/vect/vect-early-break_62.c: New test.
	* gcc.dg/vect/vect-early-break_63.c: New test.
	* gcc.dg/vect/vect-early-break_64.c: New test.
	* gcc.dg/vect/vect-early-break_65.c: New test.
	* gcc.dg/vect/vect-early-break_66.c: New test.
	* gcc.dg/vect/vect-early-break_67.c: New test.
	* gcc.dg/vect/vect-early-break_68.c: New test.
	* gcc.dg/vect/vect-early-break_69.c: New test.
	* gcc.dg/vect/vect-early-break_7.c: New test.
	* gcc.dg/vect/vect-early-break_70.c: New test.
	* gcc.dg/vect/vect-early-break_71.c: New test.
	* gcc.dg/vect/vect-early-break_72.c: New test.
	* gcc.dg/vect/vect-early-break_73.c: New test.
	* gcc.dg/vect/vect-early-break_74.c: New test.
	* gcc.dg/vect/vect-early-break_75.c: New test.
	* gcc.dg/vect/vect-early-break_76.c: New test.
	* gcc.dg/vect/vect-early-break_8.c: New test.
	* gcc.dg/vect/vect-early-break_9.c: New test.
	* gcc.target/aarch64/opt_mismatch_1.c: New test.
	* gcc.target/aarch64/opt_mismatch_2.c: New test.
	* gcc.target/aarch64/opt_mismatch_3.c: New test.
	* gcc.target/aarch64/vect-early-break-cbranch_1.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c20af31c64237baff70f8781b1dc47f4d1a48aa9..4c351335f2bec9c6bb6856bd38d9132da7447c13 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
 @option{-funsafe-math-optimizations} is not in effect.
 This implies @code{vect_float}.
 
+@item vect_early_break
+Target supports hardware vectorization of loops with early breaks.
+This requires an implementation of the cbranch optab for vectors.
+
 @item vect_int
 Target supports hardware vectors of @code{int}.
 
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
new file mode 100644
index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
+template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
+template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
+public:
+  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
+};
+template <unsigned N, typename C>
+template <typename Ca>
+poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
+  for (int i = 0; i < N; i++)
+    this->coeffs[i] += a.coeffs[i];
+  return *this;
+}
+template <unsigned N, typename Ca, typename Cb>
+poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
+  poly_int<N, long> r;
+  return r;
+}
+struct vec_prefix {
+  unsigned m_num;
+};
+struct vl_ptr;
+struct va_heap {
+  typedef vl_ptr default_layout;
+};
+template <typename, typename A, typename = typename A::default_layout>
+struct vec;
+template <typename T, typename A> struct vec<T, A, int> {
+  T &operator[](unsigned);
+  vec_prefix m_vecpfx;
+  T m_vecdata[];
+};
+template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
+  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
+  return m_vecdata[ix];
+}
+template <typename T> struct vec<T, va_heap> {
+  T &operator[](unsigned ix) { return m_vec[ix]; }
+  vec<T, va_heap, int> m_vec;
+};
+class auto_vec : public vec<poly_int<2, long>, va_heap> {};
+template <typename> class vector_builder : public auto_vec {};
+class int_vector_builder : public vector_builder<int> {
+public:
+  int_vector_builder(poly_int<2, long>, int, int);
+};
+bool vect_grouped_store_supported() {
+  int i;
+  poly_int<2, long> nelt;
+  int_vector_builder sel(nelt, 2, 3);
+  for (i = 0; i < 6; i++)
+    sel[i] += exact_div(nelt, 2);
+}
+
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
new file mode 100644
index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
+template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
+template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
+public:
+  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
+};
+template <unsigned N, typename C>
+template <typename Ca>
+poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
+  for (int i = 0; i < N; i++)
+    this->coeffs[i] += a.coeffs[i];
+  return *this;
+}
+template <unsigned N, typename Ca, typename Cb>
+poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
+  poly_int<N, long> r;
+  return r;
+}
+struct vec_prefix {
+  unsigned m_num;
+};
+struct vl_ptr;
+struct va_heap {
+  typedef vl_ptr default_layout;
+};
+template <typename, typename A, typename = typename A::default_layout>
+struct vec;
+template <typename T, typename A> struct vec<T, A, int> {
+  T &operator[](unsigned);
+  vec_prefix m_vecpfx;
+  T m_vecdata[];
+};
+template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
+  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
+  return m_vecdata[ix];
+}
+template <typename T> struct vec<T, va_heap> {
+  T &operator[](unsigned ix) { return m_vec[ix]; }
+  vec<T, va_heap, int> m_vec;
+};
+class auto_vec : public vec<poly_int<2, long>, va_heap> {};
+template <typename> class vector_builder : public auto_vec {};
+class int_vector_builder : public vector_builder<int> {
+public:
+  int_vector_builder(poly_int<2, long>, int, int);
+};
+bool vect_grouped_store_supported() {
+  int i;
+  poly_int<2, long> nelt;
+  int_vector_builder sel(nelt, 2, 3);
+  for (i = 0; i < 6; i++)
+    sel[i] += exact_div(nelt, 2);
+}
+
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
new file mode 100644
index 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12273fd8e5aa2018c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+int aarch64_advsimd_valid_immediate_hs_val32;
+bool aarch64_advsimd_valid_immediate_hs() {
+  for (int shift = 0; shift < 32; shift += 8)
+    if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
+      return aarch64_advsimd_valid_immediate_hs_val32;
+  for (;;)
+    ;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
@@ -0,0 +1,47 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  test4(x);
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
@@ -0,0 +1,50 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  unsigned res = test4(x);
+
+  if (res != idx)
+    abort ();
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x,int y, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+ }
+
+ ret = x + y * z;
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
new file mode 100644
index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
new file mode 100644
index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i] * x;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
new file mode 100644
index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ int ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
new file mode 100644
index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
new file mode 100644
index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
new file mode 100644
index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned step)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=step)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a > x)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
new file mode 100644
index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64];
+short b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 200
+#define M 4
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 17 / 2)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *a, TYPE *b)		\
+  {						\
+    for (int i = 0; i < N; i += 2)		\
+      {						\
+	a[i + 0] = b[i + 0] + 2;		\
+	a[i + 1] = b[i + 1] + 3;		\
+      }						\
+  }
+
+#define DO_TEST(TYPE)					\
+  for (int j = 1; j < M; ++j)				\
+    {							\
+      TYPE a[N + M];					\
+      for (int i = 0; i < N + M; ++i)			\
+	a[i] = TEST_VALUE (i);				\
+      test_##TYPE (a + j, a);				\
+      for (int i = 0; i < N; i += 2)			\
+	if (a[i + j] != (TYPE) (a[i] + 2)		\
+	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
+	  __builtin_abort ();				\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+void __attribute__((noinline,noclone))
+foo (double *b, double *d, double *f)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      d[2*i] = 2. * d[2*i];
+      d[2*i+1] = 4. * d[2*i+1];
+      b[i] = d[2*i] - 1.;
+      f[i] = d[2*i+1] + 2.;
+    }
+}
+int main()
+{
+  double b[1024], d[2*1024], f[1024];
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < 2*1024; i++)
+    d[i] = 1.;
+  foo (b, d, f);
+  for (i = 0; i < 1024; i+= 2)
+    {
+      if (d[2*i] != 2.)
+	abort ();
+      if (d[2*i+1] != 4.)
+	abort ();
+    }
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != 1.)
+	abort ();
+      if (f[i] != 6.)
+	abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
new file mode 100644
index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "vect-peel-1-src.c"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
new file mode 100644
index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64], b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+      abort ();
+  if (a[0] != 5)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
new file mode 100644
index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int in[100];
+int out[100 * 2];
+
+int main (void)
+{
+  if (out[0] != in[100 - 1])
+  for (int i = 1; i <= 100; ++i)
+    if (out[i] != 2)
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{  
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+   if (vect[i] > x)
+     return 1;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
new file mode 100644
index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x[100];
+int choose1(int);
+int choose2();
+void consume(int);
+void f() {
+    for (int i = 0; i < 100; ++i) {
+        if (x[i] == 11) {
+            if (choose1(i))
+                goto A;
+            else
+                goto B;
+        }
+    }
+    if (choose2())
+        goto B;
+A:
+    for (int i = 0; i < 100; ++i)
+        consume(i);
+B:
+    for (int i = 0; i < 100; ++i)
+        consume(i * i);
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
new file mode 100644
index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1025
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
new file mode 100644
index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1024
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
new file mode 100644
index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     {
+       for (int y = 0; y < z; y++)
+	 vect_a2 [y] *= vect_a1[i];
+       break;
+     }
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+
+unsigned vect_a[N] __attribute__ ((aligned (4)));
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ 
+ for (int i = 1; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
new file mode 100644
index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     break;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
new file mode 100644
index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     return i;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
new file mode 100644
index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 4
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 != x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
new file mode 100644
index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
new file mode 100644
index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+= (N % 4))
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (i > 16 && vect[i] > x)
+     break;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i*=3)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* SCEV can't currently analyze this loop's bounds.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
new file mode 100644
index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+#pragma GCC novector
+#pragma GCC unroll 4
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += vect_a[i] + x;
+ }
+ return ret;
+}
+
+/* novector should have blocked vectorization.  */
+/* { dg-final { scan-tree-dump-not {vectorized \d loops in function} "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
new file mode 100644
index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
new file mode 100644
index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
+
+/* At -O2 we can't currently vectorize this because of the libcalls not being
+   lowered.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
new file mode 100644
index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+void abort ();
+
+float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
+float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
+float a[16] = {0};
+float e[16] = {0};
+float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+int main1 ()
+{
+  int i;
+  for (i=0; i<16; i++)
+    {
+      if (a[i] != results1[i] || e[i] != results2[i])
+        abort();
+    }
+
+  if (a[i+3] != b[i-1])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
new file mode 100644
index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int main (void)
+{
+  signed char a[50], b[50], c[50];
+  for (int i = 0; i < 50; ++i)
+    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+struct foostr {
+  _Complex short f1;
+  _Complex short f2;
+};
+struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
+struct foostr c[16] __attribute__ ((__aligned__(16)));
+struct foostr res[16] = {};
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    {
+      if (c[i].f1 != res[i].f1)
+ abort ();
+      if (c[i].f2 != res[i].f2)
+ abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
new file mode 100644
index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x_in[32];
+int x_out_a[32], x_out_b[32];
+int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
+int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
+int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
+
+void foo ()
+{
+  int j, i, x;
+  int curr_a, flag, next_a, curr_b, next_b;
+    {
+      for (i = 0; i < 16; i++)
+        {
+          next_b = b[i+1];
+          curr_b = flag ? next_b : curr_b;
+        }
+      x_out_b[j] = curr_b;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
new file mode 100644
index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+int main1 (short X)
+{
+  unsigned char a[128];
+  unsigned short b[128];
+  unsigned int c[128];
+  short myX = X;
+  int i;
+  for (i = 0; i < 128; i++)
+    {
+      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
+        abort ();
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
new file mode 100644
index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[64], b[64];
+int main ()
+{
+  int c = 7;
+  for (int i = 1; i < 64; ++i)
+    if (b[i] != a[i] - a[i-1])
+      abort ();
+  if (b[0] != -7)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
new file mode 100644
index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ unsigned tmp[N];
+ for (int i = 0; i < N; i++)
+ {
+   tmp[i] = x + i;
+   vect_b[i] = tmp[i];
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
new file mode 100644
index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile unsigned tmp = x + i;
+   vect_b[i] = tmp;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
new file mode 100644
index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
@@ -0,0 +1,101 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-add-options bind_pic_locally } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned short sa[N];
+unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[N];
+unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+   which will also align the access to 'ia[i+3]', and the loop could be
+   vectorized on all targets that support unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main1 ()
+{
+  int i;
+  int n = N - 7;
+
+  /* Multiple types with different sizes, used in independent
+     computations.  Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      sa[i+7] = sb[i] + sc[i];
+      ia[i+3] = ib[i] + ic[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-3%VF. This will result in a peeling factor
+   1 if VF=4,2.  This will not align the access to 'sa[i+3]', for which we
+   need to peel 5,1 iterations for VF=4,2 respectively, so the loop cannot
+   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
+   iterations, so the loop is vectorizable on all targets that support
+   unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main2 ()
+{
+  int i;
+  int n = N-3;
+
+  /* Multiple types with different sizes, used in independent
+     computations.  Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      ia[i+3] = ib[i] + ic[i];
+      sa[i+3] = sb[i] + sc[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 ();
+  main2 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
new file mode 100644
index 0000000000000000000000000000000000000000..be4a0c7426093059ce37a9f824defb7ae270094d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
new file mode 100644
index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0))
+      abort ();
+}
+
+/* Pattern didn't match inside gcond.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
new file mode 100644
index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (N/2); i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i+1;
+   if (vect_a[i] > x || vect_a[i+1] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   vect_a[i+1] += x * vect_b[i+1]; 
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  char i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+typedef float real_t;
+__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
+real_t s482()
+{
+    for (int nl = 0; nl < 10000; nl++) {
+        for (int i = 0; i < 32000; i++) {
+            a[i] += b[i] * c[i];
+            if (c[i] > b[i]) break;
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a, b;
+int e() {
+  int d, c;
+  d = 0;
+  for (; d < b; d++)
+    a = 0;
+  d = 0;
+  for (; d < b; d++)
+    if (d)
+      c++;
+  for (;;)
+    if (c)
+      break;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
new file mode 100644
index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
@@ -0,0 +1,28 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-do compile } */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_long } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+/* Statement used outside the loop.
+   NOTE: SCEV disabled to ensure the live operation is not removed before
+   vectorization.  */
+__attribute__ ((noinline)) int
+liveloop (int start, int n, int *x, int *y)
+{
+  int i = start;
+  int j;
+  int ret;
+
+  for (j = 0; j < n; ++j)
+    {
+      i += 1;
+      x[j] = i;
+      ret = y[j];
+    }
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-vect-all" } */
+
+int d(unsigned);
+
+void a() {
+  char b[8];
+  unsigned c = 0;
+  while (c < 7 && b[c])
+    ++c;
+  if (d(c))
+    return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
new file mode 100644
index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+enum a { b };
+
+struct {
+  enum a c;
+} d[10], *e;
+
+void f() {
+  int g;
+  for (g = 0, e = d; g < sizeof(1); g++, e++)
+    if (e->c)
+      return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
new file mode 100644
index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a[0];
+int b;
+
+void g();
+
+void f() {
+  int d, e;
+  for (; e; e++) {
+    int c;
+    switch (b)
+    case '9': {
+      for (; d < 1; d++)
+        if (a[d])
+          c = 1;
+      break;
+    case '<':
+      g();
+      c = 0;
+    }
+      while (c)
+        ;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
new file mode 100644
index 0000000000000000000000000000000000000000..80f23d1e2431133035895946a5d6b24bef3ca294
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+
+
+int main()
+{
+  int var6 = -1267827473;
+  do {
+      ++var6;
+      double s1_115[4], s2_108[4];
+      int var8 = -161498264;
+      do {
+	  ++var8;
+	  int var12 = 1260960076;
+	  for (; var12 <= 1260960080; ++var12) {
+	      int var13 = 1960990937;
+	      do {
+		  ++var13;
+		  int var14 = 2128638723;
+		  for (; var14 <= 2128638728; ++var14) {
+		      int var22 = -1141190839;
+		      do {
+			  ++var22;
+			  if (s2_108 > s1_115) {
+			      int var23 = -890798748;
+			      do {
+				  long long e_119[4];
+			      } while (var23 <= -890798746);
+			  }
+		      } while (var22 <= -1141190829);
+		  }
+	      } while (var13 <= 1960990946);
+	  }
+      } while (var8 <= -161498254);
+  } while (var6 <= -1267827462);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
new file mode 100644
index 0000000000000000000000000000000000000000..c9a8298a8b51e05079041ae7a05086a47b1be5dd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a1[N];
+unsigned vect_b1[N];
+unsigned vect_c1[N];
+unsigned vect_d1[N];
+  
+unsigned vect_a2[N];
+unsigned vect_b2[N];
+unsigned vect_c2[N];
+unsigned vect_d2[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b1[i] += x + i;
+   vect_c1[i] += x + i;
+   vect_d1[i] += x + i;
+   if (vect_a1[i]*2 != x)
+     break;
+   vect_a1[i] = x;
+
+   vect_b2[i] += x + i;
+   vect_c2[i] += x + i;
+   vect_d2[i] += x + i;
+   if (vect_a2[i]*2 != x)
+     break;
+   vect_a2[i] = x;
+
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
new file mode 100644
index 0000000000000000000000000000000000000000..f99de8e1f0650a3b590ed8bd9052e18173fc97d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}									\
+									\
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
+	abort ();
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
new file mode 100644
index 0000000000000000000000000000000000000000..9073130197e124527f8e38c238d8f13452a7780e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
new file mode 100644
index 0000000000000000000000000000000000000000..c6d6eb526e618ee93547e04eaba3c6a159a18075
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
+	abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f0a1f30ab95bf540027efa8c03aff8fe03a960b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
@@ -0,0 +1,147 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ctz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i;								\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_ctz (ints[i]) != my_ctz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
new file mode 100644
index 0000000000000000000000000000000000000000..5cce21cd16aa89d96cdac2b302d29ee918b67249
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
new file mode 100644
index 0000000000000000000000000000000000000000..83676da28884e79874fb0b5cc6a434a0fe6b87cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
@@ -0,0 +1,161 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
+    abort ();								
+
+#if BITSIZEOF_INT == 32
+  TEST(0x00000000UL,);
+  TEST(0x00000001UL,);
+  TEST(0x80000000UL,);
+  TEST(0x40000000UL,);
+  TEST(0x00010000UL,);
+  TEST(0x00008000UL,);
+  TEST(0xa5a5a5a5UL,);
+  TEST(0x5a5a5a5aUL,);
+  TEST(0xcafe0000UL,);
+  TEST(0x00cafe00UL,);
+  TEST(0x0000cafeUL,);
+  TEST(0xffffffffUL,);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc1ce4cf298ee0747f41ea4941af5a65f8a688ef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
@@ -0,0 +1,230 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}									\
+									\
+int my_ctz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i;								\
+}									\
+									\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}									\
+									\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}									\
+									\
+__attribute__((noinline)) \
+int my_popcount##suffix(type x) {					\
+    int i;								\
+    int count = 0;							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << i))					\
+	    count++;							\
+    return count;							\
+}									\
+									\
+__attribute__((noinline)) \
+int my_parity##suffix(type x) {						\
+    int i;								\
+    int count = 0;							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << i))					\
+	    count++;							\
+    return count & 1;							\
+}
+
+MAKE_FUNS (ll, unsigned long long);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(longlongs); i++)
+    {
+      if (__builtin_ffsll (longlongs[i]) != my_ffsll (longlongs[i]))
+	abort ();
+      if (longlongs[i] != 0
+	  && __builtin_clzll (longlongs[i]) != my_clzll (longlongs[i]))
+	abort ();
+      if (longlongs[i] != 0
+	  && __builtin_ctzll (longlongs[i]) != my_ctzll (longlongs[i]))
+	abort ();
+      if (__builtin_clrsbll (longlongs[i]) != my_clrsbll (longlongs[i]))
+	abort ();
+      if (__builtin_popcountll (longlongs[i]) != my_popcountll (longlongs[i]))
+	abort ();
+      if (__builtin_parityll (longlongs[i]) != my_parityll (longlongs[i]))
+	abort ();
+    }
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_ffs##suffix (x) != my_ffs##suffix (x))			\
+    abort ();								\
+
+#if BITSIZEOF_LONG_LONG == 64
+  TEST(0x0000000000000000ULL, ll);
+  TEST(0x0000000000000001ULL, ll);
+  TEST(0x8000000000000000ULL, ll);
+  TEST(0x0000000000000002ULL, ll);
+  TEST(0x4000000000000000ULL, ll);
+  TEST(0x0000000100000000ULL, ll);
+  TEST(0x0000000080000000ULL, ll);
+  TEST(0xa5a5a5a5a5a5a5a5ULL, ll);
+  TEST(0x5a5a5a5a5a5a5a5aULL, ll);
+  TEST(0xcafecafe00000000ULL, ll);
+  TEST(0x0000cafecafe0000ULL, ll);
+  TEST(0x00000000cafecafeULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
new file mode 100644
index 0000000000000000000000000000000000000000..adba337b101f4d7cafaa50329a933594b0d501ad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
@@ -0,0 +1,165 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}									\
+									\
+
+MAKE_FUNS (, unsigned);
+MAKE_FUNS (ll, unsigned long long);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_clrsb (ints[i]) != my_clrsb (ints[i]))
+	abort ();
+    }
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
+    abort ();								
+
+#if BITSIZEOF_LONG_LONG == 64
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+char vect_a[N];
+char vect_b[N];
+  
+char test4(char x, char * restrict res)
+{
+ char ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   res[i] *= vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e8b5bdea5ff9aa0cadbea0af10d51707da011c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_a[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..571aec0ccfdbcdc318ba1f17de31958c16b3e9bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.3-a -mcpu=neoverse-n1" } */
+
+#include <arm_neon.h>
+
+/* { dg-warning "switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.3-a’ switch and would result in options \\+fp16\\+dotprod\\+profile\\+nopauth" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cee42c84c4f762a4d4773ea4380163742b5137b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a+sve -mcpu=neoverse-n1" } */
+
+#include <arm_neon.h>
+
+/* { dg-warning "switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8-a+sve’ switch and would result in options \\+lse\\+rcpc\\+rdma\\+dotprod\\+profile\\+nosve" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..0a05b98eedb8bd743bb5af8e4dd3c95aab001c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a -mcpu=neoverse-n1 -Wpedantic -Werror" } */
+
+#include <arm_neon.h>
+
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f0b692a2e19bae3cf3ffee8f27bd39b05aba3b9c..1e47ae84080f9908736d1c3be9c14d589e8772a7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3975,6 +3975,18 @@ proc check_effective_target_vect_int { } {
 	}}]
 }
 
+# Return 1 if the target supports hardware vectorization of early breaks,
+# 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_early_break { } {
+    return [check_cached_effective_target_indexed vect_early_break {
+      expr {
+	[istarget aarch64*-*-*]
+	}}]
+}
+
 # Return 1 if the target supports hardware vectorization of complex additions of
 # byte, 0 otherwise.
 #




-- 

[-- Attachment #2: rb17962.patch --]
[-- Type: text/plain, Size: 107404 bytes --]

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c20af31c64237baff70f8781b1dc47f4d1a48aa9..4c351335f2bec9c6bb6856bd38d9132da7447c13 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
 @option{-funsafe-math-optimizations} is not in effect.
 This implies @code{vect_float}.
 
+@item vect_early_break
+Target supports hardware vectorization of loops with early breaks.
+This requires an implementation of the cbranch optab for vectors.
+
 @item vect_int
 Target supports hardware vectors of @code{int}.
 
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
new file mode 100644
index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
+template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
+template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
+public:
+  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
+};
+template <unsigned N, typename C>
+template <typename Ca>
+poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
+  for (int i = 0; i < N; i++)
+    this->coeffs[i] += a.coeffs[i];
+  return *this;
+}
+template <unsigned N, typename Ca, typename Cb>
+poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
+  poly_int<N, long> r;
+  return r;
+}
+struct vec_prefix {
+  unsigned m_num;
+};
+struct vl_ptr;
+struct va_heap {
+  typedef vl_ptr default_layout;
+};
+template <typename, typename A, typename = typename A::default_layout>
+struct vec;
+template <typename T, typename A> struct vec<T, A, int> {
+  T &operator[](unsigned);
+  vec_prefix m_vecpfx;
+  T m_vecdata[];
+};
+template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
+  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
+  return m_vecdata[ix];
+}
+template <typename T> struct vec<T, va_heap> {
+  T &operator[](unsigned ix) { return m_vec[ix]; }
+  vec<T, va_heap, int> m_vec;
+};
+class auto_vec : public vec<poly_int<2, long>, va_heap> {};
+template <typename> class vector_builder : public auto_vec {};
+class int_vector_builder : public vector_builder<int> {
+public:
+  int_vector_builder(poly_int<2, long>, int, int);
+};
+bool vect_grouped_store_supported() {
+  int i;
+  poly_int<2, long> nelt;
+  int_vector_builder sel(nelt, 2, 3);
+  for (i = 0; i < 6; i++)
+    sel[i] += exact_div(nelt, 2);
+}
+
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
new file mode 100644
index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
+template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
+template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
+public:
+  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
+};
+template <unsigned N, typename C>
+template <typename Ca>
+poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
+  for (int i = 0; i < N; i++)
+    this->coeffs[i] += a.coeffs[i];
+  return *this;
+}
+template <unsigned N, typename Ca, typename Cb>
+poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
+  poly_int<N, long> r;
+  return r;
+}
+struct vec_prefix {
+  unsigned m_num;
+};
+struct vl_ptr;
+struct va_heap {
+  typedef vl_ptr default_layout;
+};
+template <typename, typename A, typename = typename A::default_layout>
+struct vec;
+template <typename T, typename A> struct vec<T, A, int> {
+  T &operator[](unsigned);
+  vec_prefix m_vecpfx;
+  T m_vecdata[];
+};
+template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
+  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
+  return m_vecdata[ix];
+}
+template <typename T> struct vec<T, va_heap> {
+  T &operator[](unsigned ix) { return m_vec[ix]; }
+  vec<T, va_heap, int> m_vec;
+};
+class auto_vec : public vec<poly_int<2, long>, va_heap> {};
+template <typename> class vector_builder : public auto_vec {};
+class int_vector_builder : public vector_builder<int> {
+public:
+  int_vector_builder(poly_int<2, long>, int, int);
+};
+bool vect_grouped_store_supported() {
+  int i;
+  poly_int<2, long> nelt;
+  int_vector_builder sel(nelt, 2, 3);
+  for (i = 0; i < 6; i++)
+    sel[i] += exact_div(nelt, 2);
+}
+
diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
new file mode 100644
index 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12273fd8e5aa2018c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-w -O2" } */
+
+int aarch64_advsimd_valid_immediate_hs_val32;
+bool aarch64_advsimd_valid_immediate_hs() {
+  for (int shift = 0; shift < 32; shift += 8)
+    if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
+      return aarch64_advsimd_valid_immediate_hs_val32;
+  for (;;)
+    ;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 800
+#define P 799
+#include "vect-early-break-template_1.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 0
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 802
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 5
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define N 803
+#define P 278
+#include "vect-early-break-template_2.c"
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
@@ -0,0 +1,47 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  test4(x);
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
@@ -0,0 +1,50 @@
+#ifndef N
+#define N 803
+#endif
+
+#ifndef P
+#define P 0
+#endif
+
+unsigned vect_a[N] = {0};
+unsigned vect_b[N] = {0};
+  
+__attribute__((noipa, noinline))
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+
+  int x = 1;
+  int idx = P;
+  vect_a[idx] = x + 1;
+
+  unsigned res = test4(x);
+
+  if (res != idx)
+    abort ();
+
+  if (vect_b[idx] != (x + idx))
+    abort ();
+
+  if (vect_a[idx] != x + 1)
+    abort ();
+
+  if (idx > 0 && vect_a[idx-1] != x)
+    abort ();
+
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x,int y, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+ }
+
+ ret = x + y * z;
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
new file mode 100644
index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, int y)
+{
+ unsigned ret = 0;
+for (int o = 0; o < y; o++)
+{
+ ret += o;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   
+ }
+}
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
new file mode 100644
index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i] * x;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
new file mode 100644
index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
new file mode 100644
index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 803
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+int test4(unsigned x)
+{
+ int ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
new file mode 100644
index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
new file mode 100644
index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
new file mode 100644
index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned step)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=step)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
new file mode 100644
index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a > x)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
new file mode 100644
index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_b[N];
+struct testStruct {
+ long e;
+ long f;
+ bool a : 1;
+ bool b : 1;
+ int c : 14;
+ int d;
+};
+struct testStruct vect_a[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i].a)
+     return true;
+   vect_a[i].e = x;
+ }
+ return ret;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64];
+short b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
new file mode 100644
index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
@@ -0,0 +1,61 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#define N 200
+#define M 4
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 17 / 2)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *a, TYPE *b)		\
+  {						\
+    for (int i = 0; i < N; i += 2)		\
+      {						\
+	a[i + 0] = b[i + 0] + 2;		\
+	a[i + 1] = b[i + 1] + 3;		\
+      }						\
+  }
+
+#define DO_TEST(TYPE)					\
+  for (int j = 1; j < M; ++j)				\
+    {							\
+      TYPE a[N + M];					\
+      for (int i = 0; i < N + M; ++i)			\
+	a[i] = TEST_VALUE (i);				\
+      test_##TYPE (a + j, a);				\
+      for (int i = 0; i < N; i += 2)			\
+	if (a[i + j] != (TYPE) (a[i] + 2)		\
+	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
+	  __builtin_abort ();				\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
new file mode 100644
index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+void __attribute__((noinline,noclone))
+foo (double *b, double *d, double *f)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      d[2*i] = 2. * d[2*i];
+      d[2*i+1] = 4. * d[2*i+1];
+      b[i] = d[2*i] - 1.;
+      f[i] = d[2*i+1] + 2.;
+    }
+}
+int main()
+{
+  double b[1024], d[2*1024], f[1024];
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < 2*1024; i++)
+    d[i] = 1.;
+  foo (b, d, f);
+  for (i = 0; i < 1024; i+= 2)
+    {
+      if (d[2*i] != 2.)
+	abort ();
+      if (d[2*i+1] != 4.)
+	abort ();
+    }
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != 1.)
+	abort ();
+      if (f[i] != 6.)
+	abort ();
+    }
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
new file mode 100644
index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "vect-peel-1-src.c"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
new file mode 100644
index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
+{
+  int t1 = *c;
+  int t2 = *c;
+  for (int i = 0; i < 64; i+=2)
+    {
+      b[i] = a[i] - t1;
+      t1 = a[i];
+      b[i+1] = a[i+1] - t2;
+      t2 = a[i+1];
+    }
+}
+
+int a[64], b[64];
+
+int
+main ()
+{
+  check_vect ();
+  for (int i = 0; i < 64; ++i)
+    {
+      a[i] = i;
+      __asm__ volatile ("" ::: "memory");
+    }
+  int c = 7;
+  foo (a, b, &c);
+  for (int i = 2; i < 64; i+=2)
+    if (b[i] != a[i] - a[i-2]
+	|| b[i+1] != a[i+1] - a[i-1])
+      abort ();
+  if (b[0] != -7 || b[1] != -6)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
new file mode 100644
index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+      abort ();
+  if (a[0] != 5)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[128];
+int main ()
+{
+  int i;
+  for (i = 1; i < 128; i++)
+    if (a[i] != i%4 + 1)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
new file mode 100644
index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int in[100];
+int out[100 * 2];
+
+int main (void)
+{
+  if (out[0] != in[100 - 1])
+  for (int i = 1; i <= 100; ++i)
+    if (out[i] != 2)
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+unsigned test4(char x, char *vect, int n)
+{  
+ unsigned ret = 0;
+ for (int i = 0; i < n; i++)
+ {
+   if (vect[i] > x)
+     return 1;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
new file mode 100644
index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x[100];
+int choose1(int);
+int choose2();
+void consume(int);
+void f() {
+    for (int i = 0; i < 100; ++i) {
+        if (x[i] == 11) {
+            if (choose1(i))
+                goto A;
+            else
+                goto B;
+        }
+    }
+    if (choose2())
+        goto B;
+A:
+    for (int i = 0; i < 100; ++i)
+        consume(i);
+B:
+    for (int i = 0; i < 100; ++i)
+        consume(i * i);
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
new file mode 100644
index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1025
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret += vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
new file mode 100644
index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 1024
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+   ret = vect_a[i] + vect_b[i];
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
new file mode 100644
index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x, int z)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     {
+       for (int y = 0; y < z; y++)
+	 vect_a2 [y] *= vect_a1[i];
+       break;
+     }
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+
+unsigned vect_a[N] __attribute__ ((aligned (4)));
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ 
+ for (int i = 1; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
new file mode 100644
index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     break;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
new file mode 100644
index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a2[N];
+unsigned vect_a1[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a1[i]*2 > x)
+     break;
+   vect_a1[i] = x;
+   if (vect_a2[i]*4 > x)
+     return i;
+   vect_a2[i] = x*x;
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
new file mode 100644
index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 4
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 != x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
new file mode 100644
index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
new file mode 100644
index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x, unsigned n)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+= (N % 4))
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (i > 16 && vect[i] > x)
+     break;
+
+   vect[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
new file mode 100644
index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i*=3)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* SCEV can't currently analyze this loop's bounds.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
new file mode 100644
index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+#pragma GCC novector
+#pragma GCC unroll 4
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += vect_a[i] + x;
+ }
+ return ret;
+}
+
+/* novector should have blocked vectorization.  */
+/* { dg-final { scan-tree-dump-not "vectorized \\d loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
new file mode 100644
index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 802
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i + 1;
+   if (vect_a[i]*2 > x)
+     break;
+   if (vect_a[i+1]*2 > x)
+     break;
+   vect_a[i] = x;
+   vect_a[i+1] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
new file mode 100644
index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i]*2 > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
new file mode 100644
index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
+
+/* At -O2 we can't currently vectorize this because of the libcalls not being
+   lowered.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
new file mode 100644
index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+void abort ();
+
+float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
+float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
+float a[16] = {0};
+float e[16] = {0};
+float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+int main1 ()
+{
+  int i;
+  for (i=0; i<16; i++)
+    {
+      if (a[i] != results1[i] || e[i] != results2[i])
+        abort();
+    }
+
+  if (a[i+3] != b[i-1])
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
new file mode 100644
index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int main (void)
+{
+  signed char a[50], b[50], c[50];
+  for (int i = 0; i < 50; ++i)
+    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
+      __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+struct foostr {
+  _Complex short f1;
+  _Complex short f2;
+};
+struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
+struct foostr c[16] __attribute__ ((__aligned__(16)));
+struct foostr res[16] = {};
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    {
+      if (c[i].f1 != res[i].f1)
+ abort ();
+      if (c[i].f2 != res[i].f2)
+ abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+ 
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     return vect_a[i];
+   vect_a[i] = x;
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
new file mode 100644
index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
new file mode 100644
index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int x_in[32];
+int x_out_a[32], x_out_b[32];
+int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
+int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
+int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
+
+void foo ()
+{
+  int j, i, x;
+  int curr_a, flag, next_a, curr_b, next_b;
+    {
+      for (i = 0; i < 16; i++)
+        {
+          next_b = b[i+1];
+          curr_b = flag ? next_b : curr_b;
+        }
+      x_out_b[j] = curr_b;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
new file mode 100644
index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort();
+int main1 (short X)
+{
+  unsigned char a[128];
+  unsigned short b[128];
+  unsigned int c[128];
+  short myX = X;
+  int i;
+  for (i = 0; i < 128; i++)
+    {
+      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
+        abort ();
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
new file mode 100644
index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+void abort ();
+int a[64], b[64];
+int main ()
+{
+  int c = 7;
+  for (int i = 1; i < 64; ++i)
+    if (b[i] != a[i] - a[i-1])
+      abort ();
+  if (b[0] != -7)
+    abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
new file mode 100644
index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ unsigned tmp[N];
+ for (int i = 0; i < N; i++)
+ {
+   tmp[i] = x + i;
+   vect_b[i] = tmp[i];
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
new file mode 100644
index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile unsigned tmp = x + i;
+   vect_b[i] = tmp;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
new file mode 100644
index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
@@ -0,0 +1,101 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-add-options bind_pic_locally } */
+/* { dg-require-effective-target vect_early_break } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+unsigned short sa[N];
+unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[N];
+unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
+   which will also align the access to 'ia[i+3]', and the loop could be
+   vectorized on all targets that support unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main1 ()
+{
+  int i;
+  int n = N - 7;
+
+  /* Multiple types with different sizes, used in independent
+     computations. Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      sa[i+7] = sb[i] + sc[i];
+      ia[i+3] = ib[i] + ic[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
+   access for peeling, and therefore will examine the option of
+   using a peeling factor = VF-3%VF. This will result in a peeling factor
+   1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we
+   need to peel 5,1 iterations for VF=4,2 respectively, so the loop cannot
+   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
+   iterations, so the loop is vectorizable on all targets that support
+   unaligned loads.
+   Without cost model on targets that support misaligned stores, no peeling
+   will be applied since we want to keep the four loads aligned.  */
+
+__attribute__ ((noinline))
+int main2 ()
+{
+  int i;
+  int n = N-3;
+
+  /* Multiple types with different sizes, used in independent
+     computations. Vectorizable.  */
+  for (i = 0; i < n; i++)
+    {
+      ia[i+3] = ib[i] + ic[i];
+      sa[i+3] = sb[i] + sc[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 ();
+  main2 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
new file mode 100644
index 0000000000000000000000000000000000000000..be4a0c7426093059ce37a9f824defb7ae270094d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
new file mode 100644
index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != ((i % 3) == 0))
+      abort ();
+}
+
+/* Pattern didn't match inside gcond.  */
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
new file mode 100644
index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#define N 1024
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < (N/2); i+=2)
+ {
+   vect_b[i] = x + i;
+   vect_b[i+1] = x + i+1;
+   if (vect_a[i] > x || vect_a[i+1] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   vect_a[i+1] += x * vect_b[i+1]; 
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
new file mode 100644
index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+extern void abort();
+float a[1024], b[1024], c[1024], d[1024];
+_Bool k[1024];
+
+int main ()
+{
+  char i;
+  for (i = 0; i < 1024; i++)
+    if (k[i] != (i == 0))
+      abort ();
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
new file mode 100644
index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_float } */
+
+typedef float real_t;
+__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
+real_t s482()
+{
+    for (int nl = 0; nl < 10000; nl++) {
+        for (int i = 0; i < 32000; i++) {
+            a[i] += b[i] * c[i];
+            if (c[i] > b[i]) break;
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a, b;
+int e() {
+  int d, c;
+  d = 0;
+  for (; d < b; d++)
+    a = 0;
+  d = 0;
+  for (; d < b; d++)
+    if (d)
+      c++;
+  for (;;)
+    if (c)
+      break;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
new file mode 100644
index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
@@ -0,0 +1,28 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-do compile } */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_long } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-additional-options "-fno-tree-scev-cprop" } */
+
+/* Statement used outside the loop.
+   NOTE: SCEV disabled to ensure the live operation is not removed before
+   vectorization.  */
+__attribute__ ((noinline)) int
+liveloop (int start, int n, int *x, int *y)
+{
+  int i = start;
+  int j;
+  int ret;
+
+  for (j = 0; j < n; ++j)
+    {
+      i += 1;
+      x[j] = i;
+      ret = y[j];
+    }
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
new file mode 100644
index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-vect-all" } */
+
+int d(unsigned);
+
+void a() {
+  char b[8];
+  unsigned c = 0;
+  while (c < 7 && b[c])
+    ++c;
+  if (d(c))
+    return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
new file mode 100644
index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+enum a { b };
+
+struct {
+  enum a c;
+} d[10], *e;
+
+void f() {
+  int g;
+  for (g = 0, e = d; g < sizeof(1); g++, e++)
+    if (e->c)
+      return;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
new file mode 100644
index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+int a[0];
+int b;
+
+void g();
+
+void f() {
+  int d, e;
+  for (; e; e++) {
+    int c;
+    switch (b)
+    case '9': {
+      for (; d < 1; d++)
+        if (a[d])
+          c = 1;
+      break;
+    case '<':
+      g();
+      c = 0;
+    }
+      while (c)
+        ;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
new file mode 100644
index 0000000000000000000000000000000000000000..80f23d1e2431133035895946a5d6b24bef3ca294
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+
+
+int main()
+{
+  int var6 = -1267827473;
+  do {
+      ++var6;
+      double s1_115[4], s2_108[4];
+      int var8 = -161498264;
+      do {
+	  ++var8;
+	  int var12 = 1260960076;
+	  for (; var12 <= 1260960080; ++var12) {
+	      int var13 = 1960990937;
+	      do {
+		  ++var13;
+		  int var14 = 2128638723;
+		  for (; var14 <= 2128638728; ++var14) {
+		      int var22 = -1141190839;
+		      do {
+			  ++var22;
+			  if (s2_108 > s1_115) {
+			      int var23 = -890798748;
+			      do {
+				  long long e_119[4];
+			      } while (var23 <= -890798746);
+			  }
+		      } while (var22 <= -1141190829);
+		  }
+	      } while (var13 <= 1960990946);
+	  }
+      } while (var8 <= -161498254);
+  } while (var6 <= -1267827462);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
new file mode 100644
index 0000000000000000000000000000000000000000..c9a8298a8b51e05079041ae7a05086a47b1be5dd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 800
+#endif
+unsigned vect_a1[N];
+unsigned vect_b1[N];
+unsigned vect_c1[N];
+unsigned vect_d1[N];
+
+unsigned vect_a2[N];
+unsigned vect_b2[N];
+unsigned vect_c2[N];
+unsigned vect_d2[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b1[i] += x + i;
+   vect_c1[i] += x + i;
+   vect_d1[i] += x + i;
+   if (vect_a1[i]*2 != x)
+     break;
+   vect_a1[i] = x;
+
+   vect_b2[i] += x + i;
+   vect_c2[i] += x + i;
+   vect_d2[i] += x + i;
+   if (vect_a2[i]*2 != x)
+     break;
+   vect_a2[i] = x;
+
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
new file mode 100644
index 0000000000000000000000000000000000000000..f99de8e1f0650a3b590ed8bd9052e18173fc97d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}									\
+									\
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
+	abort ();
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] == x)
+     break;
+   vect_a[i] += x * vect_b[i];
+
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
new file mode 100644
index 0000000000000000000000000000000000000000..9073130197e124527f8e38c238d8f13452a7780e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
new file mode 100644
index 0000000000000000000000000000000000000000..c6d6eb526e618ee93547e04eaba3c6a159a18075
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
+	abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
new file mode 100644
index 0000000000000000000000000000000000000000..0f0a1f30ab95bf540027efa8c03aff8fe03a960b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
@@ -0,0 +1,147 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ctz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i;								\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_ctz (ints[i]) != my_ctz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
new file mode 100644
index 0000000000000000000000000000000000000000..5cce21cd16aa89d96cdac2b302d29ee918b67249
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#  define BITSIZEOF_INT 32
+#  define BITSIZEOF_LONG 64
+#  define BITSIZEOF_LONG_LONG 64
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}
+
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS32					\
+  {                                             \
+    0x00000000UL,                               \
+    0x00000001UL,                               \
+    0x80000000UL,                               \
+    0x00000002UL,                               \
+    0x40000000UL,                               \
+    0x00010000UL,                               \
+    0x00008000UL,                               \
+    0xa5a5a5a5UL,                               \
+    0x5a5a5a5aUL,                               \
+    0xcafe0000UL,                               \
+    0x00cafe00UL,                               \
+    0x0000cafeUL,                               \
+    0xffffffffUL                                \
+  }
+
+
+unsigned int ints[] = NUMS32;
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (ints[i] != 0
+	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
+	  abort ();
+    }
+
+  exit (0);
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
new file mode 100644
index 0000000000000000000000000000000000000000..83676da28884e79874fb0b5cc6a434a0fe6b87cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
@@ -0,0 +1,161 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}
+
+MAKE_FUNS (, unsigned);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
+    abort ();
+
+#if BITSIZEOF_INT == 32
+  TEST(0x00000000UL,);
+  TEST(0x00000001UL,);
+  TEST(0x80000000UL,);
+  TEST(0x40000000UL,);
+  TEST(0x00010000UL,);
+  TEST(0x00008000UL,);
+  TEST(0xa5a5a5a5UL,);
+  TEST(0x5a5a5a5aUL,);
+  TEST(0xcafe0000UL,);
+  TEST(0x00cafe00UL,);
+  TEST(0x0000cafeUL,);
+  TEST(0xffffffffUL,);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
new file mode 100644
index 0000000000000000000000000000000000000000..cc1ce4cf298ee0747f41ea4941af5a65f8a688ef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
@@ -0,0 +1,230 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+__attribute__((noinline)) \
+int my_ffs##suffix(type x) {						\
+    int i;								\
+    if (x == 0)								\
+	 return 0; 							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i + 1;							\
+}									\
+									\
+int my_ctz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1  << i))					\
+	    break;							\
+    return i;								\
+}									\
+									\
+__attribute__((noinline)) \
+int my_clz##suffix(type x) {						\
+    int i;								\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
+	    break;							\
+    return i;								\
+}									\
+									\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}									\
+									\
+__attribute__((noinline)) \
+int my_popcount##suffix(type x) {					\
+    int i;								\
+    int count = 0;							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << i))					\
+	    count++;							\
+    return count;							\
+}									\
+									\
+__attribute__((noinline)) \
+int my_parity##suffix(type x) {						\
+    int i;								\
+    int count = 0;							\
+    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
+	if (x & ((type) 1 << i))					\
+	    count++;							\
+    return count & 1;							\
+}
+
+MAKE_FUNS (ll, unsigned long long);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(longlongs); i++)
+    {
+      if (__builtin_ffsll (longlongs[i]) != my_ffsll (longlongs[i]))
+	abort ();
+      if (longlongs[i] != 0
+	  && __builtin_clzll (longlongs[i]) != my_clzll (longlongs[i]))
+	abort ();
+      if (longlongs[i] != 0
+	  && __builtin_ctzll (longlongs[i]) != my_ctzll (longlongs[i]))
+	abort ();
+      if (__builtin_clrsbll (longlongs[i]) != my_clrsbll (longlongs[i]))
+	abort ();
+      if (__builtin_popcountll (longlongs[i]) != my_popcountll (longlongs[i]))
+	abort ();
+      if (__builtin_parityll (longlongs[i]) != my_parityll (longlongs[i]))
+	abort ();
+    }
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_ffs##suffix (x) != my_ffs##suffix (x))			\
+    abort ();								\
+
+#if BITSIZEOF_LONG_LONG == 64
+  TEST(0x0000000000000000ULL, ll);
+  TEST(0x0000000000000001ULL, ll);
+  TEST(0x8000000000000000ULL, ll);
+  TEST(0x0000000000000002ULL, ll);
+  TEST(0x4000000000000000ULL, ll);
+  TEST(0x0000000100000000ULL, ll);
+  TEST(0x0000000080000000ULL, ll);
+  TEST(0xa5a5a5a5a5a5a5a5ULL, ll);
+  TEST(0x5a5a5a5a5a5a5a5aULL, ll);
+  TEST(0xcafecafe00000000ULL, ll);
+  TEST(0x0000cafecafe0000ULL, ll);
+  TEST(0x00000000cafecafeULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
new file mode 100644
index 0000000000000000000000000000000000000000..adba337b101f4d7cafaa50329a933594b0d501ad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
@@ -0,0 +1,165 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <limits.h>
+#include <assert.h>
+
+#if __INT_MAX__ > 2147483647L
+# if __INT_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_INT 64
+# else
+#  define BITSIZEOF_INT 32
+# endif
+#else
+# if __INT_MAX__ >= 2147483647L
+#  define BITSIZEOF_INT 32
+# else
+#  define BITSIZEOF_INT 16
+# endif
+#endif
+
+#if __LONG_MAX__ > 2147483647L
+# if __LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG 64
+# else
+#  define BITSIZEOF_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG 32
+#endif
+
+#if __LONG_LONG_MAX__ > 2147483647L
+# if __LONG_LONG_MAX__ >= 9223372036854775807L
+#  define BITSIZEOF_LONG_LONG 64
+# else
+#  define BITSIZEOF_LONG_LONG 32
+# endif
+#else
+# define BITSIZEOF_LONG_LONG 32
+#endif
+
+#define MAKE_FUNS(suffix, type)						\
+int my_clrsb##suffix(type x) {						\
+    int i;								\
+    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
+    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
+	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
+	    != leading)							\
+	    break;							\
+    return i - 1;							\
+}									\
+									\
+
+MAKE_FUNS (, unsigned);
+MAKE_FUNS (ll, unsigned long long);
+
+extern void abort (void);
+extern void exit (int);
+
+#define NUMS16					\
+  {						\
+    0x0000U,					\
+    0x0001U,					\
+    0x8000U,					\
+    0x0002U,					\
+    0x4000U,					\
+    0x0100U,					\
+    0x0080U,					\
+    0xa5a5U,					\
+    0x5a5aU,					\
+    0xcafeU,					\
+    0xffffU					\
+  }
+
+#define NUMS32					\
+  {						\
+    0x00000000UL,				\
+    0x00000001UL,				\
+    0x80000000UL,				\
+    0x00000002UL,				\
+    0x40000000UL,				\
+    0x00010000UL,				\
+    0x00008000UL,				\
+    0xa5a5a5a5UL,				\
+    0x5a5a5a5aUL,				\
+    0xcafe0000UL,				\
+    0x00cafe00UL,				\
+    0x0000cafeUL,				\
+    0xffffffffUL				\
+  }
+
+#define NUMS64					\
+  {						\
+    0x0000000000000000ULL,			\
+    0x0000000000000001ULL,			\
+    0x8000000000000000ULL,			\
+    0x0000000000000002ULL,			\
+    0x4000000000000000ULL,			\
+    0x0000000100000000ULL,			\
+    0x0000000080000000ULL,			\
+    0xa5a5a5a5a5a5a5a5ULL,			\
+    0x5a5a5a5a5a5a5a5aULL,			\
+    0xcafecafe00000000ULL,			\
+    0x0000cafecafe0000ULL,			\
+    0x00000000cafecafeULL,			\
+    0xffffffffffffffffULL			\
+  }
+
+unsigned int ints[] =
+#if BITSIZEOF_INT == 64
+NUMS64;
+#elif BITSIZEOF_INT == 32
+NUMS32;
+#else
+NUMS16;
+#endif
+
+unsigned long longs[] =
+#if BITSIZEOF_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+unsigned long long longlongs[] =
+#if BITSIZEOF_LONG_LONG == 64
+NUMS64;
+#else
+NUMS32;
+#endif
+
+#define N(table) (sizeof (table) / sizeof (table[0]))
+
+int
+main (void)
+{
+  int i;
+
+#pragma GCC novector
+  for (i = 0; i < N(ints); i++)
+    {
+      if (__builtin_clrsb (ints[i]) != my_clrsb (ints[i]))
+	abort ();
+    }
+
+  /* Test constant folding.  */
+
+#define TEST(x, suffix)							\
+  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
+    abort ();
+
+#if BITSIZEOF_LONG_LONG == 64
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+  TEST(0xffffffffffffffffULL, ll);
+#endif
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
new file mode 100644
index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+char vect_a[N];
+char vect_b[N];
+  
+char test4(char x, char * restrict res)
+{
+ char ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_b[i] += x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] += x * vect_b[i];
+   res[i] *= vect_b[i];
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e8b5bdea5ff9aa0cadbea0af10d51707da011c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 803
+#endif
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   vect_a[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..571aec0ccfdbcdc318ba1f17de31958c16b3e9bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.3-a -mcpu=neoverse-n1" } */
+
+#include <arm_neon.h>
+
+/* { dg-warning "switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.3-a’ switch and would result in options \\+fp16\\+dotprod\\+profile\\+nopauth" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cee42c84c4f762a4d4773ea4380163742b5137b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a+sve -mcpu=neoverse-n1" } */
+
+#include <arm_neon.h>
+
+/* { dg-warning "switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8-a+sve’ switch and would result in options \\+lse\\+rcpc\\+rdma\\+dotprod\\+profile\\+nosve" } */
diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..0a05b98eedb8bd743bb5af8e4dd3c95aab001c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8-a -mcpu=neovese-n1 -Wpedentic -Werror" } */
+
+#include <arm_neon.h>
+
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f0b692a2e19bae3cf3ffee8f27bd39b05aba3b9c..1e47ae84080f9908736d1c3be9c14d589e8772a7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3975,6 +3975,17 @@ proc check_effective_target_vect_int { } {
 	}}]
 }
 
+# Return 1 if the target supports hardware vectorization of early breaks,
+# 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_vect_early_break { } {
+    return [check_cached_effective_target_indexed vect_early_break {
+      expr {
+	[istarget aarch64*-*-*]
+	}}]
+}
 # Return 1 if the target supports hardware vectorization of complex additions of
 # byte, 0 otherwise.
 #




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (22 preceding siblings ...)
  2023-11-06  7:37 ` [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization Tamar Christina
@ 2023-11-06  7:37 ` Tamar Christina
  2023-11-07 10:53   ` Richard Biener
  2023-11-06  7:38 ` [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form " Tamar Christina
                   ` (18 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 19505 bytes --]

Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
    * values that are live must be reachable through all exits
    * values that aren't single use and are shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In cases where we have
    multiple early exit statements, this should be the one closest to the loop
    latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two, during data ref analysis we determine
validity of the operation and generate a worklist of actions to perform if we
vectorize.

After peeling and just before statement transformation we replay this worklist
which moves the statements and updates bookkeeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.

At the moment we don't support this for non-masked epilogues, since the
additional vectorized epilogue's stmt UIDs are not found.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
	(vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved; they are reachable from
+	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
+	      takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   gimple_stmt_iterator *gstmt)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  /* ?? Do we need to move debug statements?  Not quite sure.  */
+  if (gimple_has_ops (stmt)
+      && !is_gimple_debug (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	dest = gimple_get_lhs (stmt);
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks not supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currently only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the size of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks not supported:"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any write starts a new chain.  */
+		move = true;
+	     }
+	}
+
+      /* If a statement is live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     usages of it.   */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      /* This statement is to be moved.  */
+	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, gstmt))
+    return false;
+
+  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+    {
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "marked statement for vUSE update: %G", stmt);
+    }
+
+  return true;
+}
+
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that if we have
+   multiple exits we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  hash_set<tree> chain, fixed;
+  auto_vec<tree> loads;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+  tree vuse = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+      /* Initialize the vuse chain with the one at the early break.  */
+      if (!vuse)
+	vuse = gimple_vuse (c);
+
+      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+				     &bases, &vuse, &gsi))
+	return opt_result::failure_at (stmt,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", stmt);
+
+      /* Save destination as we go; BBs are visited in order and the last one
+	 is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+    }
+
+  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  /* TODO: Remove?  It's a useful debug statement but may be too much.  */
+  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updated use: %T, mem_ref: %G",
+			 vuse, g);
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks have already been performed.
+
+    While moving the instructions, if it encounters a VUSE or VDEF it then
+    corrects the VUSES as it moves the statements along.  GDEST is the location
+    in which to insert the new statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for stmt %G", vuse, p);
+      unlink_stmt_vdef (p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+       case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_early_exit_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they need
+     to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies




-- 

[-- Attachment #2: rb17963.patch --]
[-- Type: text/plain, Size: 17052 bytes --]

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.
+
+   Arguments:
+     - LOOP_VINFO: loop information for the current loop.
+     - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved; they are reachable from
+	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
+	      takes precedence.  This deals with non-single use cases.
+     - LOADS: List of all loads found during traversal.
+     - BASES: List of all load data references found during traversal.
+     - GSTMT: Current position to inspect for validity.  The sequence
+	      will be moved upwards from this point.
+     - REACHING_VUSE: The dominating VUSE found so far.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+			   hash_set<tree> *fixed, vec<tree> *loads,
+			   vec<data_reference *> *bases, tree *reaching_vuse,
+			   gimple_stmt_iterator *gstmt)
+{
+  if (gsi_end_p (*gstmt))
+    return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  /* ?? Do we need to move debug statements?  Not quite sure.  */
+  if (gimple_has_ops (stmt)
+      && !is_gimple_debug (stmt))
+    {
+      tree dest = NULL_TREE;
+      /* Try to find the SSA_NAME being defined.  For statements with an LHS
+	 use the LHS, if not, assume that the first argument of a call is the
+	 value being defined.  e.g. MASKED_LOAD etc.  */
+      if (gimple_has_lhs (stmt))
+	dest = gimple_get_lhs (stmt);
+      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	dest = gimple_arg (call, 0);
+      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
+	{
+	  /* Operands of conds are ones we can't move.  */
+	  fixed->add (gimple_cond_lhs (cond));
+	  fixed->add (gimple_cond_rhs (cond));
+	}
+
+      bool move = false;
+
+      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_vinfo)
+	{
+	   if (dump_enabled_p ())
+	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			      "early breaks not supported. Unknown"
+			      " statement: %G", stmt);
+	   return false;
+	}
+
+      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+      if (dr_ref)
+	{
+	   /* We currently only support statically allocated objects due to
+	      not having first-faulting loads support or peeling for alignment
+	      support.  Compute the size of the referenced object (it could be
+	      dynamically allocated).  */
+	   tree obj = DR_BASE_ADDRESS (dr_ref);
+	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   tree refop = TREE_OPERAND (obj, 0);
+	   tree refbase = get_base_address (refop);
+	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	     {
+	       if (dump_enabled_p ())
+		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				  "early breaks only supported on statically"
+				  " allocated objects.\n");
+	       return false;
+	     }
+
+	   if (DR_IS_READ (dr_ref))
+	     {
+		loads->safe_push (dest);
+		bases->safe_push (dr_ref);
+	     }
+	   else if (DR_IS_WRITE (dr_ref))
+	     {
+		for (auto dr : bases)
+		  if (same_data_refs_base_objects (dr, dr_ref))
+		    {
+		      if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks not supported:"
+					   " overlapping loads and stores found"
+					   " before the break statement.\n");
+		      return false;
+		    }
+		/* Any write starts a new chain.  */
+		move = true;
+	     }
+	}
+
+      /* If a statement is live and escapes the loop through usage in the loop
+	 epilogue then we can't move it since we need to maintain its
+	 reachability through all exits.  */
+      bool skip = false;
+      if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	{
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+	    {
+	      basic_block bb = gimple_bb (USE_STMT (use_p));
+	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+	      if (skip)
+		break;
+	    }
+	}
+
+      /* If we found the defining statement of something that's part of the
+	 chain then expand the chain with the new SSA_VARs being used.  */
+      if (!skip && (chain->contains (dest) || move))
+	{
+	  move = true;
+	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
+	    {
+	      tree var = gimple_arg (stmt, x);
+	      if (TREE_CODE (var) == SSA_NAME)
+		{
+		  if (fixed->contains (dest))
+		    {
+		      move = false;
+		      fixed->add (var);
+		    }
+		  else
+		    chain->add (var);
+		}
+	      else
+		{
+		  use_operand_p use_p;
+		  ssa_op_iter iter;
+		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		    {
+		      tree op = USE_FROM_PTR (use_p);
+		      gcc_assert (TREE_CODE (op) == SSA_NAME);
+		      if (fixed->contains (dest))
+			{
+			  move = false;
+			  fixed->add (op);
+			}
+		      else
+			chain->add (op);
+		    }
+		}
+	    }
+
+	  if (dump_enabled_p ())
+	    {
+	      if (move)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"found chain %G", stmt);
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"ignored chain %G, not single use", stmt);
+	    }
+	}
+
+      if (move)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "==> recording stmt %G", stmt);
+
+	  for (tree ref : loads)
+	    if (stmt_may_clobber_ref_p (stmt, ref, true))
+	      {
+	        if (dump_enabled_p ())
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "early breaks not supported as memory used"
+				   " may alias.\n");
+	        return false;
+	      }
+
+	  /* If we've moved a VDEF, extract the defining MEM and update
+	     uses of it.  */
+	  tree vdef;
+	  if ((vdef = gimple_vdef (stmt)))
+	    {
+	      /* This statement is to be moved.  */
+	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
+	      *reaching_vuse = gimple_vuse (stmt);
+	    }
+	}
+    }
+
+  gsi_prev (gstmt);
+
+  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
+				  reaching_vuse, gstmt))
+    return false;
+
+  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+    {
+      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "marked statement for vUSE update: %G", stmt);
+    }
+
+  return true;
+}
+
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that if we have
+   multiple exits we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  hash_set<tree> chain, fixed;
+  auto_vec<tree> loads;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+  tree vuse = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+
+      /* Initialize the vuse chain with the one at the early break.  */
+      if (!vuse)
+	vuse = gimple_vuse (c);
+
+      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
+				     &bases, &vuse, &gsi))
+	return opt_result::failure_at (stmt,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", stmt);
+
+      /* Save the destination as we go; BBs are visited in order and the
+	 last one is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+    }
+
+  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  /* TODO: Remove?  It's a useful debug statement but may be too much.  */
+  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updated use: %T, mem_ref: %G",
+			 vuse, g);
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/* When vectorizing early break statements, instructions that happen before
+   the early break in the current BB need to be moved to after the early
+   break.  This function deals with that and assumes that any validity
+   checks have already been performed.
+
+   While moving the instructions, if it encounters a VUSE or VDEF it
+   corrects the VUSEs as it moves the statements along.  The insertion
+   point is the block recorded in LOOP_VINFO_EARLY_BRK_DEST_BB.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+      update_stmt (stmt);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for stmt %G", vuse, p);
+      unlink_stmt_vdef (p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_early_exit_def:
+	  dump_printf (MSG_NOTE, "early exit\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_early_exit_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break-finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in the order in
+     which they need to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple *> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
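To make concrete what the dependence analysis above guards against, here is a hypothetical C loop (not taken from the patch or its testsuite) in which a store precedes the early break and therefore has to be sunk past the break block before the loop can be vectorized:

```c
#include <stddef.h>

/* Hypothetical example: the store to DST happens before the exit
   condition, so if a whole vector iteration performed it
   unconditionally the scalar semantics of a partially completed final
   iteration could be violated.  The analysis records such stores so
   they can later be moved past the break.  */
int copy_until_zero (int *dst, const int *src, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = src[i];	/* store before the exit condition */
      if (src[i] == 0)
	return i;	/* early break exit */
    }
  return n;		/* natural loop exit */
}
```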




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (23 preceding siblings ...)
  2023-11-06  7:37 ` [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Tamar Christina
@ 2023-11-06  7:38 ` Tamar Christina
  2023-11-15  0:00   ` Tamar Christina
  2023-11-06  7:38 ` [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch Tamar Christina
                   ` (17 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 16406 bytes --]

Hi All,

This splits the part of the function that does peeling for loops at exits
into a separate function.  In this new function we also peel for early breaks.

Peeling for early breaks works by redirecting all early break exits to a
single "early break" block, then combining them with the normal exit edge
later in a different block which then goes into the epilogue preheader.

This allows us to re-use all the existing code for IV updates.  Additionally,
this also enables correct linking for multiple vector epilogues.
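For reference, the overall loop shape this enables (a hypothetical example in the spirit of the cover letter, not code from the patch) looks like:

```c
#include <stddef.h>

/* Hypothetical example of a fixed-bound early-break loop: the early
   break exit (the return inside the loop) and the natural exit are the
   two edges that peeling merges before the epilogue preheader.  */
int first_match (const int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (a[i] == b[i])
      return (int) i;	/* early break exit */
  return -1;		/* natural loop exit */
}
```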

flush_pending_stmts cannot be used in this scenario since it updates the PHI
nodes in the order that they are in the exit destination blocks.  This means
they are in CFG visit order.  With a single exit this doesn't matter, but with
multiple exits carrying different live values the order usually does not line
up.

Additionally the vectorizer helper functions expect to be able to iterate over
the nodes in the order that they occur in the loop header blocks.  This is an
invariant we must maintain.  To do this we inline the work of
flush_pending_stmts but maintain the order by using the header blocks to guide
the work.
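The ordering problem can be seen in a hypothetical loop (not part of the patch) where the two exits carry different live values:

```c
/* Hypothetical example: SUM is live through both exits, while I is
   meaningful only on the early exit.  The loop-closed PHI nodes for the
   two exits therefore take different arguments, which is why they
   cannot simply be flushed in CFG visit order.  */
int sum_until_negative (const int *a, int n, int *pos)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    {
      if (a[i] < 0)
	{
	  *pos = i;	/* live only via the early exit */
	  return sum;	/* partial sum */
	}
      sum += a[i];
    }
  *pos = n;
  return sum;		/* full sum via the natural exit */
}
```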

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
	(slpeel_tree_duplicate_loop_for_vectorization): New.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Use it.
	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
	(vect_is_loop_exit_latch_pred): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 43ca985c53ce58aa83fb9689a9ea9b20b207e0a8..6fbb5b80986fd657814b48eb009b52b094f331e6 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1444,6 +1444,151 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
+/* Determine if the exit chosen by the loop vectorizer differs from the
+   natural loop exit, i.e. if the exit leads to the loop latch or not.
+   When this happens we need to flip the understanding of main and other
+   exits by peeling and IV updates.  */
+
+bool
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
+/* Perform peeling for when the peeled loop is placed after the original loop.
+   This maintains LCSSA and creates the appropriate blocks for multiple exit
+   vectorization.   */
+
+static void
+slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
+					      vec<edge> &loop_exits, edge e,
+					      class loop *new_loop,
+					      bool flow_loops,
+					      basic_block new_preheader)
+{
+  bool multiple_exits_p = loop_exits.length () > 1;
+  basic_block main_loop_exit_block = new_preheader;
+  if (multiple_exits_p)
+    {
+      edge loop_entry = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_entry);
+    }
+
+  /* First create the empty PHI nodes so that when we flush the
+     statements they can be filled in.  However because there is no order
+     between the PHI nodes in the exits and the loop headers we need to
+     order them based on the order of the two headers.  First record the
+     new PHI nodes, then redirect the edges and flush the changes.  This
+     writes out the new SSA names.  */
+  for (auto exit : loop_exits)
+    {
+      basic_block dest
+	= exit == loop_exit ? main_loop_exit_block : new_preheader;
+      redirect_edge_and_branch (exit, dest);
+    }
+
+  /* Copy the current loop LC PHI nodes between the original loop exit
+     block and the new loop header.  This allows us to later split the
+     preheader block and still find the right LC nodes.  */
+  edge loop_entry = single_succ_edge (new_preheader);
+  hash_set<tree> lcssa_vars;
+  if (flow_loops)
+    for (auto gsi_from = gsi_start_phis (loop->header),
+	 gsi_to = gsi_start_phis (new_loop->header);
+	 !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	 gsi_next (&gsi_from), gsi_next (&gsi_to))
+      {
+	gimple *from_phi = gsi_stmt (gsi_from);
+	gimple *to_phi = gsi_stmt (gsi_to);
+	tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
+
+	/* In all cases, even in early break situations we're only
+	   interested in the number of fully executed loop iters.  As such
+	   we discard any partially done iteration.  So we simply propagate
+	   the phi nodes from the latch to the merge block.  */
+	tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	gphi *lcssa_phi = create_phi_node (new_res, main_loop_exit_block);
+
+	/* Check whether we have picked a different loop exit.  If we have,
+	   we need to flip the LCSSA vars to prevent incorrect linking.  */
+	tree alt_arg = gimple_phi_result (from_phi);
+	if (!vect_is_loop_exit_latch_pred (loop_exit, loop))
+	  std::swap (new_arg, alt_arg);
+
+	lcssa_vars.add (new_arg);
+
+	/* Main loop exit should use the final iter value.  */
+	add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+
+	/* All other exits use the previous iters.  */
+	if (multiple_exits_p)
+	  {
+	    tree alt_res = copy_ssa_name (alt_arg);
+	    gphi *alt_lcssa_phi = create_phi_node (alt_res, new_preheader);
+	    edge main_e = single_succ_edge (main_loop_exit_block);
+	    for (edge e : loop_exits)
+	      if (e != loop_exit)
+		{
+		  add_phi_arg (alt_lcssa_phi, alt_arg, e, UNKNOWN_LOCATION);
+		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_res);
+		}
+	    new_res = alt_res; /* Push it down to the new_loop header.  */
+	  }
+
+	adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+    }
+
+  /* Copy over any live SSA vars that may not have been materialized in the
+     loops themselves but would be in the exit block.  However, when the live
+     value is not used inside the loop we don't need to do this; if we did,
+     then when we split the guard block the branch edge could end up
+     containing the wrong reference, particularly if it shares an edge with
+     something that has bypassed the loop.  This is not something peeling can
+     check so we need to anticipate the usage of the live variable here.  */
+  auto exit_map = redirect_edge_var_map_vector (loop_exit);
+  if (exit_map)
+    for (auto vm : exit_map)
+      {
+	if (lcssa_vars.contains (vm.def)
+	    || TREE_CODE (vm.def) != SSA_NAME)
+	  continue;
+
+	imm_use_iterator imm_iter;
+	use_operand_p use_p;
+	bool use_in_loop = false;
+
+	FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
+	  {
+	    basic_block bb = gimple_bb (USE_STMT (use_p));
+	    if (flow_bb_inside_loop_p (loop, bb)
+		&& !gimple_vuse (USE_STMT (use_p)))
+	      {
+		use_in_loop = true;
+		break;
+	      }
+	  }
+
+	if (!use_in_loop && SSA_VAR_P (vm.def))
+	  {
+	    /* Do a final check to see if it's perhaps defined in the loop.
+	       This mirrors the relevancy analysis's used_outside_scope.  */
+	    if (virtual_operand_p (vm.def)
+		&& (SSA_NAME_IS_DEFAULT_DEF (vm.def)
+		    || !flow_bb_inside_loop_p (loop,
+				gimple_bb (SSA_NAME_DEF_STMT (vm.def)))))
+	      continue;
+	  }
+
+	tree new_res = copy_ssa_name (vm.result);
+	gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+	add_phi_arg (lcssa_phi, vm.def, loop_exit, vm.locus);
+      }
+
+  /* Now clear all the redirect maps.  */
+  for (auto exit : loop_exits)
+    redirect_edge_var_map_clear (exit);
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1455,13 +1600,16 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break
+   vectorization UPDATED_DOMS should be provided; it collects the basic
+   blocks whose dominators were updated should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1593,7 +1741,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
     }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1603,103 +1753,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
-      auto_vec <gimple *> new_phis;
-      hash_map <tree, tree> new_phi_args;
-      /* First create the empty phi nodes so that when we flush the
-	 statements they can be filled in.   However because there is no order
-	 between the PHI nodes in the exits and the loop headers we need to
-	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
-	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
-	{
-	  gimple *from_phi = gsi_stmt (gsi_from);
-	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
-	  new_phis.safe_push (res);
-	}
-
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
-	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
-	}
-      /* Record the new SSA names in the cache so that we can skip materializing
-	 them again when we fill in the rest of the LCSSA variables.  */
-      for (auto phi : new_phis)
-	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
-
-	  if (!SSA_VAR_P (new_arg))
-	    continue;
-	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
-	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
-	     be a .VDEF or a PHI that operates on MEM. And said definition
-	     must not be inside the main loop.  Or we must be a parameter.
-	     In the last two cases we may remove a non-MEM PHI node, but since
-	     they dominate both loops the removal is unlikely to cause trouble
-	     as the exits must already be using them.  */
-	  if (virtual_operand_p (new_arg)
-	      && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
-		  || !flow_bb_inside_loop_p (loop,
-				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	      continue;
-	    }
-	  new_phi_args.put (new_arg, gimple_phi_result (phi));
-
-	  if (TREE_CODE (new_arg) != SSA_NAME)
-	    continue;
-	  /* If the PHI node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.  Unless the
-	      the loop has been versioned.  If it has then we need the PHI
-	      node such that later when the loop guard is added the original
-	      dominating PHI can be found.  */
-	  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (new_arg));
-	  if (loop == scalar_loop
-	      && (!def_bb || !flow_bb_inside_loop_p (loop, def_bb)))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	    }
-	}
-
-      /* Copy the current loop LC PHI nodes between the original loop exit
-	 block and the new loop header.  This allows us to later split the
-	 preheader block and still find the right LC nodes.  */
-      edge loop_entry = single_succ_edge (new_preheader);
-      if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
-
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
-
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
-
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
-
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+      slpeel_tree_duplicate_loop_for_vectorization (loop, loop_exit, loop_exits,
+						    e, new_loop, flow_loops,
+						    new_preheader);
 
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
 
@@ -1713,6 +1769,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1760,6 +1831,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1418913d2c308b0cf78352e29dc9958746fb9c94..d8b532c4b8ca92a856368a686598859fab9d40e9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2223,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,




-- 

[-- Attachment #2: rb17964.patch --]
[-- Type: text/plain, Size: 14758 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 43ca985c53ce58aa83fb9689a9ea9b20b207e0a8..6fbb5b80986fd657814b48eb009b52b094f331e6 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1444,6 +1444,151 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
 		     get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from)));
 }
 
+/* Determine if the exit choosen by the loop vectorizer differs from the
+   natural loop exit.  i.e. if the exit leads to the loop patch or not.
+   When this happens we need to flip the understanding of main and other
+   exits by peeling and IV updates.  */
+
+bool
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
+/* Perform peeling for when the peeled loop is placed after the original loop.
+   This maintains LCSSA and creates the appropriate blocks for multiple exit
+   vectorization.   */
+
+void static
+slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
+					      vec<edge> &loop_exits, edge e,
+					      class loop *new_loop,
+					      bool flow_loops,
+					      basic_block new_preheader)
+{
+  bool multiple_exits_p = loop_exits.length () > 1;
+  basic_block main_loop_exit_block = new_preheader;
+  if (multiple_exits_p)
+    {
+      edge loop_entry = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_entry);
+    }
+
+  /* First create the empty phi nodes so that when we flush the
+     statements they can be filled in.   However because there is no order
+     between the PHI nodes in the exits and the loop headers we need to
+     order them base on the order of the two headers.  First record the new
+     phi nodes. Then redirect the edges and flush the changes.  This writes out the new
+    SSA names.  */
+  for (auto exit : loop_exits)
+    {
+      basic_block dest
+	= exit == loop_exit ? main_loop_exit_block : new_preheader;
+      redirect_edge_and_branch (exit, dest);
+    }
+
+  /* Copy the current loop LC PHI nodes between the original loop exit
+     block and the new loop header.  This allows us to later split the
+     preheader block and still find the right LC nodes.  */
+  edge loop_entry = single_succ_edge (new_preheader);
+  hash_set<tree> lcssa_vars;
+  if (flow_loops)
+    for (auto gsi_from = gsi_start_phis (loop->header),
+	 gsi_to = gsi_start_phis (new_loop->header);
+	 !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	 gsi_next (&gsi_from), gsi_next (&gsi_to))
+      {
+	gimple *from_phi = gsi_stmt (gsi_from);
+	gimple *to_phi = gsi_stmt (gsi_to);
+	tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
+
+	/* In all cases, even in early break situations we're only
+	   interested in the number of fully executed loop iters.  As such
+	   we discard any partially done iteration.  So we simply propagate
+	   the phi nodes from the latch to the merge block.  */
+	tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	gphi *lcssa_phi = create_phi_node (new_res, main_loop_exit_block);
+
+	/* Check if we haven't picked a different loop exit.  If we have we
+	   need to flip the LCSSA vars to prevent incorrect linking.  */
+	tree alt_arg = gimple_phi_result (from_phi);
+	if (!vect_is_loop_exit_latch_pred (loop_exit, loop))
+	  std::swap (new_arg, alt_arg);
+
+	lcssa_vars.add (new_arg);
+
+	/* Main loop exit should use the final iter value.  */
+	add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+
+	/* All other exits use the previous iters.  */
+	if (multiple_exits_p)
+	  {
+	    tree alt_res = copy_ssa_name (alt_arg);
+	    gphi *alt_lcssa_phi = create_phi_node (alt_res, new_preheader);
+	    edge main_e = single_succ_edge (main_loop_exit_block);
+	    for (edge e : loop_exits)
+	      if (e != loop_exit)
+		{
+		  add_phi_arg (alt_lcssa_phi, alt_arg, e, UNKNOWN_LOCATION);
+		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_res);
+		}
+	    new_res = alt_res; /* Push it down to the new_loop header.  */
+	  }
+
+	adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+    }
+
+  /* Copy over any live SSA vars that may not have been materialized in the
+     loops themselves but would be in the exit block.  However when the live
+     value is not used inside the loop then we don't need to do this; if we
+     do, then when we split the guard block the branch edge can end up
+     containing the wrong reference, particularly if it shares an edge with
+     something that has bypassed the loop.  This is not something peeling
+     can check, so we need to anticipate the usage of the live variable here.  */
+  auto exit_map = redirect_edge_var_map_vector (loop_exit);
+  if (exit_map)
+    for (auto vm : exit_map)
+      {
+	if (lcssa_vars.contains (vm.def)
+	    || TREE_CODE (vm.def) != SSA_NAME)
+	  continue;
+
+	imm_use_iterator imm_iter;
+	use_operand_p use_p;
+	bool use_in_loop = false;
+
+	FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def)
+	  {
+	    basic_block bb = gimple_bb (USE_STMT (use_p));
+	    if (flow_bb_inside_loop_p (loop, bb)
+		&& !gimple_vuse (USE_STMT (use_p)))
+	      {
+		use_in_loop = true;
+		break;
+	      }
+	  }
+
+	if (!use_in_loop && SSA_VAR_P (vm.def))
+	  {
+	    /* Do a final check to see if it's perhaps defined in the loop.
+	       This mirrors the relevancy analysis's used_outside_scope.  */
+	    if (virtual_operand_p (vm.def)
+		&& (SSA_NAME_IS_DEFAULT_DEF (vm.def)
+		    || !flow_bb_inside_loop_p (loop,
+				gimple_bb (SSA_NAME_DEF_STMT (vm.def)))))
+	      continue;
+	  }
+
+	tree new_res = copy_ssa_name (vm.result);
+	gphi *lcssa_phi = create_phi_node (new_res, e->dest);
+	add_phi_arg (lcssa_phi, vm.def, loop_exit, vm.locus);
+    }
+
+  /* Now clear all the redirect maps.  */
+  for (auto exit : loop_exits)
+    redirect_edge_var_map_clear (exit);
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1455,13 +1600,16 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    copies remains the same.
 
   If UPDATED_DOMS is not NULL it is updated with the list of basic blocks whose
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break
+   vectorization then LOOP_VINFO needs to be provided and is used to keep
+   track of any newly created memory references that need to be updated
+   should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1593,7 +1741,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
     }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1603,103 +1753,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
-      auto_vec <gimple *> new_phis;
-      hash_map <tree, tree> new_phi_args;
-      /* First create the empty phi nodes so that when we flush the
-	 statements they can be filled in.   However because there is no order
-	 between the PHI nodes in the exits and the loop headers we need to
-	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
-	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
-	{
-	  gimple *from_phi = gsi_stmt (gsi_from);
-	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
-	  new_phis.safe_push (res);
-	}
-
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
-	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
-	}
-      /* Record the new SSA names in the cache so that we can skip materializing
-	 them again when we fill in the rest of the LCSSA variables.  */
-      for (auto phi : new_phis)
-	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
-
-	  if (!SSA_VAR_P (new_arg))
-	    continue;
-	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
-	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
-	     be a .VDEF or a PHI that operates on MEM. And said definition
-	     must not be inside the main loop.  Or we must be a parameter.
-	     In the last two cases we may remove a non-MEM PHI node, but since
-	     they dominate both loops the removal is unlikely to cause trouble
-	     as the exits must already be using them.  */
-	  if (virtual_operand_p (new_arg)
-	      && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
-		  || !flow_bb_inside_loop_p (loop,
-				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	      continue;
-	    }
-	  new_phi_args.put (new_arg, gimple_phi_result (phi));
-
-	  if (TREE_CODE (new_arg) != SSA_NAME)
-	    continue;
-	  /* If the PHI node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.  Unless the
-	      the loop has been versioned.  If it has then we need the PHI
-	      node such that later when the loop guard is added the original
-	      dominating PHI can be found.  */
-	  basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (new_arg));
-	  if (loop == scalar_loop
-	      && (!def_bb || !flow_bb_inside_loop_p (loop, def_bb)))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	    }
-	}
-
-      /* Copy the current loop LC PHI nodes between the original loop exit
-	 block and the new loop header.  This allows us to later split the
-	 preheader block and still find the right LC nodes.  */
-      edge loop_entry = single_succ_edge (new_preheader);
-      if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
-
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
-
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
-
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
-
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+      slpeel_tree_duplicate_loop_for_vectorization (loop, loop_exit, loop_exits,
+						    e, new_loop, flow_loops,
+						    new_preheader);
 
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
 
@@ -1713,6 +1769,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1760,6 +1831,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1418913d2c308b0cf78352e29dc9958746fb9c94..d8b532c4b8ca92a856368a686598859fab9d40e9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2223,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (24 preceding siblings ...)
  2023-11-06  7:38 ` [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form " Tamar Christina
@ 2023-11-06  7:38 ` Tamar Christina
  2023-11-07 15:04   ` Richard Biener
  2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
                   ` (16 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 5038 bytes --]

Hi All,

As requested, the vectorizer is now free to pick its own exit, which can be
different from what the loop CFG infrastructure uses.  The vectorizer makes use
of this to vectorize loops that it previously could not.

But this means that loop control must be materialized in the block that needs
it, lest we corrupt the SSA chain.  This makes it so we use the vectorizer's
main IV block instead of the loop infrastructure's.
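As a hedged illustration (this code is not taken from the patch, and the
function name is made up), the shape of loop this change targets is one where
the exit the vectorizer wants to use is the early break rather than the latch
test, so the IV increment must be placed in the block feeding that exit:

```c
#include <assert.h>

/* Hypothetical example: a search loop whose interesting exit is the
   early break.  The vectorizer may now pick the break edge as its main
   IV exit instead of the latch condition, which is why the IV increment
   has to be materialized in that exit's source block.  */
static int
first_match (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)	/* early exit the vectorizer may choose */
      return i;
  return -1;		/* natural latch-controlled exit */
}
```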

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-ssa-loop-manip.cc (standard_iv_increment_position): Conditionally
	take dest BB.
	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
	(vect_set_loop_condition_normal): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
index bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00301423df111cbe7bf7ba8 100644
--- a/gcc/tree-ssa-loop-manip.h
+++ b/gcc/tree-ssa-loop-manip.h
@@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool = false);
 extern basic_block ip_end_pos (class loop *);
 extern basic_block ip_normal_pos (class loop *);
 extern void standard_iv_increment_position (class loop *,
-					    gimple_stmt_iterator *, bool *);
+					    gimple_stmt_iterator *, bool *,
+					    basic_block = NULL);
 extern bool
 gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
 					   sbitmap, edge, vec<edge> *, int);
diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
index e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5dde6c29492ce4d4e4550 100644
--- a/gcc/tree-ssa-loop-manip.cc
+++ b/gcc/tree-ssa-loop-manip.cc
@@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
 
 /* Stores the standard position for induction variable increment in LOOP
    (just before the exit condition if it is available and latch block is empty,
-   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
-   the increment should be inserted after *BSI.  */
+   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
+   basic block is used as the destination instead of the loop latch source
+   block.  INSERT_AFTER is set to true if the increment should be inserted after
+   *BSI.  */
 
 void
 standard_iv_increment_position (class loop *loop, gimple_stmt_iterator *bsi,
-				bool *insert_after)
+				bool *insert_after, basic_block dest_bb)
 {
-  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
+  basic_block bb = dest_bb;
+  if (!bb)
+    bb = ip_normal_pos (loop);
+  basic_block latch = ip_end_pos (loop);
   gimple *last = last_nondebug_stmt (latch);
 
   if (!bb
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a6465a547d1ea2d3d940373 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  standard_iv_increment_position (loop, &incr_gsi, &insert_after, exit_e->src);
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
     {
       /* Create an IV that counts down from niters_total and whose step
@@ -1017,7 +1018,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
+				  exit_edge->src);
   create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
 	     &incr_gsi, insert_after, &index_before_incr,
 	     &index_after_incr);
@@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1278,7 +1280,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 	}
     }
 
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
+				  exit_edge->src);
   create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
              &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
   indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,




-- 

[-- Attachment #2: rb17965.patch --]
[-- Type: text/plain, Size: 4139 bytes --]





^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 6/21]middle-end: support multiple exits in loop versioning
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (25 preceding siblings ...)
  2023-11-06  7:38 ` [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch Tamar Christina
@ 2023-11-06  7:38 ` Tamar Christina
  2023-11-07 14:54   ` Richard Biener
  2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
                   ` (15 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]

Hi All,

This makes loop versioning use the vectorizer's IV exit edge when it is
available, since single_exit () fails with multiple exits.
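A minimal sketch of the situation, under assumptions not stated in the patch
(the function and its names are hypothetical): a loop that needs runtime alias
versioning and also has a second exit, so single_exit () returns NULL and
versioning must fall back to the vectorizer's recorded IV exit:

```c
#include <assert.h>

/* Hypothetical example: dst may alias src, so the vectorizer versions
   the loop with a runtime alias check, while the early return gives the
   loop two exits -- single_exit () would be NULL for it.  */
static int
copy_until (int *dst, const int *src, int n, int stop)
{
  for (int i = 0; i < n; i++)
    {
      if (src[i] == stop)	/* second exit besides the latch */
	return i;
      dst[i] = src[i];		/* may alias src: needs versioning */
    }
  return n;
}
```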

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_loop_versioning): Support multiple
	exits.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3d59119787d6afdc5a6465a547d1ea2d3d940373..58b4b9c11d8b844ee86156cdfcba7f838030a7c2 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -4180,12 +4180,24 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 	 If loop versioning wasn't done from loop, but scalar_loop instead,
 	 merge_bb will have already just a single successor.  */
 
-      merge_bb = single_exit (loop_to_version)->dest;
+      /* Due to the single_exit check above we should only get here when
+	 loop == loop_to_version, which means we can use loop_vinfo to get
+	 the exits.  */
+      edge exit_edge = single_exit (loop_to_version);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* In early exits the main exit will fall into the merge block of the
+	     alternative exits.  So we need the single successor of the main
+	     exit here to find the merge block.  */
+	  exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
+	}
+      gcc_assert (exit_edge);
+      merge_bb = exit_edge->dest;
       if (EDGE_COUNT (merge_bb->preds) >= 2)
 	{
 	  gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
-	  new_exit_bb = split_edge (single_exit (loop_to_version));
-	  new_exit_e = single_exit (loop_to_version);
+	  new_exit_bb = split_edge (exit_edge);
+	  new_exit_e = exit_edge;
 	  e = EDGE_SUCC (new_exit_bb, 0);
 
 	  for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi);




-- 

[-- Attachment #2: rb17966.patch --]
[-- Type: text/plain, Size: 1454 bytes --]





^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (26 preceding siblings ...)
  2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
@ 2023-11-06  7:39 ` Tamar Christina
  2023-11-15  0:03   ` Tamar Christina
  2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
                   ` (14 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 9972 bytes --]

Hi All,

This changes the PHI node updates to support early breaks.
It has to support both the case where the loop's exit matches the normal loop
exit and the case where it is "inverted", i.e. the loop leaves via an early
exit edge.

In the latter case we must always restart the loop for VF iterations.  For an
early exit the reason is obvious, but there are cases where the "normal" exit
is located before the early one.  This exit then does a check on ivtmp resulting
in us leaving the loop since it thinks we're done.

In these cases we may still have side effects to perform, so we also go to the
scalar loop.

For the "normal" exit, niters has already been adjusted for peeling; for the
early exits we must find out how many iterations we actually did.  So we have
to recalculate the new position for each exit.
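A hedged sketch of why the restart is needed (none of these names come from
the patch, and the VF value is only illustrative): when the break triggers
mid-way through a vector iteration, the scalar side effects of that iteration
must not be lost, so the IV is reset to the start of the interrupted vector
iteration rather than to niters:

```c
#include <assert.h>

/* Hypothetical example: with VF = 4, if the break triggers at i = 5 the
   vector loop stops inside its second vector iteration, and the scalar
   epilogue has to redo iterations 4..5 -- it cannot resume from niters
   because the side effects at those positions were never completed.  */
static int
sum_until (const int *a, int n, int stop, int *sum)
{
  int s = 0, i;
  for (i = 0; i < n; i++)
    {
      if (a[i] == stop)		/* early exit */
	break;
      s += a[i];		/* side effect that must not be lost */
    }
  *sum = s;
  return i;
}
```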

This works, however ./gcc/testsuite/gcc.dg/vect/vect-early-break_76.c is
currently giving me a runtime failure, but I cannot seem to tell why.

The generated control looks correct to me, See loop 1:
https://gist.github.com/Mistuke/78b439de05e303ac6de5438dd83f079b

Any help in pointing out the mistake is appreciated.

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
	(vect_is_loop_exit_latch_pred): Mark inline.
	(vect_update_ivs_after_vectorizer): Support early break.
	(vect_do_peeling): Use it.
	(find_guard_arg): Keep the same value.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 58b4b9c11d8b844ee86156cdfcba7f838030a7c2..abd905b78f3661f80168c3866d7c3e68a9c15521 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1452,7 +1452,7 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    When this happens we need to flip the understanding of main and other
    exits by peeling and IV updates.  */
 
-bool
+inline bool
 vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
 {
   return single_pred (loop->latch) == loop_exit->src;
@@ -2193,6 +2193,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
      Input:
      - LOOP - a loop that is going to be vectorized. The last few iterations
               of LOOP were peeled.
+     - VF   - The chosen vectorization factor for LOOP.
      - NITERS - the number of iterations that LOOP executes (before it is
                 vectorized). i.e, the number of times the ivs should be bumped.
      - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
@@ -2203,6 +2204,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
                   The phi args associated with the edge UPDATE_E in the bb
                   UPDATE_E->dest are updated accordingly.
 
+     - MAIN_EXIT_P - Indicates whether UPDATE_E is what the vectorizer
+		     considers the main loop exit.
+
      Assumption 1: Like the rest of the vectorizer, this function assumes
      a single loop exit that has a single predecessor.
 
@@ -2220,18 +2224,21 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
-				  tree niters, edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
+				  tree niters, edge update_e, bool main_exit_p)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
+  bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
 
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  gcond *cond = get_loop_exit_condition (loop_e);
+  basic_block exit_bb = loop_e->dest;
+  basic_block iv_block = NULL;
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2241,7 +2248,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2273,11 +2279,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       enum vect_induction_op_type induction_type
 	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
 
-      if (induction_type == vect_step_op_add)
+      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      /* create_iv always places it on the LHS.  Alternatively we can set a
+	 property during create_iv to identify it.  */
+      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
+      if ((!main_exit_p || inversed_iv) && ivtemp)
+	{
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  ni = build_int_cst (type, vf);
+	}
+      else if (!main_exit_p && inversed_iv)
+	continue;
+      else if (induction_type == vect_step_op_add)
 	{
+
 	  tree stype = TREE_TYPE (step_expr);
-	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+
+	  /* Early exits always use the last iteration value, not niters.  */
+	  if (!main_exit_p || (main_exit_p && inversed_iv))
+	    {
+	      /* Live statements in the non-main exit shouldn't be adjusted.  We
+		 normally didn't have this problem with a single exit as live
+		 values would be in the exit block.  However when dealing with
+		 multiple exits all exits are redirected to the merge block
+		 and we restart the iteration.  */
+	      if (STMT_VINFO_LIVE_P (phi_info))
+		continue;
+
+	      /* For early break the final loop IV is:
+		 init + (final - init) * vf which takes into account peeling
+		 values and non-single steps.  The main exit can use niters
+		 since if you exit from the main exit you've done all vector
+		 iterations.  For an early exit we don't know when we exit so we
+		 must re-calculate this on the exit.  */
+	      tree start_expr = gimple_phi_result (phi);
+	      off = fold_build2 (MINUS_EXPR, stype,
+				 fold_convert (stype, start_expr),
+				 fold_convert (stype, init_expr));
+	      /* Now adjust for VF to get the final iteration value.  */
+	      off = fold_build2 (MULT_EXPR, stype, off,
+				 build_int_cst (stype, vf));
+	    }
+	  else
+	    off = fold_build2 (MULT_EXPR, stype,
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2289,6 +2336,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Don't bother call vect_peel_nonlinear_iv_init.  */
       else if (induction_type == vect_step_op_neg)
 	ni = init_expr;
+      else if (!main_exit_p)
+	continue;
       else
 	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
 					  niters, step_expr,
@@ -2296,9 +2345,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
+      /* For a non-main exit create an intermediate edge to get any updated
+	 IV calculations.  */
+      if (!main_exit_p
+	  && !iv_block
+	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
+	{
+	  iv_block = split_edge (update_e);
+	  update_e = single_succ_edge (update_e->dest);
+	  last_gsi = gsi_last_bb (iv_block);
+	}
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -2836,12 +2896,18 @@ find_guard_arg (class loop *loop ATTRIBUTE_UNUSED, const_edge loop_e,
 	 tree var = PHI_ARG_DEF (phi, loop_e->dest_idx);
 	 if (TREE_CODE (var) != SSA_NAME)
 	    continue;
-	 tree def = get_current_def (var);
-	 if (!def)
-	   continue;
-	 if (operand_equal_p (def,
-			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
-	   return PHI_RESULT (phi);
+
+	  /* The value could be carried all the way from the loop version block
+	     in which case we wouldn't have kept the value if it's not used in
+	     the loop.  In such cases get_current_def returns null as the value
+	     is already current.  */
+	  tree orig_var = get_current_def (var);
+	  if (!orig_var)
+	    orig_var = var;
+
+	  if (operand_equal_p (orig_var,
+			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	    return PHI_RESULT (phi);
 	}
     }
   return NULL_TREE;
@@ -3528,8 +3594,21 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
-      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-					update_e);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	update_e = single_succ_edge (e->dest);
+      bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+
+      for (auto exit : get_loop_exit_edges (loop))
+	{
+	  bool main_exit_p = vect_is_loop_exit_latch_pred (exit, loop);
+	  edge exit_e = main_exit_p ? update_e : exit;
+	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
+					    niters_vector_mult_vf, exit_e,
+					    main_exit_p);
+
+	}
 
       if (skip_epilog)
 	{




-- 

[-- Attachment #2: rb17967.patch --]
[-- Type: text/plain, Size: 8534 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 58b4b9c11d8b844ee86156cdfcba7f838030a7c2..abd905b78f3661f80168c3866d7c3e68a9c15521 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1452,7 +1452,7 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to)
    When this happens we need to flip the understanding of main and other
    exits by peeling and IV updates.  */
 
-bool
+bool inline
 vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
 {
   return single_pred (loop->latch) == loop_exit->src;
@@ -2193,6 +2193,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
      Input:
      - LOOP - a loop that is going to be vectorized. The last few iterations
               of LOOP were peeled.
+     - VF   - The chosen vectorization factor for LOOP.
      - NITERS - the number of iterations that LOOP executes (before it is
                 vectorized). i.e, the number of times the ivs should be bumped.
      - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
@@ -2203,6 +2204,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
                   The phi args associated with the edge UPDATE_E in the bb
                   UPDATE_E->dest are updated accordingly.
 
+     - MAIN_EXIT_P - Indicates whether UPDATE_E is what the vectorizer
+		     considers the main loop exit.
+
      Assumption 1: Like the rest of the vectorizer, this function assumes
      a single loop exit that has a single predecessor.
 
@@ -2220,18 +2224,21 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
-				  tree niters, edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
+				  tree niters, edge update_e, bool main_exit_p)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
+  bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
 
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  gcond *cond = get_loop_exit_condition (loop_e);
+  basic_block exit_bb = loop_e->dest;
+  basic_block iv_block = NULL;
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2241,7 +2248,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2273,11 +2279,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       enum vect_induction_op_type induction_type
 	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
 
-      if (induction_type == vect_step_op_add)
+      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      /* create_iv always places it on the LHS.  Alternatively we can set a
+	 property during create_iv to identify it.  */
+      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
+      if ((!main_exit_p || inversed_iv) && ivtemp)
+	{
+	  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info);
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  ni = build_int_cst (type, vf);
+	}
+      else if (!main_exit_p && inversed_iv)
+	continue;
+      else if (induction_type == vect_step_op_add)
 	{
+
 	  tree stype = TREE_TYPE (step_expr);
-	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+
+	  /* Early exits always use the last iteration value, not niters.  */
+	  if (!main_exit_p || (main_exit_p && inversed_iv))
+	    {
+	      /* Live statements in the non-main exit shouldn't be adjusted.  We
+		 normally didn't have this problem with a single exit as live
+		 values would be in the exit block.  However when dealing with
+		 multiple exits all exits are redirected to the merge block
+		 and we restart the iteration.  */
+	      if (STMT_VINFO_LIVE_P (phi_info))
+		continue;
+
+	      /* For early break the final loop IV is:
+		 init + (final - init) * vf which takes into account peeling
+		 values and non-single steps.  The main exit can use niters
+		 since if you exit from the main exit you've done all vector
+		 iterations.  For an early exit we don't know when we exit so we
+		 must re-calculate this on the exit.  */
+	      tree start_expr = gimple_phi_result (phi);
+	      off = fold_build2 (MINUS_EXPR, stype,
+				 fold_convert (stype, start_expr),
+				 fold_convert (stype, init_expr));
+	      /* Now adjust for VF to get the final iteration value.  */
+	      off = fold_build2 (MULT_EXPR, stype, off,
+				 build_int_cst (stype, vf));
+	    }
+	  else
+	    off = fold_build2 (MULT_EXPR, stype,
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2289,6 +2336,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Don't bother call vect_peel_nonlinear_iv_init.  */
       else if (induction_type == vect_step_op_neg)
 	ni = init_expr;
+      else if (!main_exit_p)
+	continue;
       else
 	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
 					  niters, step_expr,
@@ -2296,9 +2345,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
+      /* For a non-main exit create an intermediate edge to get any updated
+	 IV calculations.  */
+      if (!main_exit_p
+	  && !iv_block
+	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
+	{
+	  iv_block = split_edge (update_e);
+	  update_e = single_succ_edge (update_e->dest);
+	  last_gsi = gsi_last_bb (iv_block);
+	}
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -2836,12 +2896,18 @@ find_guard_arg (class loop *loop ATTRIBUTE_UNUSED, const_edge loop_e,
 	 tree var = PHI_ARG_DEF (phi, loop_e->dest_idx);
 	 if (TREE_CODE (var) != SSA_NAME)
 	    continue;
-	 tree def = get_current_def (var);
-	 if (!def)
-	   continue;
-	 if (operand_equal_p (def,
-			      PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
-	   return PHI_RESULT (phi);
+
+	  /* The value could be carried all the way from the loop version block
+	     in which case we wouldn't have kept the value if it's not used in
+	     the loop.  In such cases get_current_def returns null as the value
+	     is already current.  */
+	  tree orig_var = get_current_def (var);
+	  if (!orig_var)
+	    orig_var = var;
+
+	  if (operand_equal_p (orig_var,
+			       PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0))
+	    return PHI_RESULT (phi);
 	}
     }
   return NULL_TREE;
@@ -3528,8 +3594,21 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
-      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-					update_e);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	update_e = single_succ_edge (e->dest);
+      bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+
+      for (auto exit : get_loop_exit_edges (loop))
+	{
+	  bool main_exit_p = vect_is_loop_exit_latch_pred (exit, loop);
+	  edge exit_e = main_exit_p ? update_e : exit;
+	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
+					    niters_vector_mult_vf, exit_e,
+					    main_exit_p);
+
+	}
 
       if (skip_epilog)
 	{




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (27 preceding siblings ...)
  2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
@ 2023-11-06  7:39 ` Tamar Christina
  2023-11-15  0:05   ` Tamar Christina
  2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
                   ` (13 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 7101 bytes --]

Hi All,

This adds support to vectorizable_live_reduction to handle multiple exits by
doing a search for which exit the live value should be materialized in.

Additionally, which element index we're after depends on whether the exit
the value is materialized in is an early exit, or whether the loop's main
exit is different from the loop's natural one (i.e. the one with the same
src block as the latch).

In those two cases we want the first rather than the last value as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and vector since there's only a way to get the last
active element and not the first.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c123398aad207082384a2079c5234033c3d825ea..55d6aee3d29151e6b528f6fdde15c693e5bdd847 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10503,12 +10503,56 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+      /* A value can only be live in one exit.  So figure out which one.  */
+      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      /* Check if we have a loop where the chosen exit is not the main exit;
+	 in these cases, for an early break, we restart the iteration the vector
+	 code was executing.  For the live values we want the value at the start
+	 of the iteration rather than at the end.  */
+      bool inverted_ctrl_p = false;
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	    if (!is_gimple_debug (use_stmt)
+		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	      {
+		basic_block use_bb = gimple_bb (use_stmt);
+		for (auto edge : get_loop_exit_edges (loop))
+		  {
+		    /* Alternative exits can have an intermediate BB in
+		       between to update the IV.  In those cases we need to
+		       look one block further.  */
+		    if (use_bb == edge->dest
+			|| (single_succ_p (edge->dest)
+			    && use_bb == single_succ (edge->dest)))
+		      {
+			exit_e = edge;
+			goto found;
+		      }
+		  }
+	      }
+found:
+	  /* If the exit block doesn't have a single predecessor then split the
+	     edge so we have a location to place the live operations.  Perhaps
+	     we should always split during IV updating.  But this way the CFG
+	     is cleaner to follow.  */
+	  inverted_ctrl_p = !vect_is_loop_exit_latch_pred (exit_e, loop);
+	  if (!single_pred_p (exit_e->dest))
+	    exit_e = single_pred_edge (split_edge (exit_e));
+
+	  /* For an early exit where the exit is not in the BB that leads to
+	     the latch, we're restarting the iteration in the scalar loop.  So
+	     get the first live value.  */
+	  if (inverted_ctrl_p)
+	    bitstart = build_zero_cst (TREE_TYPE (bitstart));
+	}
+
+      basic_block exit_bb = exit_e->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10539,6 +10583,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 					  len, bias_minus_one);
 
+	  /* This needs to implement extraction of the first index, but it's
+	     not clear how the LEN scheme interacts with that.  At the moment we
+	     shouldn't get here since there's no LEN support for early breaks.
+	     But guard this so there's no incorrect codegen.  */
+	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
 	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
 	  tree scalar_res
 	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
@@ -10563,8 +10613,37 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 					  &LOOP_VINFO_MASKS (loop_vinfo),
 					  1, vectype, 0);
 	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
+	  tree scalar_res;
+
+	  /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	     instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.  */
+	  if (inverted_ctrl_p && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	    {
+	      auto gsi_stmt = gsi_last (stmts);
+
+	       /* First create the permuted mask.  */
+	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	      tree perm_dest = copy_ssa_name (mask);
+	      gimple *perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+					   mask, perm_mask);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      mask = perm_dest;
+
+	       /* Then permute the vector contents.  */
+	      tree perm_elem = perm_mask_for_reverse (vectype);
+	      perm_dest = copy_ssa_name (vec_lhs_phi);
+	      perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+					   vec_lhs_phi, perm_elem);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      vec_lhs_phi = perm_dest;
+	    }
+
+	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				     mask, vec_lhs_phi);
 
 	  /* Convert the extracted vector element to the scalar type.  */
 	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 42cebb92789247434a91cb8e74c0557e75d1ea2c..36aeca60a22cfaea8d3b43348000d75de1d525c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1765,7 +1765,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d8b532c4b8ca92a856368a686598859fab9d40e9..a570cf113adb8e11e5383d4ba7600bddaddbd8c4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2245,6 +2245,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,




-- 

[-- Attachment #2: rb17968.patch --]
[-- Type: text/plain, Size: 6107 bytes --]

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index c123398aad207082384a2079c5234033c3d825ea..55d6aee3d29151e6b528f6fdde15c693e5bdd847 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10503,12 +10503,56 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+      /* A value can only be live in one exit.  So figure out which one.  */
+      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      /* Check if we have a loop where the chosen exit is not the main exit;
+	 in these cases, for an early break, we restart the iteration the vector
+	 code was executing.  For the live values we want the value at the start
+	 of the iteration rather than at the end.  */
+      bool inverted_ctrl_p = false;
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	    if (!is_gimple_debug (use_stmt)
+		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	      {
+		basic_block use_bb = gimple_bb (use_stmt);
+		for (auto edge : get_loop_exit_edges (loop))
+		  {
+		    /* Alternative exits can have an intermediate BB in
+		       between to update the IV.  In those cases we need to
+		       look one block further.  */
+		    if (use_bb == edge->dest
+			|| (single_succ_p (edge->dest)
+			    && use_bb == single_succ (edge->dest)))
+		      {
+			exit_e = edge;
+			goto found;
+		      }
+		  }
+	      }
+found:
+	  /* If the exit block doesn't have a single predecessor then split the
+	     edge so we have a location to place the live operations.  Perhaps
+	     we should always split during IV updating.  But this way the CFG
+	     is cleaner to follow.  */
+	  inverted_ctrl_p = !vect_is_loop_exit_latch_pred (exit_e, loop);
+	  if (!single_pred_p (exit_e->dest))
+	    exit_e = single_pred_edge (split_edge (exit_e));
+
+	  /* For an early exit where the exit is not in the BB that leads to
+	     the latch, we're restarting the iteration in the scalar loop.  So
+	     get the first live value.  */
+	  if (inverted_ctrl_p)
+	    bitstart = build_zero_cst (TREE_TYPE (bitstart));
+	}
+
+      basic_block exit_bb = exit_e->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10539,6 +10583,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 					  len, bias_minus_one);
 
+	  /* This needs to implement extraction of the first index, but it's
+	     not clear how the LEN scheme interacts with that.  At the moment we
+	     shouldn't get here since there's no LEN support for early breaks.
+	     But guard this so there's no incorrect codegen.  */
+	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
 	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
 	  tree scalar_res
 	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
@@ -10563,8 +10613,37 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 					  &LOOP_VINFO_MASKS (loop_vinfo),
 					  1, vectype, 0);
 	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
+	  tree scalar_res;
+
+	  /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	     instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.  */
+	  if (inverted_ctrl_p && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	    {
+	      auto gsi_stmt = gsi_last (stmts);
+
+	       /* First create the permuted mask.  */
+	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	      tree perm_dest = copy_ssa_name (mask);
+	      gimple *perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+					   mask, perm_mask);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      mask = perm_dest;
+
+	       /* Then permute the vector contents.  */
+	      tree perm_elem = perm_mask_for_reverse (vectype);
+	      perm_dest = copy_ssa_name (vec_lhs_phi);
+	      perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+					   vec_lhs_phi, perm_elem);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      vec_lhs_phi = perm_dest;
+	    }
+
+	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				     mask, vec_lhs_phi);
 
 	  /* Convert the extracted vector element to the scalar type.  */
 	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 42cebb92789247434a91cb8e74c0557e75d1ea2c..36aeca60a22cfaea8d3b43348000d75de1d525c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1765,7 +1765,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d8b532c4b8ca92a856368a686598859fab9d40e9..a570cf113adb8e11e5383d4ba7600bddaddbd8c4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2245,6 +2245,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (28 preceding siblings ...)
  2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
@ 2023-11-06  7:39 ` Tamar Christina
  2023-11-27 22:49   ` Tamar Christina
  2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
                   ` (12 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 9830 bytes --]

Hi All,

This implements vectorizable_early_exit, which is used as the codegen part of
vectorizing a gcond.

For the most part it shares the majority of its code with
vectorizable_comparison, with the addition that it needs to be able to reduce
multiple resulting statements into a single one for use in the gcond, and also
needs to be able to perform masking on the comparisons.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a843d402a833c9267bb315e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12709,6 +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+    gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  tree var_op = op.ops[0];
+
+  /* When vectorizing things like pointer comparisons we will assume that
+     the VF of both operands are the same. e.g. a pointer must be compared
+     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
+     check further.  */
+  tree vectype_op = vectype_out;
+  if (SSA_VAR_P (var_op))
+    {
+      stmt_vec_info operand0_info
+	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));
+      if (!operand0_info)
+	return false;
+
+      /* If we're in a pattern get the type of the original statement.  */
+      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+    }
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector "
+			       "comparisons for type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", truth_type);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  vec<tree> stmts;
+
+  if (slp_node)
+    stmts = SLP_TREE_VEC_DEFS (slp_node);
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.create (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  if (slp_node)
+	    slp_node->push_vec_def (new_stmt);
+	  else
+	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  if (is_pattern_stmt_p (stmt_info))
+    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
+
+  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
+			build_zero_cst (truth_type));
+  t = canonicalize_cond_expr_cond (t);
+  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
+  update_stmt (stmt);
+
+  if (slp_node)
+    slp_node->push_vec_def (stmt);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
-       case vect_early_exit_def:
+	case vect_early_exit_def:
 	  dump_printf (MSG_NOTE, "early exit\n");
 	  break;
 	case vect_unknown_def_type:
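
To make the reduction strategy in vectorizable_early_exit above concrete, here is a standalone model (my own sketch, not part of the patch) of the workset loop: it repeatedly pops the last two masks, ORs them, and re-inserts the result at the front, mirroring `workset.pop ()` and `workset.quick_insert (0, ...)` so the OR tree stays balanced.

```c
/* Pairwise OR reduction over N boolean masks, modelling the workset
   loop in vectorizable_early_exit.  Plain unsigned values stand in
   for vector boolean masks here.  */
unsigned
reduce_ior (unsigned *workset, int n)
{
  while (n > 1)
    {
      unsigned a = workset[--n];	/* workset.pop ()  */
      unsigned b = workset[--n];	/* workset.pop ()  */
      /* workset.quick_insert (0, new_temp): shift and insert at front.  */
      for (int i = n; i > 0; i--)
	workset[i] = workset[i - 1];
      workset[0] = a | b;
      n++;
    }
  return workset[0];
}
```

Building the tree this way keeps independent OR operations available for parallel execution instead of forming a serial chain.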




-- 

[-- Attachment #2: rb17969.patch --]
[-- Type: text/plain, Size: 9077 bytes --]

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 10/21]middle-end: implement relevancy analysis support for control flow
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (29 preceding siblings ...)
  2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
@ 2023-11-06  7:39 ` Tamar Christina
  2023-11-27 22:49   ` Tamar Christina
  2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
                   ` (11 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 5364 bytes --]

Hi All,

This updates relevancy analysis to support marking gconds that belong to early
breaks as relevant for vectorization.
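
As a concrete illustration (my own sketch, not from the patch), this is the shape of loop the change targets: the comparison feeding the break is one of the loop's exit conditions, but not the main IV exit condition, so the analysis must now mark its gcond relevant instead of skipping it.

```c
/* The `a[i] == key' gcond below is an early break: it is a loop exit
   condition but not the main IV exit, so relevancy analysis must mark
   it vect_used_in_scope rather than ignore it.  */
int
find_index (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)	/* early break  */
      return i;
  return -1;		/* normal (IV) exit  */
}
```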

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vect_stmt_relevant_p,
	vect_mark_stmts_to_be_vectorized, vect_analyze_stmt, vect_is_simple_use,
	vect_get_vector_types_for_stmt): Support early breaks.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4809b822632279493a843d402a833c9267bb315e..31474e923cc3feb2604ca2882ecfb300cd211679 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,9 +359,14 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
-    *relevant = vect_used_in_scope;
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (is_ctrl_stmt (stmt) && is_a <gcond *> (stmt))
+    {
+      gcond *cond = as_a <gcond *> (stmt);
+      if (LOOP_VINFO_LOOP_CONDS (loop_vinfo).contains (cond)
+	  && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
+	*relevant = vect_used_in_scope;
+    }
 
   /* changing memory.  */
   if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
@@ -374,6 +379,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	*relevant = vect_used_in_scope;
       }
 
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  auto_bitmap exit_bbs;
+  for (edge exit : exits)
+    bitmap_set_bit (exit_bbs, exit->dest->index);
+
   /* uses outside the loop.  */
   FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF)
     {
@@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	      /* We expect all such uses to be in the loop exit phis
 		 (because of loop closed form)   */
 	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
-	      gcc_assert (bb == single_exit (loop)->dest);
 
               *live_p = true;
 	    }
@@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 			return res;
 		    }
                  }
+	    }
+	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+	    {
+	      enum tree_code rhs_code = gimple_cond_code (cond);
+	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+	      opt_result res
+		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
+			       loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
+	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+				loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
             }
 	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
 	    {
@@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
 			     node_instance, cost_vec);
       if (!res)
 	return res;
-   }
+    }
+
+  if (is_ctrl_stmt (stmt_info->stmt))
+    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
     {
       case vect_internal_def:
+      case vect_early_exit_def:
         break;
 
       case vect_reduction_def:
@@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
     {
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
       *need_to_vectorize = true;
     }
@@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
 	  else
 	    *op = gimple_op (ass, operand + 1);
 	}
+      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+	{
+	  gimple_match_op m_op;
+	  if (!gimple_extract_op (cond, &m_op))
+	    return false;
+	  gcc_assert (m_op.code.is_tree_code ());
+	  *op = m_op.ops[operand];
+	}
       else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
 	*op = gimple_call_arg (call, operand);
       else
@@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+      /* Allow vector conditionals through here.  */
+      && !is_ctrl_stmt (stmt)
       /* MASK_STORE has no lhs, but is ok.  */
       && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
     {
@@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return opt_result::failure_at (stmt,
-				     "not vectorized: irregular stmt.%G", stmt);
+				     "not vectorized: irregular stmt: %G", stmt);
     }
 
   tree vectype;
@@ -14490,6 +14528,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if (is_ctrl_stmt (stmt))
+	{
+	  gcond *cond = dyn_cast <gcond *> (stmt);
+	  if (!cond)
+	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
+					   " control flow statement.\n");
+	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 




-- 

[-- Attachment #2: rb17970.patch --]
[-- Type: text/plain, Size: 4916 bytes --]

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (30 preceding siblings ...)
  2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
@ 2023-11-06  7:40 ` Tamar Christina
  2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 3452 bytes --]

Hi All,

This wires through the final bits to support adding the guard block between
the loop and epilog.

For an "inverted" loop, i.e. one where an early exit was chosen as the main
exit, we can never skip the scalar loop, since we know we still have side
effects to perform.  For those cases we always restart the scalar loop
regardless of which exit is taken.
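
For example (a hand-written sketch, not from the patch), in the loop below the store to b[i] happens before the break test.  If the early exit becomes the main vector exit, the stores for any remaining scalar iterations must still be performed, which is why the guard cannot bypass the scalar loop:

```c
/* An "inverted" loop: the store to b[i] is a side effect that remaining
   scalar iterations must still execute after the vector loop exits
   early, so the guard must always enter the scalar loop.  */
int
copy_until (const int *a, int *b, int n, int key)
{
  for (int i = 0; i < n; i++)
    {
      b[i] = a[i];	/* side effect before the break test  */
      if (a[i] == key)
	return i;	/* early exit chosen as main exit  */
    }
  return -1;
}
```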

When we add the guard we also need to update the dominators calculated by the
peeling code as the bypass edge changes the dominators.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_do_peeling): Wire through guard block
	creation for early breaks and update dominators after the guard edge
	is added.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index abd905b78f3661f80168c3866d7c3e68a9c15521..eef2bb50c1505f5cf802d5d80300affc2cbe69f6 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3512,11 +3512,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop;
       edge epilog_e = vect_epilogues ? e : scalar_e;
       edge new_epilog_e = NULL;
-      epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog,
-						       epilog_e, e,
-						       &new_epilog_e);
+      auto_vec<basic_block> doms;
+      epilog
+	= slpeel_tree_duplicate_loop_to_edge_cfg (loop, e, epilog, epilog_e, e,
+						  &new_epilog_e, true, &doms);
+
       LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo) = new_epilog_e;
       gcc_assert (epilog);
+      gcc_assert (new_epilog_e);
       epilog->force_vectorize = false;
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
@@ -3610,10 +3613,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
 	}
 
-      if (skip_epilog)
+      if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
 	{
-	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
+	  /* For the case where a different exit was chosen we must execute
+	     the scalar loop with the remaining iterations.  */
+	  if (inversed_iv)
+	    guard_cond = boolean_false_node;
+	  else
+	    guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 				    niters, niters_vector_mult_vf);
+
 	  guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 	  edge epilog_e = LOOP_VINFO_EPILOGUE_IV_EXIT (loop_vinfo);
 	  guard_to = split_edge (epilog_e);
@@ -3621,11 +3630,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 					   skip_vector ? anchor : guard_bb,
 					   prob_epilog.invert (),
 					   irred_flag);
+	  doms.safe_push (guard_to);
 	  if (vect_epilogues)
 	    epilogue_vinfo->skip_this_loop_edge = guard_e;
 	  edge main_iv = LOOP_VINFO_IV_EXIT (loop_vinfo);
 	  slpeel_update_phi_nodes_for_guard2 (loop, epilog, main_iv, guard_e,
 					      epilog_e);
+
 	  /* Only need to handle basic block before epilog loop if it's not
 	     the guard_bb, which is the case when skip_vector is true.  */
 	  if (guard_bb != bb_before_epilog)
@@ -3637,6 +3648,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  scale_loop_profile (epilog, prob_epilog, -1);
 	}
 
+      /* Recalculate the dominators after adding the guard edge.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
 	{




-- 

[-- Attachment #2: rb17971.patch --]
[-- Type: text/plain, Size: 2757 bytes --]

@@ -3637,6 +3648,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  scale_loop_profile (epilog, prob_epilog, -1);
 	}
 
+      /* Recalculate the dominators after adding the guard edge.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+
       unsigned HOST_WIDE_INT bound;
       if (bound_scalar.is_constant (&bound))
 	{




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (31 preceding siblings ...)
  2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
@ 2023-11-06  7:40 ` Tamar Christina
  2023-11-27 22:48   ` Tamar Christina
  2023-12-06  8:31   ` Richard Biener
  2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
                   ` (9 subsequent siblings)
  42 siblings, 2 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 5810 bytes --]

Hi All,

This finishes the wiring that didn't fit in any of the other patches,
essentially adding the related changes needed for peeling with early breaks to work.
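
The shape this enables can be sketched in C (a hypothetical illustration only;
`VF` and `any_lane_equals` are made-up stand-ins for the vectorization factor
and a vector compare feeding a vector cbranch, not names from the patch): a
vector main loop that breaks as soon as any lane matches, followed by a scalar
epilogue of at most VF iterations that pinpoints the element which caused the
break.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* illustrative vectorization factor */

/* Scalar stand-in for a vector compare feeding a vector cbranch.  */
static int
any_lane_equals (const int *p, int v)
{
  for (int k = 0; k < VF; k++)
    if (p[k] == v)
      return 1;
  return 0;
}

static ptrdiff_t
find (const int *a, ptrdiff_t n, int v)
{
  ptrdiff_t i = 0;
  /* Vector main loop: exits early when any lane matches.  */
  for (; i + VF <= n; i += VF)
    if (any_lane_equals (a + i, v))
      break;
  /* Scalar epilogue: after a vector break this runs at most VF
     iterations before finding the matching element.  */
  for (; i < n; i++)
    if (a[i] == v)
      return i;
  return -1;
}
```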

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
	vect_do_peeling): Support early breaks.
	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
	* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* Record the number of latch iterations.  */
-  if (limit == niters)
+  if (limit == niters
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     /* Case A: the loop iterates NITERS times.  Subtract one to get the
        latch count.  */
     loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
@@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     bound_epilog += 1;
+
+  /* For early breaks the scalar loop needs to execute at most VF times
+     to find the element that caused the break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      bound_epilog = vf;
+      /* Force a scalar epilogue as we can't vectorize the index finding.  */
+      vect_epilogues = false;
+    }
+
   bool epilog_peeling = maybe_ne (bound_epilog, 0U);
   poly_uint64 bound_scalar = bound_epilog;
 
@@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 				  bound_prolog + bound_epilog)
 		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
 			 || vect_epilogues));
+
+  /* We only support early break vectorization on known bounds at this time.
+     This means that if the vector loop can't be entered then we won't generate
+     it at all.  So for now force skip_vector off because the additional control
+     flow messes with the BB exits and we've already analyzed them.  */
+  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
+
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
 		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
 		      || !vf.is_constant ());
-  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
+     loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     skip_epilog = false;
 
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
     th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
 					  (loop_vinfo));
 
+  /* When we have multiple exits and VF is unknown, we must require partial
+     vectors because the loop bound is not a minimum but a maximum.  That is to
+     say we cannot unpredicate the main loop unless we peel or use partial
+     vectors in the epilogue.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+    return true;
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
     {
@@ -3149,7 +3157,8 @@ start_over:
 
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c77837233c660e032a 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
 	 predicates that need to be shared for optimal predicate usage.
 	 However reassoc will re-order them and prevent CSE from working
 	 as it should.  CSE only the loop body, not the entry.  */
-      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (edge exit : exits)
+	bitmap_set_bit (exit_bbs, exit->dest->index);
 
       edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
       do_rpo_vn (fun, entry, exit_bbs);




-- 

[-- Attachment #2: rb17972.patch --]
[-- Type: text/plain, Size: 5273 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* Record the number of latch iterations.  */
-  if (limit == niters)
+  if (limit == niters
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     /* Case A: the loop iterates NITERS times.  Subtract one to get the
        latch count.  */
     loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
@@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
     bound_epilog += vf - 1;
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     bound_epilog += 1;
+
+  /* For early breaks the scalar loop needs to execute at most VF times
+     to find the element that caused the break.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      bound_epilog = vf;
+      /* Force a scalar epilogue as we can't vectorize the index finding.  */
+      vect_epilogues = false;
+    }
+
   bool epilog_peeling = maybe_ne (bound_epilog, 0U);
   poly_uint64 bound_scalar = bound_epilog;
 
@@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 				  bound_prolog + bound_epilog)
 		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
 			 || vect_epilogues));
+
+  /* We only support early break vectorization on known bounds at this time.
+     This means that if the vector loop can't be entered then we won't generate
+     it at all.  So for now force skip_vector off because the additional control
+     flow messes with the BB exits and we've already analyzed them.  */
+  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
+
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
 		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
 		      || !vf.is_constant ());
-  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
-  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
+     loop must be executed.  */
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     skip_epilog = false;
 
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
     th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
 					  (loop_vinfo));
 
+  /* When we have multiple exits and VF is unknown, we must require partial
+     vectors because the loop bound is not a minimum but a maximum.  That is to
+     say we cannot unpredicate the main loop unless we peel or use partial
+     vectors in the epilogue.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
+    return true;
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
     {
@@ -3149,7 +3157,8 @@ start_over:
 
   /* If an epilogue loop is required make sure we can create one.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
+      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c77837233c660e032a 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
 	 predicates that need to be shared for optimal predicate usage.
 	 However reassoc will re-order them and prevent CSE from working
 	 as it should.  CSE only the loop body, not the entry.  */
-      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      for (edge exit : exits)
+	bitmap_set_bit (exit_bbs, exit->dest->index);
 
       edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
       do_rpo_vn (fun, entry, exit_bbs);





* [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (32 preceding siblings ...)
  2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
@ 2023-11-06  7:40 ` Tamar Christina
  2023-11-27 22:48   ` Tamar Christina
  2023-12-06  8:18   ` Richard Biener
  2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
                   ` (8 subsequent siblings)
  42 siblings, 2 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]

Hi All,

This sets LOOP_VINFO_EARLY_BREAKS and makes some miscellaneous changes so that
the other patches are self-contained.
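
For illustration, a loop of the following form (a hypothetical example, not
taken from the patch) has a data-dependent exit condition besides the counting
IV; that extra condition is what leaves LOOP_VINFO_LOOP_CONDS non-empty and
therefore sets LOOP_VINFO_EARLY_BREAKS:

```c
#include <assert.h>

/* Two exit conditions: the counting IV (i < n) provides the main IV
   exit; the data-dependent test is the early break.  */
static int
first_negative (const int *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] < 0)  /* early break: second loop exit */
      return i;
  return n;
}
```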

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
	(vect_transform_loop): Use it.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
-
 /* Function vect_analyze_loop_form.
 
    Verify that certain CFG restrictions hold, including:
    - the loop has a pre-header
-   - the loop has a single entry and exit
+   - the loop has a single entry
+   - nested loops can have only a single exit.
    - the loop exit condition is simple enough
    - the number of iterations can be analyzed, i.e, a countable loop.  The
      niter could be analyzed under some assumptions.  */
@@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  if (exit_e->flags & EDGE_ABNORMAL)
-    return opt_result::failure_at (vect_location,
-				   "not vectorized:"
-				   " abnormal loop exit edge.\n");
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  for (edge e : exits)
+    {
+      if (e->flags & EDGE_ABNORMAL)
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge.\n");
+    }
 
   info->conds
     = vect_get_loop_niters (loop, exit_e, &info->assumptions,
@@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
+  /* Check to see if we're vectorizing multiple exits.  */
+  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
   edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-  if (! single_pred_p (e->dest))
+  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       split_loop_exit_edge (e, true);
       if (dump_enabled_p ())




-- 

[-- Attachment #2: rb17973.patch --]
[-- Type: text/plain, Size: 2296 bytes --]

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
-
 /* Function vect_analyze_loop_form.
 
    Verify that certain CFG restrictions hold, including:
    - the loop has a pre-header
-   - the loop has a single entry and exit
+   - the loop has a single entry
+   - nested loops can have only a single exit.
    - the loop exit condition is simple enough
    - the number of iterations can be analyzed, i.e, a countable loop.  The
      niter could be analyzed under some assumptions.  */
@@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				   "not vectorized: latch block not empty.\n");
 
   /* Make sure the exit is not abnormal.  */
-  if (exit_e->flags & EDGE_ABNORMAL)
-    return opt_result::failure_at (vect_location,
-				   "not vectorized:"
-				   " abnormal loop exit edge.\n");
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  for (edge e : exits)
+    {
+      if (e->flags & EDGE_ABNORMAL)
+	return opt_result::failure_at (vect_location,
+				       "not vectorized:"
+				       " abnormal loop exit edge.\n");
+    }
 
   info->conds
     = vect_get_loop_niters (loop, exit_e, &info->assumptions,
@@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
 
   LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
 
+  /* Check to see if we're vectorizing multiple exits.  */
+  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
+
   if (info->inner_loop_cond)
     {
       stmt_vec_info inner_loop_cond_info
@@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
   /* Make sure there exists a single-predecessor exit bb.  Do this before 
      versioning.   */
   edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
-  if (! single_pred_p (e->dest))
+  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
     {
       split_loop_exit_edge (e, true);
       if (dump_enabled_p ())




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (33 preceding siblings ...)
  2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
@ 2023-11-06  7:41 ` Tamar Christina
  2023-11-06 14:44   ` Richard Biener
  2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
                   ` (7 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 3786 bytes --]

Hi All,

The vectorizer at the moment uses a num_bb count to check for control flow.
This rejects a number of loops for no good reason.  This patch instead changes
it to check the destinations of the exits.

This also allows early break to work by also dropping the single_exit check.
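
As a hypothetical illustration of the distinction (neither loop is taken from
the patch): in the first loop every branch other than the latch leaves the
loop, so the new CFG-based check accepts it; in the second the if/else rejoins
inside the body, so one of its blocks is non-empty without exiting the loop and
the loop is still rejected as unsupported control flow (unless if-conversion
flattened it earlier).

```c
#include <assert.h>

/* Accepted: the only non-latch branch exits the loop.  */
static int
has_positive (const int *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      return 1;  /* early exit edge */
  return 0;
}

/* Still rejected before if-conversion: the if/else rejoins inside
   the loop body instead of leaving the loop.  */
static void
abs_copy (const int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (a[i] >= 0)
        b[i] = a[i];
      else
        b[i] = -a[i];
    }
}
```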

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (slpeel_can_duplicate_loop_p): Drop the
	BB count check.
	* tree-vect-loop.cc (vect_analyze_loop_form): Check the loop CFG
	directly instead of the number of BBs and drop the single exit
	check.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7..466cf4c47154099a33dc63e22d74eef42d282444 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1937,12 +1937,10 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
-  unsigned int num_bb = loop->inner? 5 : 2;
 
   /* All loops have an outer scope; the only case loop->outer is NULL is for
      the function itself.  */
   if (!loop_outer (loop)
-      || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
       || !exit_e
       /* Verify that new loop exit condition can be trivially modified.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ddb6cad60f2f2cfdc96732f3f256d86e315d7357..27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1727,6 +1727,17 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 		       "using as main loop exit: %d -> %d [AUX: %p]\n",
 		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
 
+  /* Check if we have any control flow that doesn't leave the loop.  */
+  class loop *v_loop = loop->inner ? loop->inner : loop;
+  basic_block *bbs = get_loop_body (v_loop);
+  for (unsigned i = 0; i < v_loop->num_nodes; i++)
+    if (!empty_block_p (bbs[i])
+	&& !loop_exits_from_bb_p (v_loop, bbs[i])
+	&& bbs[i]->loop_father == v_loop)
+      return opt_result::failure_at (vect_location,
+				     "not vectorized:"
+				     " unsupported control flow in loop.\n");
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -1746,11 +1757,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
                            |
                         (exit-bb)  */
 
-      if (loop->num_nodes != 2)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       if (empty_block_p (loop->header))
 	return opt_result::failure_at (vect_location,
 				       "not vectorized: empty loop.\n");
@@ -1782,11 +1788,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				       "not vectorized:"
 				       " multiple nested loops.\n");
 
-      if (loop->num_nodes != 5)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       entryedge = loop_preheader_edge (innerloop);
       if (entryedge->src != loop->header
 	  || !single_exit (innerloop)
@@ -1823,9 +1824,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
       info->inner_loop_cond = inner.conds[0];
     }
 
-  if (!single_exit (loop))
-    return opt_result::failure_at (vect_location,
-				   "not vectorized: multiple exits.\n");
   if (EDGE_COUNT (loop->header->preds) != 2)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"




-- 

[-- Attachment #2: rb17974.patch --]
[-- Type: text/plain, Size: 3245 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7..466cf4c47154099a33dc63e22d74eef42d282444 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1937,12 +1937,10 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (exit_e);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
-  unsigned int num_bb = loop->inner? 5 : 2;
 
   /* All loops have an outer scope; the only case loop->outer is NULL is for
      the function itself.  */
   if (!loop_outer (loop)
-      || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
       || !exit_e
       /* Verify that new loop exit condition can be trivially modified.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ddb6cad60f2f2cfdc96732f3f256d86e315d7357..27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1727,6 +1727,17 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 		       "using as main loop exit: %d -> %d [AUX: %p]\n",
 		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
 
+  /* Check if we have any control flow that doesn't leave the loop.  */
+  class loop *v_loop = loop->inner ? loop->inner : loop;
+  basic_block *bbs = get_loop_body (v_loop);
+  for (unsigned i = 0; i < v_loop->num_nodes; i++)
+    if (!empty_block_p (bbs[i])
+	&& !loop_exits_from_bb_p (v_loop, bbs[i])
+	&& bbs[i]->loop_father == v_loop)
+      return opt_result::failure_at (vect_location,
+				     "not vectorized:"
+				     " unsupported control flow in loop.\n");
+
   /* Different restrictions apply when we are considering an inner-most loop,
      vs. an outer (nested) loop.
      (FORNOW. May want to relax some of these restrictions in the future).  */
@@ -1746,11 +1757,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
                            |
                         (exit-bb)  */
 
-      if (loop->num_nodes != 2)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       if (empty_block_p (loop->header))
 	return opt_result::failure_at (vect_location,
 				       "not vectorized: empty loop.\n");
@@ -1782,11 +1788,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 				       "not vectorized:"
 				       " multiple nested loops.\n");
 
-      if (loop->num_nodes != 5)
-	return opt_result::failure_at (vect_location,
-				       "not vectorized:"
-				       " control flow in loop.\n");
-
       entryedge = loop_preheader_edge (innerloop);
       if (entryedge->src != loop->header
 	  || !single_exit (innerloop)
@@ -1823,9 +1824,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
       info->inner_loop_cond = inner.conds[0];
     }
 
-  if (!single_exit (loop))
-    return opt_result::failure_at (vect_location,
-				   "not vectorized: multiple exits.\n");
   if (EDGE_COUNT (loop->header->preds) != 2)
     return opt_result::failure_at (vect_location,
 				   "not vectorized:"





* [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (34 preceding siblings ...)
  2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
@ 2023-11-06  7:41 ` Tamar Christina
  2023-12-09 10:38   ` Richard Sandiford
  2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

Hi All,

What do people think about having the ability to force the latch-connected
exit to be chosen as the exit? I.e. what's in the patch, but controlled by a
param.

I found this useful when debugging failures in large examples, as it tells me
where I should be looking.  No hard requirement, but I figured I'd ask whether
we should add it.

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
   if (exits.length () == 1)
     return exits[0];
 
+#if 0
   /* If we have multiple exits we only support counting IV at the moment.  Analyze
      all exits and return one */
   class tree_niter_desc niter_desc;
@@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
     }
 
   return candidate;
+#else
+  basic_block bb = ip_normal_pos (loop);
+  if (!bb)
+    return NULL;
+
+  edge exit = EDGE_SUCC (bb, 0);
+  if (exit->dest == loop->latch)
+    return EDGE_SUCC (bb, 1);
+  return exit;
+#endif
 }
 
 /* Function bb_in_loop_p




-- 

[-- Attachment #2: rb17975.patch --]
[-- Type: text/plain, Size: 821 bytes --]

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
   if (exits.length () == 1)
     return exits[0];
 
+#if 0
   /* If we have multiple exits we only support counting IV at the moment.  Analyze
      all exits and return one */
   class tree_niter_desc niter_desc;
@@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
     }
 
   return candidate;
+#else
+  basic_block bb = ip_normal_pos (loop);
+  if (!bb)
+    return NULL;
+
+  edge exit = EDGE_SUCC (bb, 0);
+  if (exit->dest == loop->latch)
+    return EDGE_SUCC (bb, 1);
+  return exit;
+#endif
 }
 
 /* Function bb_in_loop_p





* [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (35 preceding siblings ...)
  2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
@ 2023-11-06  7:41 ` Tamar Christina
  2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

Hi All,

I didn't want these to get lost in the noise of updates.

The following three tests now work correctly for targets that implement cbranch
for vectors, so the XFAILs are removed conditionally, gated on vect_early_break
support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
	supported.
	* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
	* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */




-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (36 preceding siblings ...)
  2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
@ 2023-11-06  7:41 ` Tamar Christina
  2023-11-28 16:37   ` Richard Sandiford
  2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
                   ` (4 subsequent siblings)
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 4899 bytes --]

Hi All,

This adds an implementation for conditional branch optab for AArch64.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        cmgt    v1.4s, v1.4s, #0
        umaxp   v1.4s, v1.4s, v1.4s
        fmov    x3, d1
        cbnz    x3, .L8

and for 64-bit vectors we can omit the compression:

        cmgt    v1.2s, v1.2s, #0
        fmov    x2, d1
        cbz     x2, .L13

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (37 preceding siblings ...)
  2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
@ 2023-11-06  7:42 ` Tamar Christina
  2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3156 bytes --]

Hi All,

Advanced SIMD lacks a vector compare-not-equal, and unlike a comparison
against 0 we can't rewrite it to a cmtst.

This operation is however fairly common, especially now that we support early
break vectorization.

As such this adds a pattern to recognize the negated "any" comparison and
transform it to an "all", i.e. any(~x) => all(x), and invert the branches.

For e.g.

void f1 (int x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != x)
	break;
    }
}

We currently generate:

	cmeq	v31.4s, v30.4s, v29.4s
	not	v31.16b, v31.16b
	umaxp	v31.4s, v31.4s, v31.4s
	fmov	x5, d31
	cbnz	x5, .L2

and after this patch:

	cmeq	v31.4s, v30.4s, v29.4s
	uminp	v31.4s, v31.4s, v31.4s
	fmov	x5, d31
	cbz	x5, .L2

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (*cbranchnev4si): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is quite a common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+    (if_then_else
+      (ne (subreg:DI
+	    (unspec:V4SI
+	      [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+	       (not:V4SI (match_dup 0))]
+		UNSPEC_UMAXV) 0)
+	   (const_int 0))
+	(label_ref (match_operand 1 ""))
+	(pc)))
+    (clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+	(unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+    (if_then_else
+      (eq (subreg:DI (match_dup 2) 0)
+	  (const_int 0))
+	(label_ref (match_dup 1))
+	(pc)))]
+{
+  if (can_create_pseudo_p ())
+    operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	uminp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != x)
+	break;
+    }
+}




-- 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and Advanced SIMD
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (38 preceding siblings ...)
  2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
@ 2023-11-06  7:42 ` Tamar Christina
  2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 14855 bytes --]

Hi All,

Advanced SIMD lacks the flag-setting vector comparisons that SVE adds.  Since machines
with SVE also support Advanced SIMD, we can use the SVE comparisons to perform the
operation in cases where SVE codegen is allowed but the vectorizer has decided
to generate Advanced SIMD because of loop costing.

e.g. for

void f1 (int x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != x)
	break;
    }
}

We currently generate:

        cmeq    v31.4s, v31.4s, v28.4s
        uminp   v31.4s, v31.4s, v31.4s
        fmov    x5, d31
        cbz     x5, .L2

and after this patch:

        ptrue   p7.b, vl16
        ...
        cmpne   p15.s, p7/z, z31.s, z28.s
        b.any   .L2

Because we need to lift the predicate creation outside of the loop, the predicate
has to be expanded early; however, in the cbranch expansion we don't yet see the
outer compare that we need to consume.

For this reason the expansion is twofold: when expanding the cbranch we emit an
SVE predicated comparison, and later on during combine we match the SVE and NEON
comparisons while also consuming the ptest.

Unfortunately *aarch64_pred_cmpne<mode><EQL:code>_neon_ptest is needed because
for some reason combine destroys the NOT and transforms it into a plus of -1.
For the straight SVE ones we seem to fail to eliminate the ptest in these cases,
but that's a separate optimization.

Tests show that I'm missing a few, but before I write the patterns for them, are
these OK?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cbranch<mode>4): Update with SVE.
	* config/aarch64/aarch64-sve.md
	(*aarch64_pred_cmp<UCOMPARISONS:cmp_op><mode><EQL:code>_neon_ptest,
	*aarch64_pred_cmpeq<mode><EQL:code>_neon_ptest,
	*aarch64_pred_cmpne<mode><EQL:code>_neon_ptest): New.
	(aarch64_ptest<mode>): Rename to...
	(@aarch64_ptest<mode>): ... This.
	* genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
	* gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c06dd4fd6f85e07f0d4a77992b2bc06f04a1935b..33799dc35a1b90dd60d7e487ec41c5d84fb215a5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3883,31 +3883,58 @@ (define_expand "cbranch<mode>4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
-
-  /* If comparing against a non-zero vector we have to do a comparison first
-     so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (<MODE>mode))
-    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
-					operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+  /* If SVE is available, let's borrow some instructions.  We will optimize
+     these further during combine.  */
+  if (TARGET_SVE)
     {
-      /* Always reduce using a V4SI.  */
-      rtx reduc = gen_lowpart (V4SImode, tmp);
-      rtx res = gen_reg_rtx (V4SImode);
-      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+      machine_mode full_mode = aarch64_full_sve_mode (<VEL>mode).require ();
+      rtx in1 = lowpart_subreg (full_mode, operands[1], <MODE>mode);
+      rtx in2 = lowpart_subreg (full_mode, operands[2], <MODE>mode);
+
+      machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+      rtx_vector_builder builder (VNx16BImode, 16, 2);
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST1_RTX (BImode));
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST0_RTX (BImode));
+      rtx ptrue = force_reg (VNx16BImode, builder.build ());
+      rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+      rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+      rtx tmp = gen_reg_rtx (pred_mode);
+      aarch64_expand_sve_vec_cmp_int (tmp, reverse_condition (code), in1, in2);
+      emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp));
+      operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+      operands[2] = const0_rtx;
     }
+  else
+    {
+      rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+      /* If comparing against a non-zero vector we have to do a comparison first
+	 so we can have a != 0 comparison with the result.  */
+      if (operands[2] != CONST0_RTX (<MODE>mode))
+	emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					    operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+      /* For 64-bit vectors we need no reductions.  */
+      if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+	{
+	  /* Always reduce using a V4SI.  */
+	  rtx reduc = gen_lowpart (V4SImode, tmp);
+	  rtx res = gen_reg_rtx (V4SImode);
+	  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+	  emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+	}
+
+      rtx val = gen_reg_rtx (DImode);
+      emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+      rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+      rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+      DONE;
+    }
 })
 
 ;; Advanced SIMD lacks a vector != comparison, but this is quite a common
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 5a652d8536a0ef9461f40da7b22834e683e73ceb..d9cc5c7e5629691e7abba7a18e308d35082e027d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8123,6 +8123,105 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
   "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
 )
 
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmp<UCOMPARISONS:cmp_op><mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (UCOMPARISONS:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmp<UCOMPARISONS:cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmpeq<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpeq\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Same as the above but version for == and !=
+(define_insn "*aarch64_pred_cmpne<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (plus:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))
+		  (match_operand:<V128> 9 "aarch64_simd_imm_minus_one" "i")) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpne\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] While tests
 ;; -------------------------------------------------------------------------
@@ -8602,7 +8701,7 @@ (define_expand "cbranch<mode>4"
 )
 
 ;; See "Description of UNSPEC_PTEST" above for details.
-(define_insn "aarch64_ptest<mode>"
+(define_insn "@aarch64_ptest<mode>"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa")
 			(match_operand 1)
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -906,6 +906,7 @@ from the machine description file `md'.  */\n\n");
   printf ("#include \"tm-constrs.h\"\n");
   printf ("#include \"ggc.h\"\n");
   printf ("#include \"target.h\"\n\n");
+  printf ("#include \"rtx-vector-builder.h\"\n\n");
 
   /* Read the machine description.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmpgt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmpge	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmplt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmple	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
@@ -0,0 +1,114 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




-- 

[-- Attachment #2: rb17511.patch --]
[-- Type: text/plain, Size: 12679 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c06dd4fd6f85e07f0d4a77992b2bc06f04a1935b..33799dc35a1b90dd60d7e487ec41c5d84fb215a5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3883,31 +3883,58 @@ (define_expand "cbranch<mode>4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
-
-  /* If comparing against a non-zero vector we have to do a comparison first
-     so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (<MODE>mode))
-    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
-					operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+  /* If SVE is available, let's borrow some instructions.  We will optimize
+     these further later in combine.  */
+  if (TARGET_SVE)
     {
-      /* Always reduce using a V4SI.  */
-      rtx reduc = gen_lowpart (V4SImode, tmp);
-      rtx res = gen_reg_rtx (V4SImode);
-      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+      machine_mode full_mode = aarch64_full_sve_mode (<VEL>mode).require ();
+      rtx in1 = lowpart_subreg (full_mode, operands[1], <MODE>mode);
+      rtx in2 = lowpart_subreg (full_mode, operands[2], <MODE>mode);
+
+      machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+      rtx_vector_builder builder (VNx16BImode, 16, 2);
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST1_RTX (BImode));
+      for (unsigned int i = 0; i < 16; ++i)
+	builder.quick_push (CONST0_RTX (BImode));
+      rtx ptrue = force_reg (VNx16BImode, builder.build ());
+      rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+      rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+      rtx tmp = gen_reg_rtx (pred_mode);
+      aarch64_expand_sve_vec_cmp_int (tmp, reverse_condition (code), in1, in2);
+      emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp));
+      operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+      operands[2] = const0_rtx;
     }
+  else
+    {
+      rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+      /* If comparing against a non-zero vector we have to do a comparison first
+	 so we can have a != 0 comparison with the result.  */
+      if (operands[2] != CONST0_RTX (<MODE>mode))
+	emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					    operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+      /* For 64-bit vectors we need no reductions.  */
+      if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+	{
+	  /* Always reduce using a V4SI.  */
+	  rtx reduc = gen_lowpart (V4SImode, tmp);
+	  rtx res = gen_reg_rtx (V4SImode);
+	  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+	  emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+	}
+
+      rtx val = gen_reg_rtx (DImode);
+      emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+      rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+      rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+      DONE;
+    }
 })
 
 ;; Advanced SIMD lacks a vector != comparison, but this is a quite common
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 5a652d8536a0ef9461f40da7b22834e683e73ceb..d9cc5c7e5629691e7abba7a18e308d35082e027d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8123,6 +8123,105 @@ (define_insn "*aarch64_pred_cmp<cmp_op><mode>_wide_ptest"
   "cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.d"
 )
 
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmp<UCOMPARISONS:cmp_op><mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (UCOMPARISONS:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmp<UCOMPARISONS:cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmpeq<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (neg:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpeq\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
+;; Same as the above, but a version for == and !=.
+(define_insn "*aarch64_pred_cmpne<mode><EQL:code>_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+	(unspec:CC_NZC
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 4)
+	   (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+	   (unspec:VNx4BI
+	     [(match_operand:VNx4BI 6 "register_operand" "Upl")
+	      (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+	      (EQL:VNx4BI
+		(subreg:SVE_FULL_BHSI
+		 (plus:<V128>
+		  (eq:<V128>
+		   (match_operand:<V128> 2 "register_operand" "w")
+		   (match_operand:<V128> 3 "aarch64_simd_reg_or_zero" "w"))
+		  (match_operand:<V128> 9 "aarch64_simd_imm_minus_one" "i")) 0)
+		(match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+	     UNSPEC_PRED_Z)]
+	  UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (<MODE>mode, operands[2], <V128>mode);
+  operands[3] = lowpart_subreg (<MODE>mode, operands[3], <V128>mode);
+  if (EQ == <EQL:CODE>)
+    std::swap (operands[2], operands[3]);
+
+  return "cmpne\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>";
+}
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] While tests
 ;; -------------------------------------------------------------------------
@@ -8602,7 +8701,7 @@ (define_expand "cbranch<mode>4"
 )
 
 ;; See "Description of UNSPEC_PTEST" above for details.
-(define_insn "aarch64_ptest<mode>"
+(define_insn "@aarch64_ptest<mode>"
   [(set (reg:CC_NZC CC_REGNUM)
 	(unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa")
 			(match_operand 1)
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -906,6 +906,7 @@ from the machine description file `md'.  */\n\n");
   printf ("#include \"tm-constrs.h\"\n");
   printf ("#include \"ggc.h\"\n");
   printf ("#include \"target.h\"\n\n");
+  printf ("#include \"rtx-vector-builder.h\"\n\n");
 
   /* Read the machine description.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmpgt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmpge	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmplt	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmple	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	ptest	p[0-9]+, p[0-9]+.b
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
@@ -0,0 +1,114 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmpeq	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	cmpne	p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+**	b.any	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (39 preceding siblings ...)
  2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
@ 2023-11-06  7:42 ` Tamar Christina
  2023-11-27 12:48   ` Kyrylo Tkachov
  2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
  2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov

[-- Attachment #1: Type: text/plain, Size: 6094 bytes --]

Hi All,

This adds an implementation for conditional branch optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        vcgt.s32        q8, q9, #0
        vpmax.u32       d7, d16, d17
        vpmax.u32       d7, d7, d7
        vmov    r3, s14 @ int
        cmp     r3, #0

and for 64-bit vectors we can omit one vpmax, as we still need to compress to
32 bits.
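
For reference, the reduction those instructions perform can be modelled in
scalar C roughly as follows (an illustrative sketch only, not part of the
patch; the function name is made up):

```c
#include <stdint.h>

/* Scalar model of the reduction used by the Advanced SIMD cbranch
   expansion: a vector comparison yields an all-ones/all-zeros 32-bit
   mask per lane; two pairwise unsigned-max steps (vpmax.u32) fold the
   four lanes down to one, which is nonzero iff any lane matched.  */
static int
any_lane_set_v4si (const uint32_t mask[4])
{
  /* First vpmax.u32: fold lane pairs {0,1} and {2,3}.  */
  uint32_t lo = mask[0] > mask[1] ? mask[0] : mask[1];
  uint32_t hi = mask[2] > mask[3] ? mask[2] : mask[3];
  /* Second vpmax.u32: fold the remaining two lanes.  */
  uint32_t all = lo > hi ? lo : hi;
  /* vmov to a core register, then cmp with #0.  */
  return all != 0;
}
```

For a 64-bit vector the first fold is unnecessary, which is why one vpmax can
be omitted there.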

Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add AArch32.
	* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract<mode><V_elem_l>"
   [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
 )
 
+;; Patterns that compare two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To avoid paying the penalty for inverting == we can map our
+;; "any" comparisons to "all", i.e. any(~x) => all(x).
+;;
+;; However, unlike the AArch64 version, we can't optimize this further: the
+;; chain is too long for combine because these are unspecs, so combine doesn't
+;; fold the operation into something simpler.
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:VDQI 1 "register_operand")
+	        (match_operand:VDQI 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Reduce the V4SI input to a V2SI via a pairwise max of its halves.  */
+      mask = gen_reg_rtx (V2SImode);
+      rtx low = gen_reg_rtx (V2SImode);
+      rtx high = gen_reg_rtx (V2SImode);
+      emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+      emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+      emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+    }
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
 ;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcgt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcge.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vmvn	q[0-9]+, q[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vclt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcle.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } {
     return [check_cached_effective_target_indexed vect_early_break {
       expr {
 	[istarget aarch64*-*-*]
+	|| [check_effective_target_arm_neon_ok]
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of




-- 




^ permalink raw reply	[flat|nested] 200+ messages in thread

* [PATCH 21/21]Arm: Add MVE cbranch implementation
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (40 preceding siblings ...)
  2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
@ 2023-11-06  7:43 ` Tamar Christina
  2023-11-27 12:47   ` Kyrylo Tkachov
  2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
  42 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06  7:43 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov

[-- Attachment #1: Type: text/plain, Size: 6234 bytes --]

Hi All,

This adds an implementation for conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0; we are missing the
ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch against 0: to compare with a zero
predicate we don't need to do an actual comparison, we only have to check
whether any bit is set in P0.
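
As a rough scalar model of this (illustrative only; the helper names are made
up, and the layout assumed is the usual 4 predicate bits per 32-bit element in
the 16-bit P0 register):

```c
#include <stdint.h>

/* Scalar model of the MVE cbranch expansion: vcmp writes predicate bits
   into the 16-bit VPT.P0 register (modelled here as 4 bits per 32-bit
   lane); vmrs moves P0 into a core register and cbnz branches when any
   bit is set.  Comparing against a zero predicate therefore needs no
   predicate comparison at all.  */
static uint16_t
model_vcmp_gt_s32 (const int32_t a[4], const int32_t b[4])
{
  uint16_t p0 = 0;
  for (int lane = 0; lane < 4; ++lane)
    if (a[lane] > b[lane])
      p0 |= (uint16_t) (0xf << (lane * 4));
  return p0;
}

static int
model_cbranch_taken (uint16_t p0)
{
  /* vmrs r3, p0; cbnz r3, .L2  */
  return p0 != 0;
}
```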

Because we can only do P0 comparisons with 0, the costing of the comparison was
reduced so that the compiler does not try to force the 0 into a register,
thinking it is too expensive.  For the cbranch implementation to be safe we
must see the constant 0 vector.

The lack of logical OR on P0 is something we can't really work around.  It
means MVE can't support cases where the sizes of the operands in the comparison
don't match, i.e. when one operand has been unpacked.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

        vcmp.s32        gt, q3, q1
        vmrs    r3, p0  @ movhi
        cbnz    r3, .L2

MVE does not have 64-bit vector comparisons; as such, those are not supported
either.

Bootstrapped on arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
	compares.
	* config/arm/mve.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add MVE.
	* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	   || TARGET_HAVE_MVE)
 	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
+      else if (TARGET_HAVE_MVE
+	       && outer_code == COMPARE
+	       && VALID_MVE_PRED_MODE (mode))
+	/* MVE allows very limited instructions on VPT.P0,  however comparisons
+	   to 0 do not require us to materialze this constant or require a
+	   predicate comparison as we can go through SImode.  For that reason
+	   allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+	   registers as we can't compare two predicates.  */
+	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
       return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
   DONE;
 })
 
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:MVE_7 1 "register_operand")
+	        (match_operand:MVE_7 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret<mode>"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
       expr {
 	[istarget aarch64*-*-*]
 	|| [check_effective_target_arm_neon_ok]
+	|| ([check_effective_target_arm_v8_1m_mve_fp_ok]
+	     && [check_effective_target_arm_little_endian])
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of




-- 





^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
                   ` (41 preceding siblings ...)
  2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
@ 2023-11-06 14:25 ` Richard Biener
  2023-11-06 15:17   ` Tamar Christina
  42 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-06 14:25 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab,
> this includes both fully masked and non-masked targets.
> 
> Depending on the operation, the vectorizer may also require support for boolean
> mask reductions using Inclusive OR.  This is however only checked when the
> comparison would produce multiple statements.
> 
> Note: I am currently struggling to get patch 7 correct in all cases and could use
>       some feedback there.
> 
> Concretely the kind of loops supported are of the forms:
> 
>  for (int i = 0; i < N; i++)
>  {
>    <statements1>
>    if (<condition>)
>      {
>        ...
>        <action>;
>      }
>    <statements2>
>  }
> 
> where <action> can be:
>  - break
>  - return
>  - goto
> 
> Any number of statements can be used before the <action> occurs.
> 
> Since this is an initial version for GCC 14 it has the following limitations and
> features:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.  This limitation is because our primary target
>   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>   iteration checks. The result is likely to also not be beneficial. For that
>   reason we punt support for variable buffers till we have First-Faulting
>   support in GCC.
> - any stores in <statements1> should not be to the same objects as in
>   <condition>.  Loads are fine as long as they don't have the possibility to
>   alias.  More concretely, we block RAW dependencies when the intermediate value
>   can't be separated from the store, or the store itself can't be moved.
> - Prologue peeling, alignment peeling and loop versioning are supported.
> - Fully masked loops, unmasked loops and partially masked loops are supported
> - Any number of loop early exits are supported.
> - No support for epilogue vectorization.  The only epilogue supported is the
>   scalar final one.  Peeling code supports it but the code motion code cannot
>   find instructions to make the move in the epilog.
> - Early breaks are only supported for inner loop vectorization.
> 
> I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> 
> With the help of IPA and LTO this still gets hit quite often.  During bootstrap
> it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
> since these are tests for support for early exit vectorization.
> 
> This implementation does not support completely handling the early break inside
> the vector loop itself but instead supports adding checks such that if we know
> that we have to exit in the current iteration then we branch to scalar code to
> actually do the final VF iterations which handles all the code in <action>.
> 
> For the scalar loop we know that whatever exit you take you have to perform at
> most VF iterations.  For vector code we only care about the state of fully
> performed iterations and reset the scalar code to the (partially) remaining loop.
> 
> That is to say, the first vector loop executes so long as the early exit isn't
> needed.  Once the exit is taken, the scalar code will perform at most VF extra
> iterations.  The exact number depends on peeling and iteration start and which
> exit was taken (natural or early).   For this scalar loop, all early exits are
> treated the same.
> 
> When we vectorize we move any statement not related to the early break itself
> and that would be incorrect to execute before the break (i.e. has side effects)
> to after the break.  If this is not possible we decline to vectorize.
> 
> This means that we check at the start of iterations whether we are going to exit
> or not.  During the analysis phase we check whether we are allowed to do this
> moving of statements.  Also note that we only move the scalar statements, but
> only do so after peeling but just before we start transforming statements.
> 
> Codegen:
> 
> for e.g.
> 
> #define N 803
> unsigned vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i] > x)
>      break;
>    vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> We generate for Adv. SIMD:
> 
> test4:
>         adrp    x2, .LC0
>         adrp    x3, .LANCHOR0
>         dup     v2.4s, w0
>         add     x3, x3, :lo12:.LANCHOR0
>         movi    v4.4s, 0x4
>         add     x4, x3, 3216
>         ldr     q1, [x2, #:lo12:.LC0]
>         mov     x1, 0
>         mov     w2, 0
>         .p2align 3,,7
> .L3:
>         ldr     q0, [x3, x1]
>         add     v3.4s, v1.4s, v2.4s
>         add     v1.4s, v1.4s, v4.4s
>         cmhi    v0.4s, v0.4s, v2.4s
>         umaxp   v0.4s, v0.4s, v0.4s
>         fmov    x5, d0
>         cbnz    x5, .L6
>         add     w2, w2, 1
>         str     q3, [x1, x4]
>         str     q2, [x3, x1]
>         add     x1, x1, 16
>         cmp     w2, 200
>         bne     .L3
>         mov     w7, 3
> .L2:
>         lsl     w2, w2, 2
>         add     x5, x3, 3216
>         add     w6, w2, w0
>         sxtw    x4, w2
>         ldr     w1, [x3, x4, lsl 2]
>         str     w6, [x5, x4, lsl 2]
>         cmp     w0, w1
>         bcc     .L4
>         add     w1, w2, 1
>         str     w0, [x3, x4, lsl 2]
>         add     w6, w1, w0
>         sxtw    x1, w1
>         ldr     w4, [x3, x1, lsl 2]
>         str     w6, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         add     w4, w2, 2
>         str     w0, [x3, x1, lsl 2]
>         sxtw    x1, w4
>         add     w6, w1, w0
>         ldr     w4, [x3, x1, lsl 2]
>         str     w6, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         str     w0, [x3, x1, lsl 2]
>         add     w2, w2, 3
>         cmp     w7, 3
>         beq     .L4
>         sxtw    x1, w2
>         add     w2, w2, w0
>         ldr     w4, [x3, x1, lsl 2]
>         str     w2, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         str     w0, [x3, x1, lsl 2]
> .L4:
>         mov     w0, 0
>         ret
>         .p2align 2,,3
> .L6:
>         mov     w7, 4
>         b       .L2
> 
> and for SVE:
> 
> test4:
>         adrp    x2, .LANCHOR0
>         add     x2, x2, :lo12:.LANCHOR0
>         add     x5, x2, 3216
>         mov     x3, 0
>         mov     w1, 0
>         cntw    x4
>         mov     z1.s, w0
>         index   z0.s, #0, #1
>         ptrue   p1.b, all
>         ptrue   p0.s, all
>         .p2align 3,,7
> .L3:
>         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
>         add     z3.s, z0.s, z1.s
>         cmplo   p2.s, p0/z, z1.s, z2.s
>         b.any   .L2
>         st1w    z3.s, p1, [x5, x3, lsl 2]
>         add     w1, w1, 1
>         st1w    z1.s, p1, [x2, x3, lsl 2]
>         add     x3, x3, x4
>         incw    z0.s
>         cmp     w3, 803
>         bls     .L3
> .L5:
>         mov     w0, 0
>         ret
>         .p2align 2,,3
> .L2:
>         cntw    x5
>         mul     w1, w1, w5
>         cbz     w5, .L5
>         sxtw    x1, w1
>         sub     w5, w5, #1
>         add     x5, x5, x1
>         add     x6, x2, 3216
>         b       .L6
>         .p2align 2,,3
> .L14:
>         str     w0, [x2, x1, lsl 2]
>         cmp     x1, x5
>         beq     .L5
>         mov     x1, x4
> .L6:
>         ldr     w3, [x2, x1, lsl 2]
>         add     w4, w0, w1
>         str     w4, [x6, x1, lsl 2]
>         add     x4, x1, 1
>         cmp     w0, w3
>         bcs     .L14
>         mov     w0, 0
>         ret
> 
> On the workloads this work is based on we see between 2-3x performance uplift
> using this patch.
> 
> Follow up plan:
>  - Boolean vectorization has several shortcomings.  I've filed PR110223 with the
>    bigger ones that cause vectorization to fail with this patch.
>  - SLP support.  This is planned for GCC 15 as for the majority of cases the
>    SLP build itself fails.

It would be nice to get at least single-lane SLP support working.  I think
you need to treat the gcond as SLP root stmt and basically do discovery
on the condition as if it were a mask-generating condition.

Code generation would then simply schedule the gcond root instances
first (that would get you the code motion automagically).

So, add a new slp_instance_kind, for example slp_inst_kind_early_break,
and record the gcond as root stmt.  Possibly "pattern" recognizing

 gcond <_1 != _2>

as

 _mask = _1 != _2;
 gcond <_mask != 0>

makes the SLP discovery less fiddly (but in theory you can of course
handle gconds directly).

Is there any part of the series that can be pushed independently?  If
so I'll try to look at those parts first.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg
  2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
@ 2023-11-06 14:44   ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-06 14:44 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> The vectorizer at the moment uses a num_bb check to check for control flow.
> This rejects a number of loops for no reason.  Instead this patch changes it
> to check the destination of the exits.
> 
> This also allows early break to work by also dropping the single_exit check.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?

I think this can go in independently, one comment below ...

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (slpeel_can_duplicate_loop_p):
> 	* tree-vect-loop.cc (vect_analyze_loop_form):
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7..466cf4c47154099a33dc63e22d74eef42d282444 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1937,12 +1937,10 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge exit_e,
>    edge entry_e = loop_preheader_edge (loop);
>    gcond *orig_cond = get_loop_exit_condition (exit_e);
>    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> -  unsigned int num_bb = loop->inner? 5 : 2;
>  
>    /* All loops have an outer scope; the only case loop->outer is NULL is for
>       the function itself.  */
>    if (!loop_outer (loop)
> -      || loop->num_nodes != num_bb
>        || !empty_block_p (loop->latch)
>        || !exit_e
>        /* Verify that new loop exit condition can be trivially modified.  */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ddb6cad60f2f2cfdc96732f3f256d86e315d7357..27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1727,6 +1727,17 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  		       "using as main loop exit: %d -> %d [AUX: %p]\n",
>  		       exit_e->src->index, exit_e->dest->index, exit_e->aux);
>  
> +  /* Check if we have any control flow that doesn't leave the loop.  */
> +  class loop *v_loop = loop->inner ? loop->inner : loop;
> +  basic_block *bbs= get_loop_body (v_loop);
> +  for (unsigned i = 0; i < v_loop->num_nodes; i++)
> +    if (!empty_block_p (bbs[i])
> +	&& !loop_exits_from_bb_p (v_loop, bbs[i])
> +	&& bbs[i]->loop_father == v_loop)

That looks a bit complicated.  Better matching the comment would be

       if (EDGE_COUNT (bbs[i]->succs) != 1
           && (EDGE_COUNT (bbs[i]->succs) != 2
               || !loop_exits_from_bb_p (bb[i]->loop_father, bb[i])))

I'd say OK with that change, and independently if the removed
single_exit test below isn't harmful (I suppose it is).

Btw, for the outer loop case we still have the single_exit tests
but you already said you're not supporting multi-exits there yet.

Thanks,
Richard.

> +      return opt_result::failure_at (vect_location,
> +				     "not vectorized:"
> +				     " unsupported control flow in loop.\n");
> +
>    /* Different restrictions apply when we are considering an inner-most loop,
>       vs. an outer (nested) loop.
>       (FORNOW. May want to relax some of these restrictions in the future).  */
> @@ -1746,11 +1757,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>                             |
>                          (exit-bb)  */
>  
> -      if (loop->num_nodes != 2)
> -	return opt_result::failure_at (vect_location,
> -				       "not vectorized:"
> -				       " control flow in loop.\n");
> -
>        if (empty_block_p (loop->header))
>  	return opt_result::failure_at (vect_location,
>  				       "not vectorized: empty loop.\n");
> @@ -1782,11 +1788,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				       "not vectorized:"
>  				       " multiple nested loops.\n");
>  
> -      if (loop->num_nodes != 5)
> -	return opt_result::failure_at (vect_location,
> -				       "not vectorized:"
> -				       " control flow in loop.\n");
> -
>        entryedge = loop_preheader_edge (innerloop);
>        if (entryedge->src != loop->header
>  	  || !single_exit (innerloop)
> @@ -1823,9 +1824,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>        info->inner_loop_cond = inner.conds[0];
>      }
>  
> -  if (!single_exit (loop))
> -    return opt_result::failure_at (vect_location,
> -				   "not vectorized: multiple exits.\n");
>    if (EDGE_COUNT (loop->header->preds) != 2)
>      return opt_result::failure_at (vect_location,
>  				   "not vectorized:"

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
@ 2023-11-06 15:17   ` Tamar Christina
  2023-11-07  9:42     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-06 15:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, November 6, 2023 2:25 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-
> vectorization
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds initial support for early break vectorization in GCC.
> > The support is added for any target that implements a vector cbranch
> > optab, this includes both fully masked and non-masked targets.
> >
> > Depending on the operation, the vectorizer may also require support
> > for boolean mask reductions using Inclusive OR.  This is however only
> > checked when the comparison would produce multiple statements.
> >
> > Note: I am currently struggling to get patch 7 correct in all cases and could
> use
> >       some feedback there.
> >
> > Concretely the kind of loops supported are of the forms:
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    <statements1>
> >    if (<condition>)
> >      {
> >        ...
> >        <action>;
> >      }
> >    <statements2>
> >  }
> >
> > where <action> can be:
> >  - break
> >  - return
> >  - goto
> >
> > Any number of statements can be used before the <action> occurs.
> >
> > Since this is an initial version for GCC 14 it has the following
> > limitations and
> > features:
> >
> > - Only fixed sized iterations and buffers are supported.  That is to say any
> >   vectors loaded or stored must be to statically allocated arrays with known
> >   sizes. N must also be known.  This limitation is because our primary target
> >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> >   iteration checks. The result is likely to also not be beneficial. For that
> >   reason we punt support for variable buffers till we have First-Faulting
> >   support in GCC.
> > - any stores in <statements1> should not be to the same objects as in
> >   <condition>.  Loads are fine as long as they don't have the possibility to
> >   alias.  More concretely, we block RAW dependencies when the intermediate
> value
> >   can't be separated from the store, or the store itself can't be moved.
> > - Prologue peeling, alignment peeling and loop versioning are supported.
> > - Fully masked loops, unmasked loops and partially masked loops are
> > supported
> > - Any number of loop early exits are supported.
> > - No support for epilogue vectorization.  The only epilogue supported is the
> >   scalar final one.  Peeling code supports it but the code motion code cannot
> >   find instructions to make the move in the epilog.
> > - Early breaks are only supported for inner loop vectorization.
> >
> > I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> >
> > With the help of IPA and LTO this still gets hit quite often.  During
> > bootstrap it hit rather frequently.  Additionally TSVC s332, s481 and
> > s482 all pass now since these are tests for support for early exit
> vectorization.
> >
> > This implementation does not support completely handling the early
> > break inside the vector loop itself but instead supports adding checks
> > such that if we know that we have to exit in the current iteration
> > then we branch to scalar code to actually do the final VF iterations which
> handles all the code in <action>.
> >
> > For the scalar loop we know that whatever exit you take you have to
> > perform at most VF iterations.  For vector code we only care about the
> > state of fully performed iterations and reset the scalar code to the (partially)
> remaining loop.
> >
> > That is to say, the first vector loop executes so long as the early
> > exit isn't needed.  Once the exit is taken, the scalar code will
> > perform at most VF extra iterations.  The exact number depends on peeling
> and iteration start and which
> > exit was taken (natural or early).   For this scalar loop, all early exits are
> > treated the same.
> >
> > When we vectorize we move any statement not related to the early break
> > itself and that would be incorrect to execute before the break (i.e.
> > has side effects) to after the break.  If this is not possible we decline to
> vectorize.
> >
> > This means that we check at the start of iterations whether we are
> > going to exit or not.  During the analysis phase we check whether we
> > are allowed to do this moving of statements.  Also note that we only
> > move the scalar statements, but only do so after peeling but just before we
> start transforming statements.
> >
> > Codegen:
> >
> > for e.g.
> >
> > #define N 803
> > unsigned vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    vect_b[i] = x + i;
> >    if (vect_a[i] > x)
> >      break;
> >    vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
> >
> > We generate for Adv. SIMD:
> >
> > test4:
> >         adrp    x2, .LC0
> >         adrp    x3, .LANCHOR0
> >         dup     v2.4s, w0
> >         add     x3, x3, :lo12:.LANCHOR0
> >         movi    v4.4s, 0x4
> >         add     x4, x3, 3216
> >         ldr     q1, [x2, #:lo12:.LC0]
> >         mov     x1, 0
> >         mov     w2, 0
> >         .p2align 3,,7
> > .L3:
> >         ldr     q0, [x3, x1]
> >         add     v3.4s, v1.4s, v2.4s
> >         add     v1.4s, v1.4s, v4.4s
> >         cmhi    v0.4s, v0.4s, v2.4s
> >         umaxp   v0.4s, v0.4s, v0.4s
> >         fmov    x5, d0
> >         cbnz    x5, .L6
> >         add     w2, w2, 1
> >         str     q3, [x1, x4]
> >         str     q2, [x3, x1]
> >         add     x1, x1, 16
> >         cmp     w2, 200
> >         bne     .L3
> >         mov     w7, 3
> > .L2:
> >         lsl     w2, w2, 2
> >         add     x5, x3, 3216
> >         add     w6, w2, w0
> >         sxtw    x4, w2
> >         ldr     w1, [x3, x4, lsl 2]
> >         str     w6, [x5, x4, lsl 2]
> >         cmp     w0, w1
> >         bcc     .L4
> >         add     w1, w2, 1
> >         str     w0, [x3, x4, lsl 2]
> >         add     w6, w1, w0
> >         sxtw    x1, w1
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w6, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         add     w4, w2, 2
> >         str     w0, [x3, x1, lsl 2]
> >         sxtw    x1, w4
> >         add     w6, w1, w0
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w6, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         str     w0, [x3, x1, lsl 2]
> >         add     w2, w2, 3
> >         cmp     w7, 3
> >         beq     .L4
> >         sxtw    x1, w2
> >         add     w2, w2, w0
> >         ldr     w4, [x3, x1, lsl 2]
> >         str     w2, [x5, x1, lsl 2]
> >         cmp     w0, w4
> >         bcc     .L4
> >         str     w0, [x3, x1, lsl 2]
> > .L4:
> >         mov     w0, 0
> >         ret
> >         .p2align 2,,3
> > .L6:
> >         mov     w7, 4
> >         b       .L2
> >
> > and for SVE:
> >
> > test4:
> >         adrp    x2, .LANCHOR0
> >         add     x2, x2, :lo12:.LANCHOR0
> >         add     x5, x2, 3216
> >         mov     x3, 0
> >         mov     w1, 0
> >         cntw    x4
> >         mov     z1.s, w0
> >         index   z0.s, #0, #1
> >         ptrue   p1.b, all
> >         ptrue   p0.s, all
> >         .p2align 3,,7
> > .L3:
> >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> >         add     z3.s, z0.s, z1.s
> >         cmplo   p2.s, p0/z, z1.s, z2.s
> >         b.any   .L2
> >         st1w    z3.s, p1, [x5, x3, lsl 2]
> >         add     w1, w1, 1
> >         st1w    z1.s, p1, [x2, x3, lsl 2]
> >         add     x3, x3, x4
> >         incw    z0.s
> >         cmp     w3, 803
> >         bls     .L3
> > .L5:
> >         mov     w0, 0
> >         ret
> >         .p2align 2,,3
> > .L2:
> >         cntw    x5
> >         mul     w1, w1, w5
> >         cbz     w5, .L5
> >         sxtw    x1, w1
> >         sub     w5, w5, #1
> >         add     x5, x5, x1
> >         add     x6, x2, 3216
> >         b       .L6
> >         .p2align 2,,3
> > .L14:
> >         str     w0, [x2, x1, lsl 2]
> >         cmp     x1, x5
> >         beq     .L5
> >         mov     x1, x4
> > .L6:
> >         ldr     w3, [x2, x1, lsl 2]
> >         add     w4, w0, w1
> >         str     w4, [x6, x1, lsl 2]
> >         add     x4, x1, 1
> >         cmp     w0, w3
> >         bcs     .L14
> >         mov     w0, 0
> >         ret
> >
> > On the workloads this work is based on we see between 2-3x performance
> > uplift using this patch.
> >
> > Follow up plan:
> >  - Boolean vectorization has several shortcomings.  I've filed PR110223
> >    with the bigger ones that cause vectorization to fail with this patch.
> >  - SLP support.  This is planned for GCC 15 as for the majority of cases
> >    building SLP itself fails.
> 
> It would be nice to get at least single-lane SLP support working.  I think you
> need to treat the gcond as SLP root stmt and basically do discovery on the
> condition as if it were a mask-generating condition.

Hmm, ok, will give it a try.

> 
> Code generation would then simply schedule the gcond root instances first
> (that would get you the code motion automagically).

Right, so you're saying treat the gconds as the seed, and stores as a sink.
And then schedule only the instances without a gcond around such that we
can still vectorize in place to get the branches.  Ok, makes sense.

> 
> So, add a new slp_instance_kind, for example slp_inst_kind_early_break, and
> record the gcond as root stmt.  Possibly "pattern" recognizing
> 
>  gcond <_1 != _2>
> 
> as
> 
>  _mask = _1 != _2;
>  gcond <_mask != 0>
> 
> makes the SLP discovery less fiddly (but in theory you can of course handle
> gconds directly).
> 
> Is there any part of the series that can be pushed independently?  If so I'll try to
> look at those parts first.
> 

Aside from:

[PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
[PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits  

The rest lie dormant and don't do anything or disrupt the tree until those two are in;
they just touch up different parts piecewise.

They do rely on the new field introduced in:

[PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks

But I can split them out.

I'll start respinning #4 and #7 with your latest changes now.

Thanks,
Tamar

> Thanks,
> Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-06 15:17   ` Tamar Christina
@ 2023-11-07  9:42     ` Richard Biener
  2023-11-07 10:47       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-07  9:42 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Mon, 6 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, November 6, 2023 2:25 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-
> > vectorization
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds initial support for early break vectorization in GCC.
> > > The support is added for any target that implements a vector cbranch
> > > optab, this includes both fully masked and non-masked targets.
> > >
> > > Depending on the operation, the vectorizer may also require support
> > > for boolean mask reductions using Inclusive OR.  This is however only
> > > checked when the comparison would produce multiple statements.
> > >
> > > Note: I am currently struggling to get patch 7 correct in all cases and
> > >       could use some feedback there.
> > >
> > > Concretely the kind of loops supported are of the forms:
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    <statements1>
> > >    if (<condition>)
> > >      {
> > >        ...
> > >        <action>;
> > >      }
> > >    <statements2>
> > >  }
> > >
> > > where <action> can be:
> > >  - break
> > >  - return
> > >  - goto
> > >
> > > Any number of statements can be used before the <action> occurs.
> > >
> > > Since this is an initial version for GCC 14 it has the following
> > > limitations and
> > > features:
> > >
> > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > >   vectors loaded or stored must be to statically allocated arrays with known
> > >   sizes. N must also be known.  This limitation is because our primary target
> > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > >   iteration checks. The result is also unlikely to be beneficial. For that
> > >   reason we punt support for variable buffers till we have First-Faulting
> > >   support in GCC.

Btw, for this I wonder if you thought about marking memory accesses
required for the early break condition as required to be vector-size
aligned, thus peeling or versioning them for alignment?  That should
ensure they do not fault.

OTOH I somehow remember prologue peeling isn't supported for early
break vectorization?  ..

> > > - any stores in <statements1> should not be to the same objects as in
> > >   <condition>.  Loads are fine as long as they don't have the possibility to
> > >   alias.  More concretely, we block RAW dependencies when the intermediate
> > >   value can't be separated from the store, or the store itself can't be moved.
> > > - Prologue peeling, alignment peeling and loop versioning are supported.

.. but here you say it is.  Not sure if peeling for alignment works for
VLA vectors though.  Just to say x86 doesn't support first-faulting
loads.

> > > - Fully masked loops, unmasked loops and partially masked loops are
> > > supported
> > > - Any number of loop early exits are supported.
> > > - No support for epilogue vectorization.  The only epilogue supported is the
> > >   scalar final one.  Peeling code supports it but the code motion code cannot
> > >   find instructions to make the move in the epilog.
> > > - Early breaks are only supported for inner loop vectorization.
> > >
> > > I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> > >
> > > With the help of IPA and LTO this still gets hit quite often.  During
> > > bootstrap it hit rather frequently.  Additionally TSVC s332, s481 and
> > > s482 all pass now since these are tests for support for early exit
> > > vectorization.
> > >
> > > This implementation does not support completely handling the early
> > > break inside the vector loop itself but instead supports adding checks
> > > such that if we know that we have to exit in the current iteration
> > > then we branch to scalar code to actually do the final VF iterations,
> > > which handles all the code in <action>.
> > >
> > > For the scalar loop we know that whatever exit you take you have to
> > > perform at most VF iterations.  For vector code we only care about the
> > > state of fully performed iterations and reset the scalar code to the
> > > (partially) remaining loop.
> > >
> > > That is to say, the first vector loop executes so long as the early
> > > exit isn't needed.  Once the exit is taken, the scalar code will
> > > perform at most VF extra iterations.  The exact number depends on peeling,
> > > iteration start and which exit was taken (natural or early).  For this
> > > scalar loop, all early exits are treated the same.
> > >
> > > When we vectorize we move any statement not related to the early break
> > > itself and that would be incorrect to execute before the break (i.e.
> > > has side effects) to after the break.  If this is not possible we decline
> > > to vectorize.
> > >
> > > This means that we check at the start of iterations whether we are
> > > going to exit or not.  During the analysis phase we check whether we
> > > are allowed to do this moving of statements.  Also note that we only
> > > move the scalar statements, but only do so after peeling, just before we
> > > start transforming statements.
> > >
> > > Codegen:
> > >
> > > for e.g.
> > >
> > > #define N 803
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i] > x)
> > >      break;
> > >    vect_a[i] = x;
> > >
> > >  }
> > >  return ret;
> > > }
> > >
> > > We generate for Adv. SIMD:
> > >
> > > test4:
> > >         adrp    x2, .LC0
> > >         adrp    x3, .LANCHOR0
> > >         dup     v2.4s, w0
> > >         add     x3, x3, :lo12:.LANCHOR0
> > >         movi    v4.4s, 0x4
> > >         add     x4, x3, 3216
> > >         ldr     q1, [x2, #:lo12:.LC0]
> > >         mov     x1, 0
> > >         mov     w2, 0
> > >         .p2align 3,,7
> > > .L3:
> > >         ldr     q0, [x3, x1]
> > >         add     v3.4s, v1.4s, v2.4s
> > >         add     v1.4s, v1.4s, v4.4s
> > >         cmhi    v0.4s, v0.4s, v2.4s
> > >         umaxp   v0.4s, v0.4s, v0.4s
> > >         fmov    x5, d0
> > >         cbnz    x5, .L6
> > >         add     w2, w2, 1
> > >         str     q3, [x1, x4]
> > >         str     q2, [x3, x1]
> > >         add     x1, x1, 16
> > >         cmp     w2, 200
> > >         bne     .L3
> > >         mov     w7, 3
> > > .L2:
> > >         lsl     w2, w2, 2
> > >         add     x5, x3, 3216
> > >         add     w6, w2, w0
> > >         sxtw    x4, w2
> > >         ldr     w1, [x3, x4, lsl 2]
> > >         str     w6, [x5, x4, lsl 2]
> > >         cmp     w0, w1
> > >         bcc     .L4
> > >         add     w1, w2, 1
> > >         str     w0, [x3, x4, lsl 2]
> > >         add     w6, w1, w0
> > >         sxtw    x1, w1
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w6, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         add     w4, w2, 2
> > >         str     w0, [x3, x1, lsl 2]
> > >         sxtw    x1, w4
> > >         add     w6, w1, w0
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w6, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         str     w0, [x3, x1, lsl 2]
> > >         add     w2, w2, 3
> > >         cmp     w7, 3
> > >         beq     .L4
> > >         sxtw    x1, w2
> > >         add     w2, w2, w0
> > >         ldr     w4, [x3, x1, lsl 2]
> > >         str     w2, [x5, x1, lsl 2]
> > >         cmp     w0, w4
> > >         bcc     .L4
> > >         str     w0, [x3, x1, lsl 2]
> > > .L4:
> > >         mov     w0, 0
> > >         ret
> > >         .p2align 2,,3
> > > .L6:
> > >         mov     w7, 4
> > >         b       .L2
> > >
> > > and for SVE:
> > >
> > > test4:
> > >         adrp    x2, .LANCHOR0
> > >         add     x2, x2, :lo12:.LANCHOR0
> > >         add     x5, x2, 3216
> > >         mov     x3, 0
> > >         mov     w1, 0
> > >         cntw    x4
> > >         mov     z1.s, w0
> > >         index   z0.s, #0, #1
> > >         ptrue   p1.b, all
> > >         ptrue   p0.s, all
> > >         .p2align 3,,7
> > > .L3:
> > >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> > >         add     z3.s, z0.s, z1.s
> > >         cmplo   p2.s, p0/z, z1.s, z2.s
> > >         b.any   .L2
> > >         st1w    z3.s, p1, [x5, x3, lsl 2]
> > >         add     w1, w1, 1
> > >         st1w    z1.s, p1, [x2, x3, lsl 2]
> > >         add     x3, x3, x4
> > >         incw    z0.s
> > >         cmp     w3, 803
> > >         bls     .L3
> > > .L5:
> > >         mov     w0, 0
> > >         ret
> > >         .p2align 2,,3
> > > .L2:
> > >         cntw    x5
> > >         mul     w1, w1, w5
> > >         cbz     w5, .L5
> > >         sxtw    x1, w1
> > >         sub     w5, w5, #1
> > >         add     x5, x5, x1
> > >         add     x6, x2, 3216
> > >         b       .L6
> > >         .p2align 2,,3
> > > .L14:
> > >         str     w0, [x2, x1, lsl 2]
> > >         cmp     x1, x5
> > >         beq     .L5
> > >         mov     x1, x4
> > > .L6:
> > >         ldr     w3, [x2, x1, lsl 2]
> > >         add     w4, w0, w1
> > >         str     w4, [x6, x1, lsl 2]
> > >         add     x4, x1, 1
> > >         cmp     w0, w3
> > >         bcs     .L14
> > >         mov     w0, 0
> > >         ret
> > >
> > > On the workloads this work is based on we see between 2-3x performance
> > > uplift using this patch.
> > >
> > > Follow up plan:
> > >  - Boolean vectorization has several shortcomings.  I've filed PR110223
> > >    with the bigger ones that cause vectorization to fail with this patch.
> > >  - SLP support.  This is planned for GCC 15 as for the majority of cases
> > >    building SLP itself fails.
> > 
> > It would be nice to get at least single-lane SLP support working.  I think you
> > need to treat the gcond as SLP root stmt and basically do discovery on the
> > condition as if it were a mask-generating condition.
> 
> Hmm, ok, will give it a try.
> 
> > 
> > Code generation would then simply schedule the gcond root instances first
> > (that would get you the code motion automagically).
> 
> Right, so you're saying treat the gconds as the seed, and stores as a sink.
> And then schedule only the instances without a gcond around such that we
> can still vectorize in place to get the branches.  Ok, makes sense.
> 
> > 
> > So, add a new slp_instance_kind, for example slp_inst_kind_early_break, and
> > record the gcond as root stmt.  Possibly "pattern" recognizing
> > 
> >  gcond <_1 != _2>
> > 
> > as
> > 
> >  _mask = _1 != _2;
> >  gcond <_mask != 0>
> > 
> > makes the SLP discovery less fiddly (but in theory you can of course handle
> > gconds directly).
> > 
> > Is there any part of the series that can be pushed independently?  If so I'll try to
> > look at those parts first.
> > 
> 
> Aside from:
> 
> [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
> [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits  
> 
> The rest lie dormant and don't do anything or disrupt the tree until those two are in;
> they just touch up different parts piecewise.
> 
> They do rely on the new field introduced in:
> 
> [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
> 
> But I can split them out.
> 
> I'll start respinning #4 and #7 with your latest changes now.

OK, I'll simply go 1-n then.

Richard.

> Thanks,
> Tamar
> 
> > Thanks,
> > Richard.


* Re: [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests
  2023-11-06  7:37 ` [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests Tamar Christina
@ 2023-11-07  9:46   ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-07  9:46 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This adds pragma GCC novector to testcases that have showed up
> since last regression run and due to this series detecting more.
> 
> Is it ok that when it comes time to commit I can just update any
> new cases before committing, since this seems a cat-and-mouse game?

Yeah, just update.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/no-scevccp-slp-30.c: Add pragma novector.
> 	* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
> 	* gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
> 	* gcc.target/aarch64/vect-xorsign_exec.c: Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> index 00d0eca56eeca6aee6f11567629dc955c0924c74..534bee4a1669a7cbd95cf6007f28dafd23bab8da 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-30.c
> @@ -24,9 +24,9 @@ main1 ()
>     }
>  
>    /* check results:  */
> -#pragma GCC novector
>     for (j = 0; j < N; j++)
>     {
> +#pragma GCC novector
>      for (i = 0; i < N; i++)
>        {
>          if (out[i*4] != 8
> diff --git a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> index 48b6a9b0681cf1fe410755c3e639b825b27895b0..22817a57ef81398cc018a78597755397d20e0eb9 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c
> @@ -27,6 +27,7 @@ main1 ()
>  #pragma GCC novector
>   for (i = 0; i < N; i++)
>     {
> +#pragma GCC novector
>      for (j = 0; j < N; j++) 
>        {
>          if (a[i][j] != 8)
> diff --git a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> index a0e53d5fef91868dfdbd542dd0a98dff92bd265b..0861d488e134d3f01a2fa83c56eff7174f36ddfb 100644
> --- a/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> +++ b/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
> @@ -83,9 +83,9 @@ int main1 ()
>      }
>  
>    /* check results:  */
> -#pragma GCC novector
>    for (i = 0; i < N; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N; j++)
>  	{
>            if (tmp1[2].e.n[1][i][j] != 8)
> @@ -103,9 +103,9 @@ int main1 ()
>      }
>  
>    /* check results:  */
> -#pragma GCC novector
>    for (i = 0; i < N - NINTS; i++)
>      {
> +#pragma GCC novector
>        for (j = 0; j < N - NINTS; j++)
>  	{
>            if (tmp2[2].e.n[1][i][j] != 8)
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
> index cfa22115831272cb1d4e1a38512f10c3a1c6ad77..84f33d3f6cce9b0017fd12ab961019041245ffae 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-xorsign_exec.c
> @@ -33,6 +33,7 @@ main (void)
>      r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (r[i] != a[i] * __builtin_copysignf (1.0f, b[i]))
>        abort ();
> @@ -41,6 +42,7 @@ main (void)
>      rd[i] = ad[i] * __builtin_copysign (1.0d, bd[i]);
>  
>    /* check results:  */
> +#pragma GCC novector
>    for (i = 0; i < N; i++)
>      if (rd[i] != ad[i] * __builtin_copysign (1.0d, bd[i]))
>        abort ();
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization
  2023-11-06  7:37 ` [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization Tamar Christina
@ 2023-11-07  9:52   ` Richard Biener
  2023-11-16 10:53     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-07  9:52 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This adds new test to check for all the early break functionality.
> It includes a number of codegen and runtime tests checking the values at
> different needles in the array.
> 
> They also check the values on different array sizes and peeling positions,
> datatypes, VL, ncopies and every other variant I could think of.
> 
> Additionally it also contains reduced cases from issues found running over
> various codebases.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Also regtested with:
>  -march=armv8.3-a+sve
>  -march=armv8.3-a+nosve
>  -march=armv9-a
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* doc/sourcebuild.texi: Document it.

Document what?

> gcc/testsuite/ChangeLog:
> 
> 	* lib/target-supports.exp:

?

For all runtime testcases you need to include "tree-vect.h"
and call check_vect () in main so appropriate cpuid checks
can be performed.

In vect/ you shouldn't use { dg-do run }, that's the default
and is overridden by some .exp magic.  If you add dg-do run
that magic doesn't work.
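Putting those two points together, a runtime testcase under vect/ would look roughly like this sketch (illustrative only, following the usual testsuite conventions; note the absence of a dg-do run line, and the check_vect () call doing the cpuid check):

```c
/* { dg-require-effective-target vect_early_break } */
/* { dg-require-effective-target vect_int } */
/* { dg-additional-options "-Ofast" } */

#include "tree-vect.h"

extern void abort (void);

int
main (void)
{
  check_vect ();   /* runtime cpuid check; skips gracefully if unsupported */
  /* ... body exercising the early-break loop, abort () on mismatch ...  */
  return 0;
}
```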

x86 also can do cbranch with SSE4.1, not sure how to
auto-magically add -msse4.1 for the tests though.
There's a sse4 target but that only checks whether you
can use -msse4.1.  Anyway, we can do x86 testsuite adjustments
as followup.


> 	* g++.dg/vect/vect-early-break_1.cc: New test.
> 	* g++.dg/vect/vect-early-break_2.cc: New test.
> 	* g++.dg/vect/vect-early-break_3.cc: New test.
> 	* gcc.dg/vect/vect-early-break-run_1.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_10.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_2.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_3.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_4.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_5.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_6.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_7.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_8.c: New test.
> 	* gcc.dg/vect/vect-early-break-run_9.c: New test.
> 	* gcc.dg/vect/vect-early-break-template_1.c: New test.
> 	* gcc.dg/vect/vect-early-break-template_2.c: New test.
> 	* gcc.dg/vect/vect-early-break_1.c: New test.
> 	* gcc.dg/vect/vect-early-break_10.c: New test.
> 	* gcc.dg/vect/vect-early-break_11.c: New test.
> 	* gcc.dg/vect/vect-early-break_12.c: New test.
> 	* gcc.dg/vect/vect-early-break_13.c: New test.
> 	* gcc.dg/vect/vect-early-break_14.c: New test.
> 	* gcc.dg/vect/vect-early-break_15.c: New test.
> 	* gcc.dg/vect/vect-early-break_16.c: New test.
> 	* gcc.dg/vect/vect-early-break_17.c: New test.
> 	* gcc.dg/vect/vect-early-break_18.c: New test.
> 	* gcc.dg/vect/vect-early-break_19.c: New test.
> 	* gcc.dg/vect/vect-early-break_2.c: New test.
> 	* gcc.dg/vect/vect-early-break_20.c: New test.
> 	* gcc.dg/vect/vect-early-break_21.c: New test.
> 	* gcc.dg/vect/vect-early-break_22.c: New test.
> 	* gcc.dg/vect/vect-early-break_23.c: New test.
> 	* gcc.dg/vect/vect-early-break_24.c: New test.
> 	* gcc.dg/vect/vect-early-break_25.c: New test.
> 	* gcc.dg/vect/vect-early-break_26.c: New test.
> 	* gcc.dg/vect/vect-early-break_27.c: New test.
> 	* gcc.dg/vect/vect-early-break_28.c: New test.
> 	* gcc.dg/vect/vect-early-break_29.c: New test.
> 	* gcc.dg/vect/vect-early-break_3.c: New test.
> 	* gcc.dg/vect/vect-early-break_30.c: New test.
> 	* gcc.dg/vect/vect-early-break_31.c: New test.
> 	* gcc.dg/vect/vect-early-break_32.c: New test.
> 	* gcc.dg/vect/vect-early-break_33.c: New test.
> 	* gcc.dg/vect/vect-early-break_34.c: New test.
> 	* gcc.dg/vect/vect-early-break_35.c: New test.
> 	* gcc.dg/vect/vect-early-break_36.c: New test.
> 	* gcc.dg/vect/vect-early-break_37.c: New test.
> 	* gcc.dg/vect/vect-early-break_38.c: New test.
> 	* gcc.dg/vect/vect-early-break_39.c: New test.
> 	* gcc.dg/vect/vect-early-break_4.c: New test.
> 	* gcc.dg/vect/vect-early-break_40.c: New test.
> 	* gcc.dg/vect/vect-early-break_41.c: New test.
> 	* gcc.dg/vect/vect-early-break_42.c: New test.
> 	* gcc.dg/vect/vect-early-break_43.c: New test.
> 	* gcc.dg/vect/vect-early-break_44.c: New test.
> 	* gcc.dg/vect/vect-early-break_45.c: New test.
> 	* gcc.dg/vect/vect-early-break_46.c: New test.
> 	* gcc.dg/vect/vect-early-break_47.c: New test.
> 	* gcc.dg/vect/vect-early-break_48.c: New test.
> 	* gcc.dg/vect/vect-early-break_49.c: New test.
> 	* gcc.dg/vect/vect-early-break_5.c: New test.
> 	* gcc.dg/vect/vect-early-break_50.c: New test.
> 	* gcc.dg/vect/vect-early-break_51.c: New test.
> 	* gcc.dg/vect/vect-early-break_52.c: New test.
> 	* gcc.dg/vect/vect-early-break_53.c: New test.
> 	* gcc.dg/vect/vect-early-break_54.c: New test.
> 	* gcc.dg/vect/vect-early-break_55.c: New test.
> 	* gcc.dg/vect/vect-early-break_56.c: New test.
> 	* gcc.dg/vect/vect-early-break_57.c: New test.
> 	* gcc.dg/vect/vect-early-break_58.c: New test.
> 	* gcc.dg/vect/vect-early-break_59.c: New test.
> 	* gcc.dg/vect/vect-early-break_6.c: New test.
> 	* gcc.dg/vect/vect-early-break_60.c: New test.
> 	* gcc.dg/vect/vect-early-break_61.c: New test.
> 	* gcc.dg/vect/vect-early-break_62.c: New test.
> 	* gcc.dg/vect/vect-early-break_63.c: New test.
> 	* gcc.dg/vect/vect-early-break_64.c: New test.
> 	* gcc.dg/vect/vect-early-break_65.c: New test.
> 	* gcc.dg/vect/vect-early-break_66.c: New test.
> 	* gcc.dg/vect/vect-early-break_67.c: New test.
> 	* gcc.dg/vect/vect-early-break_68.c: New test.
> 	* gcc.dg/vect/vect-early-break_69.c: New test.
> 	* gcc.dg/vect/vect-early-break_7.c: New test.
> 	* gcc.dg/vect/vect-early-break_70.c: New test.
> 	* gcc.dg/vect/vect-early-break_71.c: New test.
> 	* gcc.dg/vect/vect-early-break_72.c: New test.
> 	* gcc.dg/vect/vect-early-break_73.c: New test.
> 	* gcc.dg/vect/vect-early-break_74.c: New test.
> 	* gcc.dg/vect/vect-early-break_75.c: New test.
> 	* gcc.dg/vect/vect-early-break_76.c: New test.
> 	* gcc.dg/vect/vect-early-break_8.c: New test.
> 	* gcc.dg/vect/vect-early-break_9.c: New test.
> 	* gcc.target/aarch64/opt_mismatch_1.c: New test.
> 	* gcc.target/aarch64/opt_mismatch_2.c: New test.
> 	* gcc.target/aarch64/opt_mismatch_3.c: New test.
> 	* gcc.target/aarch64/vect-early-break-cbranch_1.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index c20af31c64237baff70f8781b1dc47f4d1a48aa9..4c351335f2bec9c6bb6856bd38d9132da7447c13 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
>  @option{-funsafe-math-optimizations} is not in effect.
>  This implies @code{vect_float}.
>  
> +@item vect_early_break
> +Target supports hardware vectorization of loops with early breaks.
> +This requires an implementation of the cbranch optab for vectors.
> +
>  @item vect_int
>  Target supports hardware vectors of @code{int}.
>  
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-w -O2" } */
> +
> +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> +public:
> +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> +};
> +template <unsigned N, typename C>
> +template <typename Ca>
> +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> +  for (int i = 0; i < N; i++)
> +    this->coeffs[i] += a.coeffs[i];
> +  return *this;
> +}
> +template <unsigned N, typename Ca, typename Cb>
> +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> +  poly_int<N, long> r;
> +  return r;
> +}
> +struct vec_prefix {
> +  unsigned m_num;
> +};
> +struct vl_ptr;
> +struct va_heap {
> +  typedef vl_ptr default_layout;
> +};
> +template <typename, typename A, typename = typename A::default_layout>
> +struct vec;
> +template <typename T, typename A> struct vec<T, A, int> {
> +  T &operator[](unsigned);
> +  vec_prefix m_vecpfx;
> +  T m_vecdata[];
> +};
> +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> +  return m_vecdata[ix];
> +}
> +template <typename T> struct vec<T, va_heap> {
> +  T &operator[](unsigned ix) { return m_vec[ix]; }
> +  vec<T, va_heap, int> m_vec;
> +};
> +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> +template <typename> class vector_builder : public auto_vec {};
> +class int_vector_builder : public vector_builder<int> {
> +public:
> +  int_vector_builder(poly_int<2, long>, int, int);
> +};
> +bool vect_grouped_store_supported() {
> +  int i;
> +  poly_int<2, long> nelt;
> +  int_vector_builder sel(nelt, 2, 3);
> +  for (i = 0; i < 6; i++)
> +    sel[i] += exact_div(nelt, 2);
> +}
> +
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-w -O2" } */
> +
> +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> +public:
> +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> +};
> +template <unsigned N, typename C>
> +template <typename Ca>
> +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> +  for (int i = 0; i < N; i++)
> +    this->coeffs[i] += a.coeffs[i];
> +  return *this;
> +}
> +template <unsigned N, typename Ca, typename Cb>
> +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> +  poly_int<N, long> r;
> +  return r;
> +}
> +struct vec_prefix {
> +  unsigned m_num;
> +};
> +struct vl_ptr;
> +struct va_heap {
> +  typedef vl_ptr default_layout;
> +};
> +template <typename, typename A, typename = typename A::default_layout>
> +struct vec;
> +template <typename T, typename A> struct vec<T, A, int> {
> +  T &operator[](unsigned);
> +  vec_prefix m_vecpfx;
> +  T m_vecdata[];
> +};
> +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> +  return m_vecdata[ix];
> +}
> +template <typename T> struct vec<T, va_heap> {
> +  T &operator[](unsigned ix) { return m_vec[ix]; }
> +  vec<T, va_heap, int> m_vec;
> +};
> +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> +template <typename> class vector_builder : public auto_vec {};
> +class int_vector_builder : public vector_builder<int> {
> +public:
> +  int_vector_builder(poly_int<2, long>, int, int);
> +};
> +bool vect_grouped_store_supported() {
> +  int i;
> +  poly_int<2, long> nelt;
> +  int_vector_builder sel(nelt, 2, 3);
> +  for (i = 0; i < 6; i++)
> +    sel[i] += exact_div(nelt, 2);
> +}
> +
> diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> new file mode 100644
> index 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12273fd8e5aa2018c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-w -O2" } */
> +
> +int aarch64_advsimd_valid_immediate_hs_val32;
> +bool aarch64_advsimd_valid_immediate_hs() {
> +  for (int shift = 0; shift < 32; shift += 8)
> +    if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
> +      return aarch64_advsimd_valid_immediate_hs_val32;
> +  for (;;)
> +    ;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 0
> +#include "vect-early-break-template_1.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 800
> +#define P 799
> +#include "vect-early-break-template_2.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 802
> +#include "vect-early-break-template_1.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 5
> +#include "vect-early-break-template_1.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 278
> +#include "vect-early-break-template_1.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 800
> +#define P 799
> +#include "vect-early-break-template_1.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 0
> +#include "vect-early-break-template_2.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 802
> +#include "vect-early-break-template_2.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 5
> +#include "vect-early-break-template_2.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast -save-temps" } */
> +
> +#define N 803
> +#define P 278
> +#include "vect-early-break-template_2.c"
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> @@ -0,0 +1,47 @@
> +#ifndef N
> +#define N 803
> +#endif
> +
> +#ifndef P
> +#define P 0
> +#endif
> +
> +unsigned vect_a[N] = {0};
> +unsigned vect_b[N] = {0};
> +  
> +__attribute__((noipa, noinline))
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +
> +  int x = 1;
> +  int idx = P;
> +  vect_a[idx] = x + 1;
> +
> +  test4(x);
> +
> +  if (vect_b[idx] != (x + idx))
> +    abort ();
> +
> +  if (vect_a[idx] != x + 1)
> +    abort ();
> +
> +  if (idx > 0 && vect_a[idx-1] != x)
> +    abort ();
> +
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> @@ -0,0 +1,50 @@
> +#ifndef N
> +#define N 803
> +#endif
> +
> +#ifndef P
> +#define P 0
> +#endif
> +
> +unsigned vect_a[N] = {0};
> +unsigned vect_b[N] = {0};
> +  
> +__attribute__((noipa, noinline))
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return i;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +
> +  int x = 1;
> +  int idx = P;
> +  vect_a[idx] = x + 1;
> +
> +  unsigned res = test4(x);
> +
> +  if (res != idx)
> +    abort ();
> +
> +  if (vect_b[idx] != (x + idx))
> +    abort ();
> +
> +  if (vect_a[idx] != x + 1)
> +    abort ();
> +
> +  if (idx > 0 && vect_a[idx-1] != x)
> +    abort ();
> +
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x,int y, int z)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> + }
> +
> + ret = x + y * z;
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x, int y)
> +{
> + unsigned ret = 0;
> + for (int o = 0; o < y; o++)
> + {
> +   ret += o;
> +   for (int i = 0; i < N; i++)
> +   {
> +     vect_b[i] = x + i;
> +     if (vect_a[i] > x)
> +       break;
> +     vect_a[i] = x;
> +
> +   }
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x, int y)
> +{
> + unsigned ret = 0;
> + for (int o = 0; o < y; o++)
> + {
> +   ret += o;
> +   for (int i = 0; i < N; i++)
> +   {
> +     vect_b[i] = x + i;
> +     if (vect_a[i] > x)
> +       return vect_a[i];
> +     vect_a[i] = x;
> +
> +   }
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i] * x;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 803
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +int test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 803
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +int test4(unsigned x)
> +{
> + int ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 1024
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i];
> +   vect_a[i] = x;
> +   ret += vect_a[i] + vect_b[i];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 1024
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i];
> +   vect_a[i] = x;
> +   ret = vect_a[i] + vect_b[i];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=2)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x, unsigned step)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=step)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] += x + i;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <stdbool.h>
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_b[N];
> +struct testStruct {
> + long e;
> + long f;
> + bool a : 1;
> + bool b : 1;
> + int c : 14;
> + int d;
> +};
> +struct testStruct vect_a[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i].a > x)
> +     return true;
> +   vect_a[i].e = x;
> + }
> + return ret;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <stdbool.h>
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_b[N];
> +struct testStruct {
> + long e;
> + long f;
> + bool a : 1;
> + bool b : 1;
> + int c : 14;
> + int d;
> +};
> +struct testStruct vect_a[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i].a)
> +     return true;
> +   vect_a[i].e = x;
> + }
> + return ret;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> @@ -0,0 +1,45 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_perm } */
> +/* { dg-require-effective-target vect_early_break } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
> +{
> +  int t1 = *c;
> +  int t2 = *c;
> +  for (int i = 0; i < 64; i+=2)
> +    {
> +      b[i] = a[i] - t1;
> +      t1 = a[i];
> +      b[i+1] = a[i+1] - t2;
> +      t2 = a[i+1];
> +    }
> +}
> +
> +int a[64];
> +short b[64];
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +  for (int i = 0; i < 64; ++i)
> +    {
> +      a[i] = i;
> +      __asm__ volatile ("" ::: "memory");
> +    }
> +  int c = 7;
> +  foo (a, b, &c);
> +  for (int i = 2; i < 64; i+=2)
> +    if (b[i] != a[i] - a[i-2]
> +	|| b[i+1] != a[i+1] - a[i-1])
> +      abort ();
> +  if (b[0] != -7 || b[1] != -6)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
> @@ -0,0 +1,61 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#define N 200
> +#define M 4
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define TEST_VALUE(I) ((I) * 17 / 2)
> +
> +#define ADD_TEST(TYPE)				\
> +  void __attribute__((noinline, noclone))	\
> +  test_##TYPE (TYPE *a, TYPE *b)		\
> +  {						\
> +    for (int i = 0; i < N; i += 2)		\
> +      {						\
> +	a[i + 0] = b[i + 0] + 2;		\
> +	a[i + 1] = b[i + 1] + 3;		\
> +      }						\
> +  }
> +
> +#define DO_TEST(TYPE)					\
> +  for (int j = 1; j < M; ++j)				\
> +    {							\
> +      TYPE a[N + M];					\
> +      for (int i = 0; i < N + M; ++i)			\
> +	a[i] = TEST_VALUE (i);				\
> +      test_##TYPE (a + j, a);				\
> +      for (int i = 0; i < N; i += 2)			\
> +	if (a[i + j] != (TYPE) (a[i] + 2)		\
> +	    || a[i + j + 1] != (TYPE) (a[i + 1] + 3))	\
> +	  __builtin_abort ();				\
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target vect_early_break } */
> +
> +#include "tree-vect.h"
> +
> +extern void abort (void);
> +void __attribute__((noinline,noclone))
> +foo (double *b, double *d, double *f)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      d[2*i] = 2. * d[2*i];
> +      d[2*i+1] = 4. * d[2*i+1];
> +      b[i] = d[2*i] - 1.;
> +      f[i] = d[2*i+1] + 2.;
> +    }
> +}
> +int main()
> +{
> +  double b[1024], d[2*1024], f[1024];
> +  int i;
> +
> +  check_vect ();
> +
> +  for (i = 0; i < 2*1024; i++)
> +    d[i] = 1.;
> +  foo (b, d, f);
> +  for (i = 0; i < 1024; i+= 2)
> +    {
> +      if (d[2*i] != 2.)
> +	abort ();
> +      if (d[2*i+1] != 4.)
> +	abort ();
> +    }
> +  for (i = 0; i < 1024; i++)
> +    {
> +      if (b[i] != 1.)
> +	abort ();
> +      if (f[i] != 6.)
> +	abort ();
> +    }
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include "vect-peel-1-src.c"
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
> +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> @@ -0,0 +1,44 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_perm } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
> +{
> +  int t1 = *c;
> +  int t2 = *c;
> +  for (int i = 0; i < 64; i+=2)
> +    {
> +      b[i] = a[i] - t1;
> +      t1 = a[i];
> +      b[i+1] = a[i+1] - t2;
> +      t2 = a[i+1];
> +    }
> +}
> +
> +int a[64], b[64];
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +  for (int i = 0; i < 64; ++i)
> +    {
> +      a[i] = i;
> +      __asm__ volatile ("" ::: "memory");
> +    }
> +  int c = 7;
> +  foo (a, b, &c);
> +  for (int i = 2; i < 64; i+=2)
> +    if (b[i] != a[i] - a[i-2]
> +	|| b[i+1] != a[i+1] - a[i-1])
> +      abort ();
> +  if (b[0] != -7 || b[1] != -6)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void abort ();
> +int a[128];
> +
> +int main ()
> +{
> +  int i;
> +  for (i = 1; i < 128; i++)
> +    if (a[i] != i%4 + 1)
> +      abort ();
> +  if (a[0] != 5)
> +    abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void abort ();
> +int a[128];
> +int main ()
> +{
> +  int i;
> +  for (i = 1; i < 128; i++)
> +    if (a[i] != i%4 + 1)
> +    abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int in[100];
> +int out[100 * 2];
> +
> +int main (void)
> +{
> +  if (out[0] != in[100 - 1])
> +  for (int i = 1; i <= 100; ++i)
> +    if (out[i] != 2)
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +unsigned test4(char x, char *vect, int n)
> +{  
> + unsigned ret = 0;
> + for (int i = 0; i < n; i++)
> + {
> +   if (vect[i] > x)
> +     return 1;
> +
> +   vect[i] = x;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int x[100];
> +int choose1(int);
> +int choose2();
> +void consume(int);
> +void f() {
> +    for (int i = 0; i < 100; ++i) {
> +        if (x[i] == 11) {
> +            if (choose1(i))
> +                goto A;
> +            else
> +                goto B;
> +        }
> +    }
> +    if (choose2())
> +        goto B;
> +A:
> +    for (int i = 0; i < 100; ++i)
> +        consume(i);
> +B:
> +    for (int i = 0; i < 100; ++i)
> +        consume(i * i);
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 1025
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i];
> +   vect_a[i] = x;
> +   ret += vect_a[i] + vect_b[i];
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i];
> +   vect_a[i] = x;
> +   ret = vect_a[i] + vect_b[i];
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a2[N];
> +unsigned vect_a1[N];
> +unsigned vect_b[N];
> +
> +unsigned test4(unsigned x, int z)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a1[i]*2 > x)
> +     {
> +       for (int y = 0; y < z; y++)
> +	 vect_a2 [y] *= vect_a1[i];
> +       break;
> +     }
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +
> +unsigned vect_a[N] __attribute__ ((aligned (4)));
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + 
> + for (int i = 1; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a2[N];
> +unsigned vect_a1[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a1[i]*2 > x)
> +     break;
> +   vect_a1[i] = x;
> +   if (vect_a2[i]*4 > x)
> +     break;
> +   vect_a2[i] = x*x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a2[N];
> +unsigned vect_a1[N];
> +unsigned vect_b[N];
> +
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a1[i]*2 > x)
> +     break;
> +   vect_a1[i] = x;
> +   if (vect_a2[i]*4 > x)
> +     return i;
> +   vect_a2[i] = x*x;
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 4
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 != x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=2)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x, unsigned n)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+= (N % 4))
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 1024
> +unsigned vect[N];
> +
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (i > 16 && vect[i] > x)
> +     break;
> +
> +   vect[i] = x;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i*=3)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* SCEV can't currently analyze this loop's bounds.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> +#pragma GCC novector
> +#pragma GCC unroll 4
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] += vect_a[i] + x;
> + }
> + return ret;
> +}
> +
> +/* novector should have blocked vectorization.  */
> +/* { dg-final { scan-tree-dump-not "vectorized \d loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 800
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 802
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=2)
> + {
> +   vect_b[i] = x + i;
> +   vect_b[i+1] = x + i + 1;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   if (vect_a[i+1]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   vect_a[i+1] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 802
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=2)
> + {
> +   vect_b[i] = x + i;
> +   vect_b[i+1] = x + i + 1;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   if (vect_a[i+1]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   vect_a[i+1] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] += x + i;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> +
> +/* At -O2 we can't currently vectorize this because the libcalls are not
> +   lowered.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +void abort ();
> +
> +float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
> +float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
> +float a[16] = {0};
> +float e[16] = {0};
> +float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
> +int main1 ()
> +{
> +  int i;
> +  for (i=0; i<16; i++)
> +    {
> +      if (a[i] != results1[i] || e[i] != results2[i])
> +        abort();
> +    }
> +
> +  if (a[i+3] != b[i-1])
> +    abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int main (void)
> +{
> +  signed char a[50], b[50], c[50];
> +  for (int i = 0; i < 50; ++i)
> +    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
> +      __builtin_abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void abort();
> +struct foostr {
> +  _Complex short f1;
> +  _Complex short f2;
> +};
> +struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
> +struct foostr c[16] __attribute__ ((__aligned__(16)));
> +struct foostr res[16] = {};
> +void
> +foo (void)
> +{
> +  int i;
> +  for (i = 0; i < 16; i++)
> +    {
> +      if (c[i].f1 != res[i].f1)
> + abort ();
> +      if (c[i].f2 != res[i].f2)
> + abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 1024
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     return vect_a[i];
> +   vect_a[i] = x;
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +extern void abort();
> +float a[1024], b[1024], c[1024], d[1024];
> +_Bool k[1024];
> +
> +int main ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
> +      abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int x_in[32];
> +int x_out_a[32], x_out_b[32];
> +int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
> +int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
> +int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
> +
> +void foo ()
> +{
> +  int j, i, x;
> +  int curr_a, flag, next_a, curr_b, next_b;
> +    {
> +      for (i = 0; i < 16; i++)
> +        {
> +          next_b = b[i+1];
> +          curr_b = flag ? next_b : curr_b;
> +        }
> +      x_out_b[j] = curr_b;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void abort();
> +int main1 (short X)
> +{
> +  unsigned char a[128];
> +  unsigned short b[128];
> +  unsigned int c[128];
> +  short myX = X;
> +  int i;
> +  for (i = 0; i < 128; i++)
> +    {
> +      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
> +        abort ();
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void abort ();
> +int a[64], b[64];
> +int main ()
> +{
> +  int c = 7;
> +  for (int i = 1; i < 64; ++i)
> +    if (b[i] != a[i] - a[i-1])
> +      abort ();
> +  if (b[0] != -7)
> +    abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + unsigned tmp[N];
> + for (int i = 0; i < N; i++)
> + {
> +   tmp[i] = x + i;
> +   vect_b[i] = tmp[i];
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   volatile unsigned tmp = x + i;
> +   vect_b[i] = tmp;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> @@ -0,0 +1,101 @@
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-add-options bind_pic_locally } */
> +/* { dg-require-effective-target vect_early_break } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 32
> +
> +unsigned short sa[N];
> +unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +		16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[N];
> +unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +	       0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
> +   access for peeling, and therefore will examine the option of
> +   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
> +   which will also align the access to 'ia[i+3]', and the loop could be
> +   vectorized on all targets that support unaligned loads.
> +   Without cost model on targets that support misaligned stores, no peeling
> +   will be applied since we want to keep the four loads aligned.  */
> +
> +__attribute__ ((noinline))
> +int main1 ()
> +{
> +  int i;
> +  int n = N - 7;
> +
> +  /* Multiple types with different sizes, used in independent
> +     computations. Vectorizable.  */
> +  for (i = 0; i < n; i++)
> +    {
> +      sa[i+7] = sb[i] + sc[i];
> +      ia[i+3] = ib[i] + ic[i];
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < n; i++)
> +    {
> +      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +	abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
> +   access for peeling, and therefore will examine the option of
> +   using a peeling factor = VF-3%VF. This will result in a peeling factor
> +   1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we 
> +   need to peel 5,1 iterations for VF=4,2 respectively, so the loop can not 
> +   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
> +   iterations, so the loop is vectorizable on all targets that support
> +   unaligned loads.
> +   Without cost model on targets that support misaligned stores, no peeling
> +   will be applied since we want to keep the four loads aligned.  */
> +
> +__attribute__ ((noinline))
> +int main2 ()
> +{
> +  int i;
> +  int n = N-3;
> +
> +  /* Multiple types with different sizes, used in independent
> +     computations. Vectorizable.  */
> +  for (i = 0; i < n; i++)
> +    {
> +      ia[i+3] = ib[i] + ic[i];
> +      sa[i+3] = sb[i] + sc[i];
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < n; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{ 
> +  check_vect ();
> +  
> +  main1 ();
> +  main2 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..be4a0c7426093059ce37a9f824defb7ae270094d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +void abort ();
> +
> +unsigned short sa[32];
> +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[32];
> +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +int main2 (int n)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +extern void abort();
> +float a[1024], b[1024], c[1024], d[1024];
> +_Bool k[1024];
> +
> +int main ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    if (k[i] != ((i % 3) == 0))
> +      abort ();
> +}
> +
> +/* Pattern didn't match inside gcond.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +extern void abort();
> +float a[1024], b[1024], c[1024], d[1024];
> +_Bool k[1024];
> +
> +int main ()
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    if (k[i] != (i == 0))
> +      abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#define N 1024
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < (N/2); i+=2)
> + {
> +   vect_b[i] = x + i;
> +   vect_b[i+1] = x + i+1;
> +   if (vect_a[i] > x || vect_a[i+1] > x)
> +     break;
> +   vect_a[i] += x * vect_b[i];
> +   vect_a[i+1] += x * vect_b[i+1];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +extern void abort();
> +float a[1024], b[1024], c[1024], d[1024];
> +_Bool k[1024];
> +
> +int main ()
> +{
> +  char i;
> +  for (i = 0; i < 1024; i++)
> +    if (k[i] != (i == 0))
> +      abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_float } */
> +
> +typedef float real_t;
> +__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
> +real_t s482()
> +{
> +    for (int nl = 0; nl < 10000; nl++) {
> +        for (int i = 0; i < 32000; i++) {
> +            a[i] += b[i] * c[i];
> +            if (c[i] > b[i]) break;
> +        }
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int a, b;
> +int e() {
> +  int d, c;
> +  d = 0;
> +  for (; d < b; d++)
> +    a = 0;
> +  d = 0;
> +  for (; d < b; d++)
> +    if (d)
> +      c++;
> +  for (;;)
> +    if (c)
> +      break;
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
> @@ -0,0 +1,28 @@
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-do compile } */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-require-effective-target vect_long } */
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-additional-options "-fno-tree-scev-cprop" } */
> +
> +/* Statement used outside the loop.
> +   NOTE: SCEV disabled to ensure the live operation is not removed before
> +   vectorization.  */
> +__attribute__ ((noinline)) int
> +liveloop (int start, int n, int *x, int *y)
> +{
> +  int i = start;
> +  int j;
> +  int ret;
> +
> +  for (j = 0; j < n; ++j)
> +    {
> +      i += 1;
> +      x[j] = i;
> +      ret = y[j];
> +    }
> +  return ret;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fdump-tree-vect-all" } */
> +
> +int d(unsigned);
> +
> +void a() {
> +  char b[8];
> +  unsigned c = 0;
> +  while (c < 7 && b[c])
> +    ++c;
> +  if (d(c))
> +    return;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +enum a { b };
> +
> +struct {
> +  enum a c;
> +} d[10], *e;
> +
> +void f() {
> +  int g;
> +  for (g = 0, e = d; g < sizeof(1); g++, e++)
> +    if (e->c)
> +      return;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int a[0];
> +int b;
> +
> +void g();
> +
> +void f() {
> +  int d, e;
> +  for (; e; e++) {
> +    int c;
> +    switch (b)
> +    case '9': {
> +      for (; d < 1; d++)
> +        if (a[d])
> +          c = 1;
> +      break;
> +    case '<':
> +      g();
> +      c = 0;
> +    }
> +      while (c)
> +        ;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..80f23d1e2431133035895946a5d6b24bef3ca294
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int32plus } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +
> +
> +int main()
> +{
> +  int var6 = -1267827473;
> +  do {
> +      ++var6;
> +      double s1_115[4], s2_108[4];
> +      int var8 = -161498264;
> +      do {
> +	  ++var8;
> +	  int var12 = 1260960076;
> +	  for (; var12 <= 1260960080; ++var12) {
> +	      int var13 = 1960990937;
> +	      do {
> +		  ++var13;
> +		  int var14 = 2128638723;
> +		  for (; var14 <= 2128638728; ++var14) {
> +		      int var22 = -1141190839;
> +		      do {
> +			  ++var22;
> +			  if (s2_108 > s1_115) {
> +			      int var23 = -890798748;
> +			      do {
> +				  long long e_119[4];
> +			      } while (var23 <= -890798746);
> +			  }
> +		      } while (var22 <= -1141190829);
> +		  }
> +	      } while (var13 <= 1960990946);
> +	  }
> +      } while (var8 <= -161498254);
> +  } while (var6 <= -1267827462);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c9a8298a8b51e05079041ae7a05086a47b1be5dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 800
> +#endif
> +unsigned vect_a1[N];
> +unsigned vect_b1[N];
> +unsigned vect_c1[N];
> +unsigned vect_d1[N];
> +
> +unsigned vect_a2[N];
> +unsigned vect_b2[N];
> +unsigned vect_c2[N];
> +unsigned vect_d2[N];
> +
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b1[i] += x + i;
> +   vect_c1[i] += x + i;
> +   vect_d1[i] += x + i;
> +   if (vect_a1[i]*2 != x)
> +     break;
> +   vect_a1[i] = x;
> +
> +   vect_b2[i] += x + i;
> +   vect_c2[i] += x + i;
> +   vect_d2[i] += x + i;
> +   if (vect_a2[i]*2 != x)
> +     break;
> +   vect_a2[i] = x;
> +
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f99de8e1f0650a3b590ed8bd9052e18173fc97d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
> @@ -0,0 +1,76 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#  define BITSIZEOF_INT 32
> +#  define BITSIZEOF_LONG 64
> +#  define BITSIZEOF_LONG_LONG 64
> +
> +#define MAKE_FUNS(suffix, type)						\
> +int my_ffs##suffix(type x) {						\
> +    int i;								\
> +    if (x == 0)								\
> +	 return 0; 							\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1  << i))					\
> +	    break;							\
> +    return i + 1;							\
> +}									\
> +									\
> +int my_clz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
> +	    break;							\
> +    return i;								\
> +}
> +
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS32					\
> +  {                                             \
> +    0x00000000UL,                               \
> +    0x00000001UL,                               \
> +    0x80000000UL,                               \
> +    0x00000002UL,                               \
> +    0x40000000UL,                               \
> +    0x00010000UL,                               \
> +    0x00008000UL,                               \
> +    0xa5a5a5a5UL,                               \
> +    0x5a5a5a5aUL,                               \
> +    0xcafe0000UL,                               \
> +    0x00cafe00UL,                               \
> +    0x0000cafeUL,                               \
> +    0xffffffffUL                                \
> +  }
> +
> +
> +unsigned int ints[] = NUMS32;
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
> +	abort ();
> +      if (ints[i] != 0
> +	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
> +	abort ();
> +    }
> +
> +  exit (0);
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] += x + i;
> +   if (vect_a[i] == x)
> +     break;
> +   vect_a[i] += x * vect_b[i];
> +
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9073130197e124527f8e38c238d8f13452a7780e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#  define BITSIZEOF_INT 32
> +#  define BITSIZEOF_LONG 64
> +#  define BITSIZEOF_LONG_LONG 64
> +
> +#define MAKE_FUNS(suffix, type)						\
> +__attribute__((noinline)) \
> +int my_clz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
> +	    break;							\
> +    return i;								\
> +}
> +
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS32					\
> +  {                                             \
> +    0x00000000UL,                               \
> +    0x00000001UL,                               \
> +    0x80000000UL,                               \
> +    0x00000002UL,                               \
> +    0x40000000UL,                               \
> +    0x00010000UL,                               \
> +    0x00008000UL,                               \
> +    0xa5a5a5a5UL,                               \
> +    0x5a5a5a5aUL,                               \
> +    0xcafe0000UL,                               \
> +    0x00cafe00UL,                               \
> +    0x0000cafeUL,                               \
> +    0xffffffffUL                                \
> +  }
> +
> +
> +unsigned int ints[] = NUMS32;
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (ints[i] != 0
> +	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
> +	  abort ();
> +    }
> +
> +  exit (0);
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c6d6eb526e618ee93547e04eaba3c6a159a18075
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#  define BITSIZEOF_INT 32
> +#  define BITSIZEOF_LONG 64
> +#  define BITSIZEOF_LONG_LONG 64
> +
> +#define MAKE_FUNS(suffix, type)						\
> +__attribute__((noinline)) \
> +int my_ffs##suffix(type x) {						\
> +    int i;								\
> +    if (x == 0)								\
> +	 return 0; 							\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1  << i))					\
> +	    break;							\
> +    return i + 1;							\
> +}
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS32					\
> +  {                                             \
> +    0x00000000UL,                               \
> +    0x00000001UL,                               \
> +    0x80000000UL,                               \
> +    0x00000002UL,                               \
> +    0x40000000UL,                               \
> +    0x00010000UL,                               \
> +    0x00008000UL,                               \
> +    0xa5a5a5a5UL,                               \
> +    0x5a5a5a5aUL,                               \
> +    0xcafe0000UL,                               \
> +    0x00cafe00UL,                               \
> +    0x0000cafeUL,                               \
> +    0xffffffffUL                                \
> +  }
> +
> +
> +unsigned int ints[] = NUMS32;
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
> +	abort ();
> +    }
> +
> +  exit (0);
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0f0a1f30ab95bf540027efa8c03aff8fe03a960b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
> @@ -0,0 +1,147 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#if __INT_MAX__ > 2147483647L
> +# if __INT_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_INT 64
> +# else
> +#  define BITSIZEOF_INT 32
> +# endif
> +#else
> +# if __INT_MAX__ >= 2147483647L
> +#  define BITSIZEOF_INT 32
> +# else
> +#  define BITSIZEOF_INT 16
> +# endif
> +#endif
> +
> +#if __LONG_MAX__ > 2147483647L
> +# if __LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG 64
> +# else
> +#  define BITSIZEOF_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG 32
> +#endif
> +
> +#if __LONG_LONG_MAX__ > 2147483647L
> +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG_LONG 64
> +# else
> +#  define BITSIZEOF_LONG_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG_LONG 32
> +#endif
> +
> +#define MAKE_FUNS(suffix, type)						\
> +__attribute__((noinline)) \
> +int my_ctz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1  << i))					\
> +	    break;							\
> +    return i;								\
> +}
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS16					\
> +  {						\
> +    0x0000U,					\
> +    0x0001U,					\
> +    0x8000U,					\
> +    0x0002U,					\
> +    0x4000U,					\
> +    0x0100U,					\
> +    0x0080U,					\
> +    0xa5a5U,					\
> +    0x5a5aU,					\
> +    0xcafeU,					\
> +    0xffffU					\
> +  }
> +
> +#define NUMS32					\
> +  {						\
> +    0x00000000UL,				\
> +    0x00000001UL,				\
> +    0x80000000UL,				\
> +    0x00000002UL,				\
> +    0x40000000UL,				\
> +    0x00010000UL,				\
> +    0x00008000UL,				\
> +    0xa5a5a5a5UL,				\
> +    0x5a5a5a5aUL,				\
> +    0xcafe0000UL,				\
> +    0x00cafe00UL,				\
> +    0x0000cafeUL,				\
> +    0xffffffffUL				\
> +  }
> +
> +#define NUMS64					\
> +  {						\
> +    0x0000000000000000ULL,			\
> +    0x0000000000000001ULL,			\
> +    0x8000000000000000ULL,			\
> +    0x0000000000000002ULL,			\
> +    0x4000000000000000ULL,			\
> +    0x0000000100000000ULL,			\
> +    0x0000000080000000ULL,			\
> +    0xa5a5a5a5a5a5a5a5ULL,			\
> +    0x5a5a5a5a5a5a5a5aULL,			\
> +    0xcafecafe00000000ULL,			\
> +    0x0000cafecafe0000ULL,			\
> +    0x00000000cafecafeULL,			\
> +    0xffffffffffffffffULL			\
> +  }
> +
> +unsigned int ints[] =
> +#if BITSIZEOF_INT == 64
> +NUMS64;
> +#elif BITSIZEOF_INT == 32
> +NUMS32;
> +#else
> +NUMS16;
> +#endif
> +
> +unsigned long longs[] =
> +#if BITSIZEOF_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +unsigned long long longlongs[] =
> +#if BITSIZEOF_LONG_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (ints[i] != 0
> +	  && __builtin_ctz (ints[i]) != my_ctz (ints[i]))
> +	  abort ();
> +    }
> +
> +  exit (0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5cce21cd16aa89d96cdac2b302d29ee918b67249
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#  define BITSIZEOF_INT 32
> +#  define BITSIZEOF_LONG 64
> +#  define BITSIZEOF_LONG_LONG 64
> +
> +#define MAKE_FUNS(suffix, type)						\
> +__attribute__((noinline)) \
> +int my_clz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
> +	    break;							\
> +    return i;								\
> +}
> +
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS32					\
> +  {                                             \
> +    0x00000000UL,                               \
> +    0x00000001UL,                               \
> +    0x80000000UL,                               \
> +    0x00000002UL,                               \
> +    0x40000000UL,                               \
> +    0x00010000UL,                               \
> +    0x00008000UL,                               \
> +    0xa5a5a5a5UL,                               \
> +    0x5a5a5a5aUL,                               \
> +    0xcafe0000UL,                               \
> +    0x00cafe00UL,                               \
> +    0x0000cafeUL,                               \
> +    0xffffffffUL                                \
> +  }
> +
> +
> +unsigned int ints[] = NUMS32;
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (ints[i] != 0
> +	  && __builtin_clz (ints[i]) != my_clz (ints[i]))
> +	  abort ();
> +    }
> +
> +  exit (0);
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..83676da28884e79874fb0b5cc6a434a0fe6b87cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
> @@ -0,0 +1,161 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#if __INT_MAX__ > 2147483647L
> +# if __INT_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_INT 64
> +# else
> +#  define BITSIZEOF_INT 32
> +# endif
> +#else
> +# if __INT_MAX__ >= 2147483647L
> +#  define BITSIZEOF_INT 32
> +# else
> +#  define BITSIZEOF_INT 16
> +# endif
> +#endif
> +
> +#if __LONG_MAX__ > 2147483647L
> +# if __LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG 64
> +# else
> +#  define BITSIZEOF_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG 32
> +#endif
> +
> +#if __LONG_LONG_MAX__ > 2147483647L
> +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG_LONG 64
> +# else
> +#  define BITSIZEOF_LONG_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG_LONG 32
> +#endif
> +
> +#define MAKE_FUNS(suffix, type)						\
> +int my_clrsb##suffix(type x) {						\
> +    int i;								\
> +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
> +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
> +	    != leading)							\
> +	    break;							\
> +    return i - 1;							\
> +}
> +
> +MAKE_FUNS (, unsigned);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS16					\
> +  {						\
> +    0x0000U,					\
> +    0x0001U,					\
> +    0x8000U,					\
> +    0x0002U,					\
> +    0x4000U,					\
> +    0x0100U,					\
> +    0x0080U,					\
> +    0xa5a5U,					\
> +    0x5a5aU,					\
> +    0xcafeU,					\
> +    0xffffU					\
> +  }
> +
> +#define NUMS32					\
> +  {						\
> +    0x00000000UL,				\
> +    0x00000001UL,				\
> +    0x80000000UL,				\
> +    0x00000002UL,				\
> +    0x40000000UL,				\
> +    0x00010000UL,				\
> +    0x00008000UL,				\
> +    0xa5a5a5a5UL,				\
> +    0x5a5a5a5aUL,				\
> +    0xcafe0000UL,				\
> +    0x00cafe00UL,				\
> +    0x0000cafeUL,				\
> +    0xffffffffUL				\
> +  }
> +
> +#define NUMS64					\
> +  {						\
> +    0x0000000000000000ULL,			\
> +    0x0000000000000001ULL,			\
> +    0x8000000000000000ULL,			\
> +    0x0000000000000002ULL,			\
> +    0x4000000000000000ULL,			\
> +    0x0000000100000000ULL,			\
> +    0x0000000080000000ULL,			\
> +    0xa5a5a5a5a5a5a5a5ULL,			\
> +    0x5a5a5a5a5a5a5a5aULL,			\
> +    0xcafecafe00000000ULL,			\
> +    0x0000cafecafe0000ULL,			\
> +    0x00000000cafecafeULL,			\
> +    0xffffffffffffffffULL			\
> +  }
> +
> +unsigned int ints[] =
> +#if BITSIZEOF_INT == 64
> +NUMS64;
> +#elif BITSIZEOF_INT == 32
> +NUMS32;
> +#else
> +NUMS16;
> +#endif
> +
> +unsigned long longs[] =
> +#if BITSIZEOF_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +unsigned long long longlongs[] =
> +#if BITSIZEOF_LONG_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +  /* Test constant folding.  */
> +
> +#define TEST(x, suffix)							\
> +  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
> +    abort ();
> +
> +#if BITSIZEOF_INT == 32
> +  TEST(0x00000000UL,);
> +  TEST(0x00000001UL,);
> +  TEST(0x80000000UL,);
> +  TEST(0x40000000UL,);
> +  TEST(0x00010000UL,);
> +  TEST(0x00008000UL,);
> +  TEST(0xa5a5a5a5UL,);
> +  TEST(0x5a5a5a5aUL,);
> +  TEST(0xcafe0000UL,);
> +  TEST(0x00cafe00UL,);
> +  TEST(0x0000cafeUL,);
> +  TEST(0xffffffffUL,);
> +#endif
> +
> +  exit (0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cc1ce4cf298ee0747f41ea4941af5a65f8a688ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
> @@ -0,0 +1,230 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-O3" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#if __INT_MAX__ > 2147483647L
> +# if __INT_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_INT 64
> +# else
> +#  define BITSIZEOF_INT 32
> +# endif
> +#else
> +# if __INT_MAX__ >= 2147483647L
> +#  define BITSIZEOF_INT 32
> +# else
> +#  define BITSIZEOF_INT 16
> +# endif
> +#endif
> +
> +#if __LONG_MAX__ > 2147483647L
> +# if __LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG 64
> +# else
> +#  define BITSIZEOF_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG 32
> +#endif
> +
> +#if __LONG_LONG_MAX__ > 2147483647L
> +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG_LONG 64
> +# else
> +#  define BITSIZEOF_LONG_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG_LONG 32
> +#endif
> +
> +#define MAKE_FUNS(suffix, type)						\
> +__attribute__((noinline)) \
> +int my_ffs##suffix(type x) {						\
> +    int i;								\
> +    if (x == 0)								\
> +	 return 0; 							\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1  << i))					\
> +	    break;							\
> +    return i + 1;							\
> +}									\
> +									\
> +int my_ctz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1  << i))					\
> +	    break;							\
> +    return i;								\
> +}									\
> +									\
> +__attribute__((noinline)) \
> +int my_clz##suffix(type x) {						\
> +    int i;								\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))	\
> +	    break;							\
> +    return i;								\
> +}									\
> +									\
> +int my_clrsb##suffix(type x) {						\
> +    int i;								\
> +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
> +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
> +	    != leading)							\
> +	    break;							\
> +    return i - 1;							\
> +}									\
> +									\
> +__attribute__((noinline)) \
> +int my_popcount##suffix(type x) {					\
> +    int i;								\
> +    int count = 0;							\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << i))					\
> +	    count++;							\
> +    return count;							\
> +}									\
> +									\
> +__attribute__((noinline)) \
> +int my_parity##suffix(type x) {						\
> +    int i;								\
> +    int count = 0;							\
> +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (x & ((type) 1 << i))					\
> +	    count++;							\
> +    return count & 1;							\
> +}
> +
> +MAKE_FUNS (ll, unsigned long long);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS16					\
> +  {						\
> +    0x0000U,					\
> +    0x0001U,					\
> +    0x8000U,					\
> +    0x0002U,					\
> +    0x4000U,					\
> +    0x0100U,					\
> +    0x0080U,					\
> +    0xa5a5U,					\
> +    0x5a5aU,					\
> +    0xcafeU,					\
> +    0xffffU					\
> +  }
> +
> +#define NUMS32					\
> +  {						\
> +    0x00000000UL,				\
> +    0x00000001UL,				\
> +    0x80000000UL,				\
> +    0x00000002UL,				\
> +    0x40000000UL,				\
> +    0x00010000UL,				\
> +    0x00008000UL,				\
> +    0xa5a5a5a5UL,				\
> +    0x5a5a5a5aUL,				\
> +    0xcafe0000UL,				\
> +    0x00cafe00UL,				\
> +    0x0000cafeUL,				\
> +    0xffffffffUL				\
> +  }
> +
> +#define NUMS64					\
> +  {						\
> +    0x0000000000000000ULL,			\
> +    0x0000000000000001ULL,			\
> +    0x8000000000000000ULL,			\
> +    0x0000000000000002ULL,			\
> +    0x4000000000000000ULL,			\
> +    0x0000000100000000ULL,			\
> +    0x0000000080000000ULL,			\
> +    0xa5a5a5a5a5a5a5a5ULL,			\
> +    0x5a5a5a5a5a5a5a5aULL,			\
> +    0xcafecafe00000000ULL,			\
> +    0x0000cafecafe0000ULL,			\
> +    0x00000000cafecafeULL,			\
> +    0xffffffffffffffffULL			\
> +  }
> +
> +unsigned int ints[] =
> +#if BITSIZEOF_INT == 64
> +NUMS64;
> +#elif BITSIZEOF_INT == 32
> +NUMS32;
> +#else
> +NUMS16;
> +#endif
> +
> +unsigned long longs[] =
> +#if BITSIZEOF_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +unsigned long long longlongs[] =
> +#if BITSIZEOF_LONG_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(longlongs); i++)
> +    {
> +      if (__builtin_ffsll (longlongs[i]) != my_ffsll (longlongs[i]))
> +	abort ();
> +      if (longlongs[i] != 0
> +	  && __builtin_clzll (longlongs[i]) != my_clzll (longlongs[i]))
> +	abort ();
> +      if (longlongs[i] != 0
> +	  && __builtin_ctzll (longlongs[i]) != my_ctzll (longlongs[i]))
> +	abort ();
> +      if (__builtin_clrsbll (longlongs[i]) != my_clrsbll (longlongs[i]))
> +	abort ();
> +      if (__builtin_popcountll (longlongs[i]) != my_popcountll (longlongs[i]))
> +	abort ();
> +      if (__builtin_parityll (longlongs[i]) != my_parityll (longlongs[i]))
> +	abort ();
> +    }
> +
> +  /* Test constant folding.  */
> +
> +#define TEST(x, suffix)							\
> +  if (__builtin_ffs##suffix (x) != my_ffs##suffix (x))			\
> +    abort ();								\
> +
> +#if BITSIZEOF_LONG_LONG == 64
> +  TEST(0x0000000000000000ULL, ll);
> +  TEST(0x0000000000000001ULL, ll);
> +  TEST(0x8000000000000000ULL, ll);
> +  TEST(0x0000000000000002ULL, ll);
> +  TEST(0x4000000000000000ULL, ll);
> +  TEST(0x0000000100000000ULL, ll);
> +  TEST(0x0000000080000000ULL, ll);
> +  TEST(0xa5a5a5a5a5a5a5a5ULL, ll);
> +  TEST(0x5a5a5a5a5a5a5a5aULL, ll);
> +  TEST(0xcafecafe00000000ULL, ll);
> +  TEST(0x0000cafecafe0000ULL, ll);
> +  TEST(0x00000000cafecafeULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +#endif
> +
> +  exit (0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..adba337b101f4d7cafaa50329a933594b0d501ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
> @@ -0,0 +1,165 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-O3" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <limits.h>
> +#include <assert.h>
> +
> +#if __INT_MAX__ > 2147483647L
> +# if __INT_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_INT 64
> +# else
> +#  define BITSIZEOF_INT 32
> +# endif
> +#else
> +# if __INT_MAX__ >= 2147483647L
> +#  define BITSIZEOF_INT 32
> +# else
> +#  define BITSIZEOF_INT 16
> +# endif
> +#endif
> +
> +#if __LONG_MAX__ > 2147483647L
> +# if __LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG 64
> +# else
> +#  define BITSIZEOF_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG 32
> +#endif
> +
> +#if __LONG_LONG_MAX__ > 2147483647L
> +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> +#  define BITSIZEOF_LONG_LONG 64
> +# else
> +#  define BITSIZEOF_LONG_LONG 32
> +# endif
> +#else
> +# define BITSIZEOF_LONG_LONG 32
> +#endif
> +
> +#define MAKE_FUNS(suffix, type)						\
> +int my_clrsb##suffix(type x) {						\
> +    int i;								\
> +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;		\
> +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)			\
> +	if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)		\
> +	    != leading)							\
> +	    break;							\
> +    return i - 1;							\
> +}									\
> +									\
> +
> +MAKE_FUNS (, unsigned);
> +MAKE_FUNS (ll, unsigned long long);
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +#define NUMS16					\
> +  {						\
> +    0x0000U,					\
> +    0x0001U,					\
> +    0x8000U,					\
> +    0x0002U,					\
> +    0x4000U,					\
> +    0x0100U,					\
> +    0x0080U,					\
> +    0xa5a5U,					\
> +    0x5a5aU,					\
> +    0xcafeU,					\
> +    0xffffU					\
> +  }
> +
> +#define NUMS32					\
> +  {						\
> +    0x00000000UL,				\
> +    0x00000001UL,				\
> +    0x80000000UL,				\
> +    0x00000002UL,				\
> +    0x40000000UL,				\
> +    0x00010000UL,				\
> +    0x00008000UL,				\
> +    0xa5a5a5a5UL,				\
> +    0x5a5a5a5aUL,				\
> +    0xcafe0000UL,				\
> +    0x00cafe00UL,				\
> +    0x0000cafeUL,				\
> +    0xffffffffUL				\
> +  }
> +
> +#define NUMS64					\
> +  {						\
> +    0x0000000000000000ULL,			\
> +    0x0000000000000001ULL,			\
> +    0x8000000000000000ULL,			\
> +    0x0000000000000002ULL,			\
> +    0x4000000000000000ULL,			\
> +    0x0000000100000000ULL,			\
> +    0x0000000080000000ULL,			\
> +    0xa5a5a5a5a5a5a5a5ULL,			\
> +    0x5a5a5a5a5a5a5a5aULL,			\
> +    0xcafecafe00000000ULL,			\
> +    0x0000cafecafe0000ULL,			\
> +    0x00000000cafecafeULL,			\
> +    0xffffffffffffffffULL			\
> +  }
> +
> +unsigned int ints[] =
> +#if BITSIZEOF_INT == 64
> +NUMS64;
> +#elif BITSIZEOF_INT == 32
> +NUMS32;
> +#else
> +NUMS16;
> +#endif
> +
> +unsigned long longs[] =
> +#if BITSIZEOF_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +unsigned long long longlongs[] =
> +#if BITSIZEOF_LONG_LONG == 64
> +NUMS64;
> +#else
> +NUMS32;
> +#endif
> +
> +#define N(table) (sizeof (table) / sizeof (table[0]))
> +
> +int
> +main (void)
> +{
> +  int i;
> +
> +#pragma GCC novector
> +  for (i = 0; i < N(ints); i++)
> +    {
> +      if (__builtin_clrsb (ints[i]) != my_clrsb (ints[i]))
> +	abort ();
> +    }
> +
> +  /* Test constant folding.  */
> +
> +#define TEST(x, suffix)							\
> +  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))		\
> +    abort ();								
> +
> +#if BITSIZEOF_LONG_LONG == 64
> +  TEST(0xffffffffffffffffULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +  TEST(0xffffffffffffffffULL, ll);
> +#endif
> +
> +  exit (0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +char vect_a[N];
> +char vect_b[N];
> +  
> +char test4(char x, char * restrict res)
> +{
> + char ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] += x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] += x * vect_b[i];
> +   res[i] *= vect_b[i];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4e8b5bdea5ff9aa0cadbea0af10d51707da011c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 803
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_a[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..571aec0ccfdbcdc318ba1f17de31958c16b3e9bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8.3-a -mcpu=neoverse-n1" } */
> +
> +#include <arm_neon.h>
> +
> +/* { dg-warning "switch ?-mcpu=neoverse-n1? conflicts with ?-march=armv8.3-a? switch and would result in options \\+fp16\\+dotprod\\+profile\\+nopauth" "" { target *-*-* } 0 } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cee42c84c4f762a4d4773ea4380163742b5137b0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8-a+sve -mcpu=neoverse-n1" } */
> +
> +#include <arm_neon.h>
> +
> +/* { dg-warning "switch ?-mcpu=neoverse-n1? conflicts with ?-march=armv8-a+sve? switch and would result in options \\+lse\\+rcpc\\+rdma\\+dotprod\\+profile\\+nosve" } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0a05b98eedb8bd743bb5af8e4dd3c95aab001c4b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8-a -mcpu=neovese-n1 -Wpedentic -Werror" } */
> +
> +#include <arm_neon.h>
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**	...
> +**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index f0b692a2e19bae3cf3ffee8f27bd39b05aba3b9c..1e47ae84080f9908736d1c3be9c14d589e8772a7 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3975,6 +3975,17 @@ proc check_effective_target_vect_int { } {
>  	}}]
>  }
>  
> +# Return 1 if the target supports hardware vectorization of early breaks,
> +# 0 otherwise.
> +#
> +# This won't change for different subtargets so cache the result.
> +
> +proc check_effective_target_vect_early_break { } {
> +    return [check_cached_effective_target_indexed vect_early_break {
> +      expr {
> +	[istarget aarch64*-*-*]
> +	}}]
> +}
>  # Return 1 if the target supports hardware vectorization of complex additions of
>  # byte, 0 otherwise.
>  #
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-07  9:42     ` Richard Biener
@ 2023-11-07 10:47       ` Tamar Christina
  2023-11-07 13:58         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-07 10:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 7, 2023 9:43 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> vectorization
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, November 6, 2023 2:25 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> > > auto- vectorization
> > >
> > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds initial support for early break vectorization in GCC.
> > > > The support is added for any target that implements a vector
> > > > cbranch optab, this includes both fully masked and non-masked targets.
> > > >
> > > > Depending on the operation, the vectorizer may also require
> > > > support for boolean mask reductions using Inclusive OR.  This is
> > > > however only checked then the comparison would produce multiple
> statements.
> > > >
> > > > Note: I am currently struggling to get patch 7 correct in all
> > > > cases and could
> > > use
> > > >       some feedback there.
> > > >
> > > > Concretely the kind of loops supported are of the forms:
> > > >
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    <statements1>
> > > >    if (<condition>)
> > > >      {
> > > >        ...
> > > >        <action>;
> > > >      }
> > > >    <statements2>
> > > >  }
> > > >
> > > > where <action> can be:
> > > >  - break
> > > >  - return
> > > >  - goto
> > > >
> > > > Any number of statements can be used before the <action> occurs.
> > > >
> > > > Since this is an initial version for GCC 14 it has the following
> > > > limitations and
> > > > features:
> > > >
> > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > > >   vectors loaded or stored must be to statically allocated arrays with
> known
> > > >   sizes. N must also be known.  This limitation is because our primary
> target
> > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > > >   iteration checks. The result is likely to also not be beneficial. For that
> > > >   reason we punt support for variable buffers till we have First-Faulting
> > > >   support in GCC.
> 
> Btw, for this I wonder if you thought about marking memory accesses required
> for the early break condition as required to be vector-size aligned, thus peeling
> or versioning them for alignment?  That should ensure they do not fault.
> 
> OTOH I somehow remember prologue peeling isn't supported for early break
> vectorization?  ..
> 
> > > > - any stores in <statements1> should not be to the same objects as in
> > > >   <condition>.  Loads are fine as long as they don't have the possibility to
> > > >   alias.  More concretely, we block RAW dependencies when the
> > > > intermediate
> > > value
> > > >   can't be separated from the store, or the store itself can't be moved.
> > > > - Prologue peeling, alignment peelinig and loop versioning are supported.
> 
> .. but here you say it is.  Not sure if peeling for alignment works for VLA vectors
> though.  Just to say x86 doesn't support first-faulting loads.

For VLA we support it through masking.  i.e. if you need to peel N iterations, we
generate a masked copy of the vectorized loop which masks off the first N lanes.

This is not typically needed, but we do support it.  But the problem with this
scheme and early break is obviously that the peeled loop itself needs to be
vectorized, so you end up with the same issue again.  So at the moment we reject
it for VLA.

Regards,
Tamar

> 
> > > > - Fully masked loops, unmasked loops and partially masked loops
> > > > are supported
> > > > - Any number of loop early exits are supported.
> > > > - No support for epilogue vectorization.  The only epilogue supported is
> the
> > > >   scalar final one.  Peeling code supports it but the code motion code
> cannot
> > > >   find instructions to make the move in the epilog.
> > > > - Early breaks are only supported for inner loop vectorization.
> > > >
> > > > I have pushed a branch to
> > > > refs/users/tnfchris/heads/gcc-14-early-break
> > > >
> > > > With the help of IPA and LTO this still gets hit quite often.
> > > > During bootstrap it hit rather frequently.  Additionally TSVC
> > > > s332, s481 and
> > > > s482 all pass now since these are tests for support for early exit
> > > vectorization.
> > > >
> > > > This implementation does not support completely handling the early
> > > > break inside the vector loop itself but instead supports adding
> > > > checks such that if we know that we have to exit in the current
> > > > iteration then we branch to scalar code to actually do the final
> > > > VF iterations which
> > > handles all the code in <action>.
> > > >
> > > > For the scalar loop we know that whatever exit you take you have
> > > > to perform at most VF iterations.  For vector code we only care
> > > > about the state of fully performed iterations and reset the scalar
> > > > code to the (partially)
> > > remaining loop.
> > > >
> > > > That is to say, the first vector loop executes so long as the
> > > > early exit isn't needed.  Once the exit is taken, the scalar code
> > > > will perform at most VF extra iterations.  The exact number
> > > > depending on peeling
> > > and iteration start and which
> > > > exit was taken (natural or early).   For this scalar loop, all early exits are
> > > > treated the same.
> > > >
> > > > When we vectorize we move any statement not related to the early
> > > > break itself and that would be incorrect to execute before the break (i.e.
> > > > has side effects) to after the break.  If this is not possible we
> > > > decline to
> > > vectorize.
> > > >
> > > > This means that we check at the start of iterations whether we are
> > > > going to exit or not.  During the analysis phase we check whether
> > > > we are allowed to do this moving of statements.  Also note that we
> > > > only move the scalar statements, but only do so after peeling but
> > > > just before we
> > > start transforming statements.
> > > >
> > > > Codegen:
> > > >
> > > > for e.g.
> > > >
> > > > #define N 803
> > > > unsigned vect_a[N];
> > > > unsigned vect_b[N];
> > > >
> > > > unsigned test4(unsigned x)
> > > > {
> > > >  unsigned ret = 0;
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    vect_b[i] = x + i;
> > > >    if (vect_a[i] > x)
> > > >      break;
> > > >    vect_a[i] = x;
> > > >
> > > >  }
> > > >  return ret;
> > > > }
> > > >
> > > > We generate for Adv. SIMD:
> > > >
> > > > test4:
> > > >         adrp    x2, .LC0
> > > >         adrp    x3, .LANCHOR0
> > > >         dup     v2.4s, w0
> > > >         add     x3, x3, :lo12:.LANCHOR0
> > > >         movi    v4.4s, 0x4
> > > >         add     x4, x3, 3216
> > > >         ldr     q1, [x2, #:lo12:.LC0]
> > > >         mov     x1, 0
> > > >         mov     w2, 0
> > > >         .p2align 3,,7
> > > > .L3:
> > > >         ldr     q0, [x3, x1]
> > > >         add     v3.4s, v1.4s, v2.4s
> > > >         add     v1.4s, v1.4s, v4.4s
> > > >         cmhi    v0.4s, v0.4s, v2.4s
> > > >         umaxp   v0.4s, v0.4s, v0.4s
> > > >         fmov    x5, d0
> > > >         cbnz    x5, .L6
> > > >         add     w2, w2, 1
> > > >         str     q3, [x1, x4]
> > > >         str     q2, [x3, x1]
> > > >         add     x1, x1, 16
> > > >         cmp     w2, 200
> > > >         bne     .L3
> > > >         mov     w7, 3
> > > > .L2:
> > > >         lsl     w2, w2, 2
> > > >         add     x5, x3, 3216
> > > >         add     w6, w2, w0
> > > >         sxtw    x4, w2
> > > >         ldr     w1, [x3, x4, lsl 2]
> > > >         str     w6, [x5, x4, lsl 2]
> > > >         cmp     w0, w1
> > > >         bcc     .L4
> > > >         add     w1, w2, 1
> > > >         str     w0, [x3, x4, lsl 2]
> > > >         add     w6, w1, w0
> > > >         sxtw    x1, w1
> > > >         ldr     w4, [x3, x1, lsl 2]
> > > >         str     w6, [x5, x1, lsl 2]
> > > >         cmp     w0, w4
> > > >         bcc     .L4
> > > >         add     w4, w2, 2
> > > >         str     w0, [x3, x1, lsl 2]
> > > >         sxtw    x1, w4
> > > >         add     w6, w1, w0
> > > >         ldr     w4, [x3, x1, lsl 2]
> > > >         str     w6, [x5, x1, lsl 2]
> > > >         cmp     w0, w4
> > > >         bcc     .L4
> > > >         str     w0, [x3, x1, lsl 2]
> > > >         add     w2, w2, 3
> > > >         cmp     w7, 3
> > > >         beq     .L4
> > > >         sxtw    x1, w2
> > > >         add     w2, w2, w0
> > > >         ldr     w4, [x3, x1, lsl 2]
> > > >         str     w2, [x5, x1, lsl 2]
> > > >         cmp     w0, w4
> > > >         bcc     .L4
> > > >         str     w0, [x3, x1, lsl 2]
> > > > .L4:
> > > >         mov     w0, 0
> > > >         ret
> > > >         .p2align 2,,3
> > > > .L6:
> > > >         mov     w7, 4
> > > >         b       .L2
> > > >
> > > > and for SVE:
> > > >
> > > > test4:
> > > >         adrp    x2, .LANCHOR0
> > > >         add     x2, x2, :lo12:.LANCHOR0
> > > >         add     x5, x2, 3216
> > > >         mov     x3, 0
> > > >         mov     w1, 0
> > > >         cntw    x4
> > > >         mov     z1.s, w0
> > > >         index   z0.s, #0, #1
> > > >         ptrue   p1.b, all
> > > >         ptrue   p0.s, all
> > > >         .p2align 3,,7
> > > > .L3:
> > > >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> > > >         add     z3.s, z0.s, z1.s
> > > >         cmplo   p2.s, p0/z, z1.s, z2.s
> > > >         b.any   .L2
> > > >         st1w    z3.s, p1, [x5, x3, lsl 2]
> > > >         add     w1, w1, 1
> > > >         st1w    z1.s, p1, [x2, x3, lsl 2]
> > > >         add     x3, x3, x4
> > > >         incw    z0.s
> > > >         cmp     w3, 803
> > > >         bls     .L3
> > > > .L5:
> > > >         mov     w0, 0
> > > >         ret
> > > >         .p2align 2,,3
> > > > .L2:
> > > >         cntw    x5
> > > >         mul     w1, w1, w5
> > > >         cbz     w5, .L5
> > > >         sxtw    x1, w1
> > > >         sub     w5, w5, #1
> > > >         add     x5, x5, x1
> > > >         add     x6, x2, 3216
> > > >         b       .L6
> > > >         .p2align 2,,3
> > > > .L14:
> > > >         str     w0, [x2, x1, lsl 2]
> > > >         cmp     x1, x5
> > > >         beq     .L5
> > > >         mov     x1, x4
> > > > .L6:
> > > >         ldr     w3, [x2, x1, lsl 2]
> > > >         add     w4, w0, w1
> > > >         str     w4, [x6, x1, lsl 2]
> > > >         add     x4, x1, 1
> > > >         cmp     w0, w3
> > > >         bcs     .L14
> > > >         mov     w0, 0
> > > >         ret
> > > >
> > > > On the workloads this work is based on we see between 2-3x
> > > > performance uplift using this patch.
> > > >
> > > > Follow up plan:
> > > >  - Boolean vectorization has several shortcomings.  I've filed
> > > > PR110223 with
> > > the
> > > >    bigger ones that cause vectorization to fail with this patch.
> > > >  - SLP support.  This is planned for GCC 15 as for majority of the cases
> build
> > > >    SLP itself fails.
> > >
> > > It would be nice to get at least single-lane SLP support working.  I
> > > think you need to treat the gcond as SLP root stmt and basically do
> > > discovery on the condition as to as if it were a mask generating condition.
> >
> > Hmm ok, will give it  a try.
> >
> > >
> > > Code generation would then simply schedule the gcond root instances
> > > first (that would get you the code motion automagically).
> >
> > Right, so you're saying treat the gcond's as the seed, and stores as a sink.
> > And then schedule only the instances without a gcond around such that
> > we can still vectorize in place to get the branches.  Ok, makes sense.
> >
> > >
> > > So, add a new slp_instance_kind, for example
> > > slp_inst_kind_early_break, and record the gcond as root stmt.
> > > Possibly "pattern" recognizing
> > >
> > >  gcond <_1 != _2>
> > >
> > > as
> > >
> > >  _mask = _1 != _2;
> > >  gcond <_mask != 0>
> > >
> > > makes the SLP discovery less fiddly (but in theory you can of course
> > > handle gconds directly).
> > >
> > > Is there any part of the series that can be pushed independelty?  If
> > > so I'll try to look at those parts first.
> > >
> >
> > Aside from:
> >
> > [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA
> > form for early breaks [PATCH 7/21]middle-end: update IV update code to
> > support early breaks and arbitrary exits
> >
> > The rest lie dormant and don't do anything or disrupt the tree until those
> two are in.
> > The rest all just touch up different parts piecewise.
> >
> > They do rely on the new field introduced in:
> >
> > [PATCH 3/21]middle-end: Implement code motion and dependency analysis
> > for early breaks
> >
> > But can split them out.
> >
> > I'll start respinning no #4 and #7 with your latest changes now.
> 
> OK, I'll simply go 1-n then.
> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > > Thanks,
> > > Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-11-06  7:37 ` [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Tamar Christina
@ 2023-11-07 10:53   ` Richard Biener
  2023-11-07 11:34     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-07 10:53 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> When performing early break vectorization we need to be sure that the vector
> operations are safe to perform.  A simple example is e.g.
> 
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>  }
> 
> where the store to vect_b is not allowed to be executed unconditionally since
> if we exit through the early break it wouldn't have been done for the full VF
> iteration.
> 
> Effectively, the code motion determines:
>   - is it safe/possible to vectorize the function
>   - what updates to the VUSES should be performed if we do
>   - Which statements need to be moved
>   - Which statements can't be moved:
>     * values that are live must be reachable through all exits
>     * values that aren't single-use and are shared by the use/def chain of the cond
>   - The final insertion point of the instructions.  In cases where we have
>     multiple early exit statements this should be the one closest to the loop
>     latch itself.
> 
> After motion the loop above is:
> 
>  for (int i = 0; i < N; i++)
>  {
>    ... y = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_b[i] = y;
>    vect_a[i] = x;
> 
>  }
> 
> The operation is split into two, during data ref analysis we determine
> validity of the operation and generate a worklist of actions to perform if we
> vectorize.
> 
> After peeling and just before statement transformation we replay this worklist
> which moves the statements and updates bookkeeping only in the main loop that's
> to be vectorized.  This includes updating of USES in exit blocks.
> 
> At the moment we don't support this for unmasked epilogues since the additional
> vectorized epilogue's stmt UIDs are not found.

As for UIDs, note that they are used for dominance checking in
vect_stmt_dominates_stmt_p, and that at least is used during
transform when scheduling SLP.  Moving stmts around invalidates
this UID order (I don't see you "renumbering" UIDs).

More comments below.
 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> 	(vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): use it/
> 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> 	early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence.  Returns True if
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implementation is very conservative.  Any overlapping loads/stores
> +     that take place before the early break statement get rejected aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is, which is the common case.
> +
> +   Arguments:
> +     - LOOP_VINFO: loop information for the current loop.
> +     - CHAIN: Currently detected sequence of instructions that need to be moved
> +	      if we are to vectorize this early break.
> +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from
> +	      one or more cond conditions.  If this set overlaps with CHAIN then FIXED
> +	      takes precedence.  This deals with non-single use cases.
> +     - LOADS: List of all loads found during traversal.
> +     - BASES: List of all load data references found during traversal.
> +     - GSTMT: Current position to inspect for validity.  The sequence
> +	      will be moved upwards from this point.
> +     - REACHING_VUSE: The dominating VUSE found so far.  */
> +
> +static bool
> +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
> +			   hash_set<tree> *fixed, vec<tree> *loads,
> +			   vec<data_reference *> *bases, tree *reaching_vuse,
> +			   gimple_stmt_iterator *gstmt)
> +{
> +  if (gsi_end_p (*gstmt))
> +    return true;
> +
> +  gimple *stmt = gsi_stmt (*gstmt);
> +  /* ?? Do I need to move debug statements? not quite sure..  */

I think we reset them.

> +  if (gimple_has_ops (stmt)
> +      && !is_gimple_debug (stmt))
> +    {
> +      tree dest = NULL_TREE;
> +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> +	 use the LHS, if not, assume that the first argument of a call is the
> +	 value being defined.  e.g. MASKED_LOAD etc.  */
> +      if (gimple_has_lhs (stmt))
> +	dest = gimple_get_lhs (stmt);
> +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> +	dest = gimple_arg (call, 0);
> +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> +	{
> +	  /* Operands of conds are ones we can't move.  */
> +	  fixed->add (gimple_cond_lhs (cond));
> +	  fixed->add (gimple_cond_rhs (cond));
> +	}
> +
> +      bool move = false;


So this all looks a bit like spaghetti (sorry).  I think what
you want to do is perform this in two steps:

 a) mark (and check) the dependences of the early break conditions,
    aka populate 'fixed'
 b) walk stmts from the _last_ early break, verifying all earlier
    non-'fixed' stmts can be moved

for marking dependences you want to simply iterate over use
operands:

  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
    USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
    the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
    a worklist with a visited set to gather all of the relevant
    stmts/defs

> +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_vinfo)
> +	{
> +	   if (dump_enabled_p ())
> +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			      "early breaks not supported. Unknown"
> +			      " statement: %G", stmt);
> +	   return false;
> +	}
> +
> +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +      if (dr_ref)
> +	{
> +	   /* We currently only support statically allocated objects due to
> +	      not having first-faulting loads support or peeling for alignment
> +	      support.  Compute the size of the referenced object (it could be
> +	      dynamically allocated).  */
> +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +	     {
> +	       if (dump_enabled_p ())
> +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				  "early breaks only supported on statically"
> +				  " allocated objects.\n");
> +	       return false;
> +	     }
> +
> +	   tree refop = TREE_OPERAND (obj, 0);
> +	   tree refbase = get_base_address (refop);
> +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +	     {
> +	       if (dump_enabled_p ())
> +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				  "early breaks only supported on statically"
> +				  " allocated objects.\n");
> +	       return false;
> +	     }

Note this doesn't ensure in-bound accesses:

int a[4];

void foo ()
{
  for (unsigned int i = 0; i < 32; ++i)
   {
     if (a[i] == 0)
       break;
     /* ... */
   }
}

you'd happily load a V8SImode vector from 'a'.  If the caller
ensures a[3] == 0 the code is fine but your transformed vector
code not.  You need to check that DR_BASE_ADDRESS + DR_OFFSET
+ DR_INIT + niter * DR_STEP is within the object instead.

> +
> +	   if (DR_IS_READ (dr_ref))
> +	     {
> +		loads->safe_push (dest);
> +		bases->safe_push (dr_ref);
> +	     }
> +	   else if (DR_IS_WRITE (dr_ref))
> +	     {
> +		for (auto dr : bases)
> +		  if (same_data_refs_base_objects (dr, dr_ref))
> +		    {

that looks quadratic to me.  So what's this actually?  You've
gathered all loads after this write and now you are checking
that all those loads do not alias the write?  But
same_data_refs_base_objects is only verifying that the
two refs are suitable for classical dependence analysis,
so it's not a conservative test here.  I think you may want to
use dr_may_alias_p instead?

I'm missing some overall idea of what you are doing, like what's
the actual transform and how do you validate its validity?

It looks like you only move stores?

> +		      if (dump_enabled_p ())
> +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +					   vect_location,
> +					   "early breaks only supported,"
> +					   " overlapping loads and stores found"
> +					   " before the break statement.\n");
> +		      return false;
> +		    }
> +		/* Any write starts a new chain.  */
> +		move = true;
> +	     }
> +	}
> +
> +      /* If a statement is live and escapes the loop through usage in the loop
> +	 epilogue then we can't move it since we need to maintain its
> +	 reachability through all exits.  */
> +      bool skip = false;
> +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> +	{
> +	  imm_use_iterator imm_iter;
> +	  use_operand_p use_p;
> +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> +	    {
> +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +	      if (skip)
> +		break;
> +	    }
> +	}
> +
> +      /* If we found the defining statement of something that's part of the
> +	 chain then expand the chain with the new SSA_VARs being used.  */
> +      if (!skip && (chain->contains (dest) || move))
> +	{
> +	  move = true;
> +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> +	    {
> +	      tree var = gimple_arg (stmt, x);
> +	      if (TREE_CODE (var) == SSA_NAME)
> +		{
> +		  if (fixed->contains (dest))
> +		    {
> +		      move = false;
> +		      fixed->add (var);
> +		    }
> +		  else
> +		    chain->add (var);
> +		}
> +	      else
> +		{
> +		  use_operand_p use_p;
> +		  ssa_op_iter iter;
> +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +		    {
> +		      tree op = USE_FROM_PTR (use_p);
> +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> +		      if (fixed->contains (dest))
> +			{
> +			  move = false;
> +			  fixed->add (op);
> +			}
> +		      else
> +			chain->add (op);
> +		    }
> +		}
> +	    }
> +
> +	  if (dump_enabled_p ())
> +	    {
> +	      if (move)
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				"found chain %G", stmt);
> +	      else
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				"ignored chain %G, not single use", stmt);
> +	    }
> +	}
> +
> +      if (move)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_NOTE, vect_location,
> +			     "==> recording stmt %G", stmt);
> +
> +	  for (tree ref : loads)
> +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> +	      {
> +	        if (dump_enabled_p ())
> +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				   "early breaks not supported as memory used"
> +				   " may alias.\n");
> +	        return false;
> +	      }

Here you check aliasing again?!

I think it might be conceptually easier (and stronger) to instead
think of the 'fixed' set (and the gconds) to be moved earlier instead
of the stores to be sunk.

For example I fail to see how you check for, say

   for (..)
    {
      tem = a[i] / b[i];
      if (c[i]) break;
      d[i] = tem;
    }

where the division might trap.  For this the validation wouldn't
identify anything to move, right?

I'll note that doing the actual movement will be easier with SLP
and it would be a possibility to implement early break with just
SLP support - as we need to start discovery from the gconds
explicitly anyway there's no problem forcing a single-lane SLP
discovery there.

> +
> +	  /* If we've moved a VDEF, extract the defining MEM and update
> +	     usages of it.   */
> +	  tree vdef;
> +	  if ((vdef = gimple_vdef (stmt)))
> +	    {
> +	      /* This statement is to be moved.  */
> +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
> +	      *reaching_vuse = gimple_vuse (stmt);
> +	    }
> +	}
> +    }
> +
> +  gsi_prev (gstmt);
> +
> +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> +				  reaching_vuse, gstmt))
> +    return false;

Please use a loop instead of recursion.  I suggest to do the loop at the 
single caller.

> +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> +    {
> +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "marked statement for vUSE update: %G", stmt);
> +    }
> +
> +  return true;
> +}
> +
> +/* Function vect_analyze_early_break_dependences.
> +
> +   Examine all the data references in the loop and make sure that if we have
> +   multiple exits we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates where to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  hash_set<tree> chain, fixed;
> +  auto_vec<tree> loads;
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +  tree vuse = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);

isn't that 'c' already?

> +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +
> +      /* Initialize the vuse chain with the one at the early break.  */
> +      if (!vuse)
> +	vuse = gimple_vuse (c);

gconds do not have virtual operands.

> +
> +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> +				     &bases, &vuse, &gsi))
> +	return opt_result::failure_at (stmt,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", stmt);
> +
> +      /* Save the destination as we go, BBs are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +    }
> +
> +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;

no edge is the fallthru edge out of a condition, so this always
selects EDGE_SUCC (dest_bb, 1) which cannot be correct (well,
guess you're lucky).  I think you instead want

  dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == 
dest_bb->loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 
1)->dest;

more nicely written, of course.

> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;

Sorting the vector of early breaks as we gather them might be nicer
than this - you'd then simply use the first or last.

> +
> +  /* TODO: Remove?  It's a useful debug statement but may be too much.  */
> +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "updated use: %T, mem_ref: %G",
> +			 vuse, g);
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorize.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements, instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks have already been performed.
> +
> +    While moving the instructions, if it encounters a VUSE or VDEF it
> +    corrects the VUSEs as it moves the statements along.  GDEST is the
> +    location in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);

I suppose dest_bb is the in-loop block following the last early
exit?  I suppose we do not support an "early" exit after the
main IV exit, right?  Instead we'd require loop rotation?

> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +      update_stmt (stmt);

You shouldn't need to update_stmt here I think.

> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for stmt %G", vuse, p);
> +      unlink_stmt_vdef (p);

it's odd to first move the stmts and then propagate out their defs
(which you forget to release?)

> +      gimple_set_vuse (p, vuse);

and now every store gets the same vuse?  I'm quite sure you'll end
up with broken virtual SSA form here.

> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +       case vect_early_exit_def:
> +	  dump_printf (MSG_NOTE, "early exit\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_early_exit_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of statements needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in the order in
> +     which they need to be moved.  */
> +  auto_vec<gimple *> early_break_conflict;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-11-07 10:53   ` Richard Biener
@ 2023-11-07 11:34     ` Tamar Christina
  2023-11-07 14:23       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-07 11:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 7, 2023 10:53 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 3/21]middle-end: Implement code motion and
> dependency analysis for early breaks
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > When performing early break vectorization we need to be sure that the
> > vector operations are safe to perform.  A simple example is e.g.
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    vect_b[i] = x + i;
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_a[i] = x;
> >  }
> >
> > where the store to vect_b is not allowed to be executed
> > unconditionally since if we exit through the early break it wouldn't
> > have been done for the full VF iteration.
> >
> > Effectively, the code motion determines:
> >   - is it safe/possible to vectorize the function
> >   - what updates to the VUSES should be performed if we do
> >   - Which statements need to be moved
> >   - Which statements can't be moved:
> >     * values that are live must be reachable through all exits
> >     * values that aren't single use and are shared by the use/def chain of the cond
> >   - The final insertion point of the instructions.  In the cases we have
> >     multiple early exit statements this should be the one closest to the loop
> >     latch itself.
> >
> > After motion the loop above is:
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    ... y = x + i;
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_b[i] = y;
> >    vect_a[i] = x;
> >
> >  }
> >
> > The operation is split into two, during data ref analysis we determine
> > validity of the operation and generate a worklist of actions to
> > perform if we vectorize.
> >
> > After peeling and just before statement transformation we replay this
> > worklist which moves the statements and updates book keeping only in
> > the main loop that's to be vectorized.  This includes updating of USES in exit
> blocks.
> >
> > At the moment we don't support this for epilog nomasks since the
> > additional vectorized epilog's stmt UIDs are not found.
> 
> As of UIDs note that UIDs are used for dominance checking in
> vect_stmt_dominates_stmt_p and that at least is used during transform when
> scheduling SLP.  Moving stmts around invalidates this UID order (I don't see
> you "renumbering" UIDs).
> 

Just some responses to questions while I process the rest.

I see, yeah I didn't encounter it because I punted on SLP support.  As you said,
for SLP we indeed don't need this.

> More comments below.
> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> > 	(vect_analyze_early_break_dependences): New.
> > 	(vect_analyze_data_ref_dependences): Use them.
> > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > 	early_breaks.
> > 	(move_early_exit_stmts): New.
> > 	(vect_transform_loop): use it/
> > 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> > 	early_break_vuses.
> > 	(LOOP_VINFO_EARLY_BREAKS): New.
> > 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> > 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> > 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > index
> >
> d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c
> 32b9ce
> > 7be77f3e1d60 100644
> > --- a/gcc/tree-vect-data-refs.cc
> > +++ b/gcc/tree-vect-data-refs.cc
> > @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct
> data_dependence_relation *ddr,
> >    return opt_result::success ();
> >  }
> >
> > +/* This function tries to validate whether an early break vectorization
> > +   is possible for the current instruction sequence.  Returns true if
> > +   possible, otherwise false.
> > +
> > +   Requirements:
> > +     - Any memory access must be to a fixed size buffer.
> > +     - There must not be any loads and stores to the same object.
> > +     - Multiple loads are allowed as long as they don't alias.
> > +
> > +   NOTE:
> > +     This implementation is very conservative.  Any overlapping loads/stores
> > +     that take place before the early break statement get rejected, aside from
> > +     WAR dependencies.
> > +
> > +     i.e.:
> > +
> > +	a[i] = 8
> > +	c = a[i]
> > +	if (b[i])
> > +	  ...
> > +
> > +	is not allowed, but
> > +
> > +	c = a[i]
> > +	a[i] = 8
> > +	if (b[i])
> > +	  ...
> > +
> > +	is allowed, which is the common case.
> > +
> > +   Arguments:
> > +     - LOOP_VINFO: loop information for the current loop.
> > +     - CHAIN: Currently detected sequence of instructions that need to be
> moved
> > +	      if we are to vectorize this early break.
> > +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are
> reachable from
> > +	      one or more cond conditions.  If this set overlaps with CHAIN then
> FIXED
> > +	      takes precedence.  This deals with non-single use cases.
> > +     - LOADS: List of all loads found during traversal.
> > +     - BASES: List of all load data references found during traversal.
> > +     - GSTMT: Current position to inspect for validity.  The sequence
> > +	      will be moved upwards from this point.
> > +     - REACHING_VUSE: The dominating VUSE found so far.  */
> > +
> > +static bool
> > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree>
> *chain,
> > +			   hash_set<tree> *fixed, vec<tree> *loads,
> > +			   vec<data_reference *> *bases, tree *reaching_vuse,
> > +			   gimple_stmt_iterator *gstmt)
> > +{
> > +  if (gsi_end_p (*gstmt))
> > +    return true;
> > +
> > +  gimple *stmt = gsi_stmt (*gstmt);
> > +  /* ?? Do I need to move debug statements? not quite sure..  */
> 
> I think we reset them.
> 
> > +  if (gimple_has_ops (stmt)
> > +      && !is_gimple_debug (stmt))
> > +    {
> > +      tree dest = NULL_TREE;
> > +      /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> > +	 use the LHS, if not, assume that the first argument of a call is the
> > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > +      if (gimple_has_lhs (stmt))
> > +	dest = gimple_get_lhs (stmt);
> > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > +	dest = gimple_arg (call, 0);
> > +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> > +	{
> > +	  /* Operands of conds are ones we can't move.  */
> > +	  fixed->add (gimple_cond_lhs (cond));
> > +	  fixed->add (gimple_cond_rhs (cond));
> > +	}
> > +
> > +      bool move = false;
> 
> 
> So this all looks a bit like spaghetti (sorry).  I think what you want to do is
> perform this in two steps:
> 
>  a) mark (and check) the dependences of the early break conditions,
>     aka populate 'fixed'
>  b) walk stmts from the _last_ early break, verifying all earlier
>     non-'fixed' stmts can be moved
> 
> for marking dependences you want to simply iterate over use
> operands:
> 
>   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
>     USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
>     the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
>     a worklist with a visited set to gather all of the relevant
>     stmts/defs
> 
> > +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > +      if (!stmt_vinfo)
> > +	{
> > +	   if (dump_enabled_p ())
> > +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			      "early breaks not supported. Unknown"
> > +			      " statement: %G", stmt);
> > +	   return false;
> > +	}
> > +
> > +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > +      if (dr_ref)
> > +	{
> > +	   /* We currently only support statically allocated objects due to
> > +	      not having first-faulting loads support or peeling for alignment
> > +	      support.  Compute the size of the referenced object (it could be
> > +	      dynamically allocated).  */
> > +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> > +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > +	     {
> > +	       if (dump_enabled_p ())
> > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> vect_location,
> > +				  "early breaks only supported on statically"
> > +				  " allocated objects.\n");
> > +	       return false;
> > +	     }
> > +
> > +	   tree refop = TREE_OPERAND (obj, 0);
> > +	   tree refbase = get_base_address (refop);
> > +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > +	     {
> > +	       if (dump_enabled_p ())
> > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> vect_location,
> > +				  "early breaks only supported on statically"
> > +				  " allocated objects.\n");
> > +	       return false;
> > +	     }
> 
> Note this doesn't ensure in-bound accesses:
> 
> int a[4];
> 
> void foo ()
> {
>   for (unsigned int i = 0; i < 32; ++i)
>    {
>      if (a[i] == 0)
>        break;
>      /* ... */
>    }
> }
> 
> you'd happily load a V8SImode vector from 'a'.  If the caller ensures a[3] == 0
> the code is fine but your transformed vector code not.  You need to check that
> DR_BASE_ADDRESS + DR_OFFSET
> + DR_INIT + niter * DR_STEP is within the object instead.
> 
> > +
> > +	   if (DR_IS_READ (dr_ref))
> > +	     {
> > +		loads->safe_push (dest);
> > +		bases->safe_push (dr_ref);
> > +	     }
> > +	   else if (DR_IS_WRITE (dr_ref))
> > +	     {
> > +		for (auto dr : bases)
> > +		  if (same_data_refs_base_objects (dr, dr_ref))
> > +		    {
> 
> that looks quadratic to me.  So what's this actually?  You've gathered all loads
> after this write and now you are checking that all those loads do not alias the
> write?  But same_data_refs_base_objects is only verifying that the two refs are
> suitable for classical dependence analysis, so it's not a conservative test here.  I
> think you may want to use dr_may_alias_p instead?
> 
> I'm missing some overall idea of what you are doing, like what's the actual
> transform and how do you validate its validity?
> 

So the basic idea is that we should move everything with a side effect to after all
the early exits.  I reasoned that most things with side effects would either block
vectorization entirely or be stores.  This is why it essentially just looks at stores
and the statements that create them.

> It looks like you only move stores?

Yeah, though an earlier version of the patch also moved, if possible, the statements
creating the values for the stores.  And I think I'll have to go back to that version again.

The reason is that with this new BB layout and how we "layer" the BBs for the early exits
and the main exit, it seems like sched1 is no longer able to schedule instructions over the EBBs.

This leads to us extending the live ranges for the statements creating the values, causing
reload to have to copy the values in some cases.

So given

x = a + i;
y[i] = x;
if (..) { }

moving the store alone can end up making reload copy the value of x.  To fix this I should
probably move x as well.  This code is also checking whether that's possible, since you can't move
x if it's used by something that can't be moved.  Say, if the condition was `if (b[i] > x)`.

> 
> > +		      if (dump_enabled_p ())
> > +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > +					   vect_location,
> > +					   "early breaks only supported,"
> > +					   " overlapping loads and stores found"
> > +					   " before the break statement.\n");
> > +		      return false;
> > +		    }
> > +		/* Any write starts a new chain.  */
> > +		move = true;
> > +	     }
> > +	}
> > +
> > +      /* If a statement is live and escapes the loop through usage in the loop
> > +	 epilogue then we can't move it since we need to maintain its
> > +	 reachability through all exits.  */
> > +      bool skip = false;
> > +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> > +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> > +	{
> > +	  imm_use_iterator imm_iter;
> > +	  use_operand_p use_p;
> > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> > +	    {
> > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +	      if (skip)
> > +		break;
> > +	    }
> > +	}
> > +
> > +      /* If we found the defining statement of something that's part of the
> > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > +      if (!skip && (chain->contains (dest) || move))
> > +	{
> > +	  move = true;
> > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > +	    {
> > +	      tree var = gimple_arg (stmt, x);
> > +	      if (TREE_CODE (var) == SSA_NAME)
> > +		{
> > +		  if (fixed->contains (dest))
> > +		    {
> > +		      move = false;
> > +		      fixed->add (var);
> > +		    }
> > +		  else
> > +		    chain->add (var);
> > +		}
> > +	      else
> > +		{
> > +		  use_operand_p use_p;
> > +		  ssa_op_iter iter;
> > +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> > +		    {
> > +		      tree op = USE_FROM_PTR (use_p);
> > +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> > +		      if (fixed->contains (dest))
> > +			{
> > +			  move = false;
> > +			  fixed->add (op);
> > +			}
> > +		      else
> > +			chain->add (op);
> > +		    }
> > +		}
> > +	    }
> > +
> > +	  if (dump_enabled_p ())
> > +	    {
> > +	      if (move)
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"found chain %G", stmt);
> > +	      else
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"ignored chain %G, not single use", stmt);
> > +	    }
> > +	}
> > +
> > +      if (move)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > +			     "==> recording stmt %G", stmt);
> > +
> > +	  for (tree ref : loads)
> > +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> > +	      {
> > +	        if (dump_enabled_p ())
> > +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +				   "early breaks not supported as memory used"
> > +				   " may alias.\n");
> > +	        return false;
> > +	      }
> 
> Here you check aliasing again?!
> 
> I think it might be conceptually easier (and stronger) to instead think of the
> 'fixed' set (and the gconds) to be moved earlier instead of the stores to be
> sunk.
> 
> For example I fail to see how you check for, say
> 
>    for (..)
>     {
>       tem = a[i] / b[i];
>       if (c[i]) break;
>       d[i] = tem;
>     }
> 
> where the division might trap.  For this the validation wouldn't identify
> anything to move, right?
> 

Hmm, yes, I ignored it because I figured we wouldn't vectorize anyway with
-ftrapping-math.  I guess I should call gimple_has_side_effects on the stmt, but I
figured we wouldn't get here.
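For reference, at the source level the motion that would make the trapping example above safe looks roughly like this (a sketch under the assumption that the breaking iteration is re-executed by the scalar epilogue; names are made up):

```c
#include <assert.h>

#define N 8

/* The problematic shape: if the division were vectorized in place,
   lanes past the break point would execute it speculatively and
   could trap (e.g. b[i] == 0 in a lane the scalar loop never runs).  */
static int unsafe_shape (const int *a, const int *b, const int *c, int *d)
{
  for (int i = 0; i < N; i++)
    {
      int tem = a[i] / b[i];
      if (c[i])
        return i;
      d[i] = tem;
    }
  return N;
}

/* Shape needed for vectorization: the potentially trapping statement
   and the store that consumes it are both sunk below the exit check,
   so nothing with a side effect runs speculatively.  */
static int sunk_shape (const int *a, const int *b, const int *c, int *d)
{
  for (int i = 0; i < N; i++)
    {
      if (c[i])
        return i;
      d[i] = a[i] / b[i];
    }
  return N;
}
```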

> I'll note that doing the actual movement will be easier with SLP and it would be
> a possibility to implement early break with just SLP support - as we need to
> start discovery from the gconds explicitly anyway there's no problem forcing a
> single-lane SLP discovery there.
> 

Possibly, but I think we'd still have a problem figuring out what to do about the
live range issue.  I guess long term the issue should be fixed in sched1?

> > +
> > +	  /* If we've moved a VDEF, extract the defining MEM and update
> > +	     usages of it.   */
> > +	  tree vdef;
> > +	  if ((vdef = gimple_vdef (stmt)))
> > +	    {
> > +	      /* This statement is to be moved.  */
> > +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
> > +	      *reaching_vuse = gimple_vuse (stmt);
> > +	    }
> > +	}
> > +    }
> > +
> > +  gsi_prev (gstmt);
> > +
> > +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> > +				  reaching_vuse, gstmt))
> > +    return false;
> 
> Please use a loop instead of recursion.  I suggest to do the loop at the single
> caller.
> 
> > +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> > +    {
> > +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > +			   "marked statement for vUSE update: %G", stmt);
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +/* Function vect_analyze_early_break_dependences.
> > +
> > +   Examine all the data references in the loop and make sure that if we have
> > +   multiple exits that we are able to safely move stores such that they become
> > +   safe for vectorization.  The function also calculates the place where to move
> > +   the instructions to and computes what the new vUSE chain should be.
> > +
> > +   This works in tandem with the CFG that will be produced by
> > +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> > +
> > +static opt_result
> > +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > +{
> > +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> > +
> > +  hash_set<tree> chain, fixed;
> > +  auto_vec<tree> loads;
> > +  auto_vec<data_reference *> bases;
> > +  basic_block dest_bb = NULL;
> > +  tree vuse = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location,
> > +		     "loop contains multiple exits, analyzing"
> > +		     " statement dependencies.\n");
> > +
> > +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> > +    {
> > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> > +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> > +	continue;
> > +
> > +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
> 
> isn't that 'c' already?
> 
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > +
> > +      /* Initialize the vuse chain with the one at the early break.  */
> > +      if (!vuse)
> > +	vuse = gimple_vuse (c);
> 
> gconds do not have virtual operands.
> 
> > +
> > +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> > +				     &bases, &vuse, &gsi))
> > +	return opt_result::failure_at (stmt,
> > +				       "can't safely apply code motion to "
> > +				       "dependencies of %G to vectorize "
> > +				       "the early exit.\n", stmt);
> > +
> > +      /* Save the destination as we go; BBs are visited in order and the last
> > +	 one is where statements should be moved to.  */
> > +      if (!dest_bb)
> > +	dest_bb = gimple_bb (c);
> > +      else
> > +	{
> > +	  basic_block curr_bb = gimple_bb (c);
> > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > +	    dest_bb = curr_bb;
> > +	}
> > +    }
> > +
> > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> 
> no edge is the fallthru edge out of a condition, so this always selects
> EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> think you instead want
> 
>   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb->loop_father
>             ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)->dest;
> 
> more nicely written, of course.
> 
> > +  gcc_assert (dest_bb);
> > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> 
> Sorting the vector of early breaks as we gather them might be nicer than this -
> you'd then simply use the first or last.
> 
> > +
> > +  /* TODO: Remove?  It's a useful debug statement but may be too much.  */
> > +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "updated use: %T, mem_ref: %G",
> > +			 vuse, g);
> > +    }
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location,
> > +		     "recorded statements to be moved to BB %d\n",
> > +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> > +
> > +  return opt_result::success ();
> > +}
> > +
> >  /* Function vect_analyze_data_ref_dependences.
> >
> >     Examine all the data references in the loop, and make sure there do not
> > @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
> >  	  return res;
> >        }
> >
> > +  /* If we have early break statements in the loop, check to see if they
> > +     are of a form we can vectorize.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    return vect_analyze_early_break_dependences (loop_vinfo);
> > +
> >    return opt_result::success ();
> >  }
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad2070823842079c5234033c3d825ea 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
> >      partial_load_store_bias (0),
> >      peeling_for_gaps (false),
> >      peeling_for_niter (false),
> > +    early_breaks (false),
> >      no_data_dependencies (false),
> >      has_mask_store (false),
> >      scalar_loop_scaling (profile_probability::uninitialized ()),
> > @@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
> >    epilogue_vinfo->shared->save_datarefs ();
> >  }
> >
> > +/*  When vectorizing early break statements, instructions that happen before
> > +    the early break in the current BB need to be moved to after the early
> > +    break.  This function deals with that and assumes that any validity
> > +    checks have already been performed.
> > +
> > +    While moving the instructions, if it encounters a VUSE or VDEF it
> > +    corrects the VUSEs as it moves the statements along.  GDEST is the
> > +    location in which to insert the new statements.  */
> > +
> > +static void
> > +move_early_exit_stmts (loop_vec_info loop_vinfo)
> > +{
> > +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> > +    return;
> > +
> > +  /* Move all stmts that need moving.  */
> > +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> 
> I suppose dest_bb is the in-loop block following the last early exit?  I suppose
> we do not support an "early" exit after the main IV exit, right?  Instead we'd
> require loop rotation?

Indeed, this is also keeping in mind that when we add general control flow
we don't want to move it past the control flow inside the loop.  This would
extend the live ranges too much.

> 
> > +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > +
> > +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> > +    {
> > +      /* Check to see if statement is still required for vect or has been
> > +	 elided.  */
> > +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> > +      if (!stmt_info)
> > +	continue;
> > +
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> > +
> > +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> > +      gsi_move_before (&stmt_gsi, &dest_gsi);
> > +      gsi_prev (&dest_gsi);
> > +      update_stmt (stmt);
> 
> You shouldn't need to update_stmt here I think.
> 
> > +    }
> > +
> > +  /* Update all the stmts with their new reaching VUSES.  */
> > +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
> > +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > +			   "updating vuse to %T for stmt %G", vuse, p);
> > +      unlink_stmt_vdef (p);
> 
> it's odd to first move the stmts and then propagate out their defs (which you
> forget to release?)
> 
> > +      gimple_set_vuse (p, vuse);
> 
> and now every store gets the same vuse?  I'm quite sure you'll end up with
> broken virtual SSA form here.
> 
No, not every store, but every load.   Since we've moved everything that can
introduce a new vDEF, all the uses of memory before the last early exit must be
using the same vUSE.  The loop never has to update stores because they are moved
in order.
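To illustrate the invariant (hand-written GIMPLE-style pseudocode, not a real dump): once every statement carrying a VDEF has been sunk below the last early exit, all loads that remain before the exits observe the same memory state, so one reaching vUSE serves them all.

```
  # Before sinking: the store splits the memory state.
  # .MEM_2 = VDEF <.MEM_1>
  vect_b[i_10] = _5;
  _6 = vect_a[i_10];        # VUSE <.MEM_2>
  if (_6 > x_3) goto <early_exit>;

  # After sinking: every load before the exits uses the single
  # reaching vUSE, so a uniform update per load is enough.
  _6 = vect_a[i_10];        # VUSE <.MEM_1>
  if (_6 > x_3) goto <early_exit>;
  # .MEM_2 = VDEF <.MEM_1>
  vect_b[i_10] = _5;
```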

Regards,
Tamar

> > +      update_stmt (p);
> > +    }
> > +}
> > +
> >  /* Function vect_transform_loop.
> >
> >     The analysis phase has determined that the loop is vectorizable.
> > @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
> >      }
> >
> > +  /* Handle any code motion that we need to for early-break vectorization after
> > +     we've done peeling but just before we start vectorizing.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    move_early_exit_stmts (loop_vinfo);
> > +
> >    /* FORNOW: the vectorizer supports only loops which body consist
> >       of one basic block (header + empty latch). When the vectorizer will
> >       support more involved loop forms, the order by which the BBs are
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
> >  	case vect_first_order_recurrence:
> >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> >  	  break;
> > +       case vect_early_exit_def:
> > +	  dump_printf (MSG_NOTE, "early exit\n");
> > +	  break;
> >  	case vect_unknown_def_type:
> >  	  dump_printf (MSG_NOTE, "unknown\n");
> >  	  break;
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -66,6 +66,7 @@ enum vect_def_type {
> >    vect_double_reduction_def,
> >    vect_nested_cycle,
> >    vect_first_order_recurrence,
> > +  vect_early_exit_def,
> >    vect_unknown_def_type
> >  };
> >
> > @@ -888,6 +889,10 @@ public:
> >       we need to peel off iterations at the end to form an epilogue loop.  */
> >    bool peeling_for_niter;
> >
> > +  /* When the loop has early breaks that we can vectorize we need to peel
> > +     the loop for the break finding loop.  */
> > +  bool early_breaks;
> > +
> >    /* List of loop additional IV conditionals found in the loop.  */
> >    auto_vec<gcond *> conds;
> >
> > @@ -942,6 +947,20 @@ public:
> >    /* The controlling loop IV for the scalar loop being vectorized.  This IV
> >       controls the natural exits of the loop.  */
> >    edge scalar_loop_iv_exit;
> > +
> > +  /* Used to store the list of statements needing to be moved if doing early
> > +     break vectorization as they would violate the scalar loop semantics if
> > +     vectorized in their current location.  These are stored in the order that
> > +     they need to be moved.  */
> > +  auto_vec<gimple *> early_break_conflict;
> > +
> > +  /* The final basic block where to move statements to.  In the case of
> > +     multiple exits this could be pretty far away.  */
> > +  basic_block early_break_dest_bb;
> > +
> > +  /* Statements whose VUSEs need updating if early break vectorization is to
> > +     happen.  */
> > +  auto_vec<gimple*> early_break_vuses;
> >  } *loop_vec_info;
> >
> >  /* Access Functions.  */
> > @@ -996,6 +1015,10 @@ public:
> >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> > +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-07 10:47       ` Tamar Christina
@ 2023-11-07 13:58         ` Richard Biener
  2023-11-27 18:30           ` Richard Sandiford
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-07 13:58 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd

On Tue, 7 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, November 7, 2023 9:43 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> > vectorization
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, November 6, 2023 2:25 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> > > > auto- vectorization
> > > >
> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch adds initial support for early break vectorization in GCC.
> > > > > The support is added for any target that implements a vector
> > > > > cbranch optab, this includes both fully masked and non-masked targets.
> > > > >
> > > > > Depending on the operation, the vectorizer may also require
> > > > > support for boolean mask reductions using Inclusive OR.  This is
> > > > > however only checked when the comparison would produce multiple
> > statements.
> > > > >
> > > > > Note: I am currently struggling to get patch 7 correct in all
> > > > > cases and could
> > > > use
> > > > >       some feedback there.
> > > > >
> > > > > Concretely the kind of loops supported are of the forms:
> > > > >
> > > > >  for (int i = 0; i < N; i++)
> > > > >  {
> > > > >    <statements1>
> > > > >    if (<condition>)
> > > > >      {
> > > > >        ...
> > > > >        <action>;
> > > > >      }
> > > > >    <statements2>
> > > > >  }
> > > > >
> > > > > where <action> can be:
> > > > >  - break
> > > > >  - return
> > > > >  - goto
> > > > >
> > > > > Any number of statements can be used before the <action> occurs.
> > > > >
> > > > > Since this is an initial version for GCC 14 it has the following
> > > > > limitations and
> > > > > features:
> > > > >
> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > > > >   vectors loaded or stored must be to statically allocated arrays with
> > known
> > > > >   sizes. N must also be known.  This limitation is because our primary
> > target
> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > > > >   iteration checks. The result is likely to also not be beneficial. For that
> > > > >   reason we punt support for variable buffers till we have First-Faulting
> > > > >   support in GCC.
> > 
> > Btw, for this I wonder if you thought about marking memory accesses required
> > for the early break condition as required to be vector-size aligned, thus peeling
> > or versioning them for alignment?  That should ensure they do not fault.
> > 
> > OTOH I somehow remember prologue peeling isn't supported for early break
> > vectorization?  ..
> > 
> > > > > - any stores in <statements1> should not be to the same objects as in
> > > > >   <condition>.  Loads are fine as long as they don't have the possibility to
> > > > >   alias.  More concretely, we block RAW dependencies when the
> > > > > intermediate
> > > > value
> > > > >   can't be separated from the store, or the store itself can't be moved.
> > > > > - Prologue peeling, alignment peeling and loop versioning are supported.
> > 
> > .. but here you say it is.  Not sure if peeling for alignment works for VLA vectors
> > though.  Just to say x86 doesn't support first-faulting loads.
> 
> For VLA we support it through masking.  I.e. if you need to peel N iterations, we
> generate a masked copy of the vectorized loop which masks off the first N bits.
> 
> This is not typically needed, but we do support it.  But the problem with this
> scheme and early break is obviously that the peeled loop needs to be vectorized,
> so you kind of end up with the same issue again.  So atm it rejects it for VLA.

Hmm, I see.  I thought peeling by masking was an optimization.  Anyhow,
I think it should still work here: since all accesses are aligned and
we know that there's at least one original scalar iteration in the
first masked and in each following "unmasked" vector iteration, there
should never be faults for any of the aligned accesses.
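A scalar emulation of the peeling-by-masking scheme may make this concrete (VF, the layout, and all names here are assumptions for illustration): the vector loop starts at the aligned address at or below the first element, and the first iteration's mask switches off the lanes before the real start, so every vector access is aligned and cannot fault.

```c
#include <assert.h>

#define VF 4

/* aligned_base points 'misalign' elements before the real data; the
   real data is aligned_base[misalign .. misalign + n - 1], and
   misalign + n is assumed to be a multiple of VF for simplicity.
   Each inner loop models one aligned vector access; the predicate
   plays the role of the first-iteration mask.  */
static int sum_peel_by_mask (const int *aligned_base, int misalign, int n)
{
  int sum = 0;
  for (int base = 0; base < misalign + n; base += VF)   /* one "vector"... */
    for (int lane = 0; lane < VF; lane++)               /* ...per block */
      if (base + lane >= misalign)  /* mask off the peeled leading lanes */
        sum += aligned_base[base + lane];
  return sum;
}
```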

I think going via alignment is a way easier method to guarantee this
than handwaving about "declared" arrays and niter.  One can try that
in addition of course - it's not always possible to align all
vector loads we are going to speculate (for VLA one could also
find common runtime (mis-)alignment and restrict the vector length based
on that, for RISC-V it seems to be efficient, not sure whether altering
that for SVE is though).

Richard.

> Regards,
> Tamar
> 
> > 
> > > > > - Fully masked loops, unmasked loops and partially masked loops
> > > > > are supported
> > > > > - Any number of loop early exits are supported.
> > > > > - No support for epilogue vectorization.  The only epilogue supported is
> > the
> > > > >   scalar final one.  Peeling code supports it but the code motion code
> > cannot
> > > > >   find instructions to make the move in the epilog.
> > > > > - Early breaks are only supported for inner loop vectorization.
> > > > >
> > > > > I have pushed a branch to
> > > > > refs/users/tnfchris/heads/gcc-14-early-break
> > > > >
> > > > > With the help of IPA and LTO this still gets hit quite often.
> > > > > During bootstrap it hit rather frequently.  Additionally TSVC
> > > > > s332, s481 and
> > > > > s482 all pass now since these are tests for support for early exit
> > > > vectorization.
> > > > >
> > > > > This implementation does not support completely handling the early
> > > > > break inside the vector loop itself but instead supports adding
> > > > > checks such that if we know that we have to exit in the current
> > > > > iteration then we branch to scalar code to actually do the final
> > > > > VF iterations which
> > > > handles all the code in <action>.
> > > > >
> > > > > For the scalar loop we know that whatever exit you take you have
> > > > > to perform at most VF iterations.  For vector code we only case
> > > > > about the state of fully performed iteration and reset the scalar
> > > > > code to the (partially)
> > > > remaining loop.
> > > > >
> > > > > That is to say, the first vector loop executes so long as the
> > > > > early exit isn't needed.  Once the exit is taken, the scalar code
> > > > > will perform at most VF extra iterations.  The exact number
> > > > > depending on peeling
> > > > and iteration start and which
> > > > > exit was taken (natural or early).   For this scalar loop, all early exits are
> > > > > treated the same.
> > > > >
> > > > > When we vectorize we move any statement not related to the early
> > > > > break itself and that would be incorrect to execute before the break (i.e.
> > > > > has side effects) to after the break.  If this is not possible we
> > > > > decline to
> > > > vectorize.
> > > > >
> > > > > This means that we check at the start of iterations whether we are
> > > > > going to exit or not.  During the analysis phase we check whether
> > > > > we are allowed to do this moving of statements.  Also note that we
> > > > > only move the scalar statements, but only do so after peeling but
> > > > > just before we
> > > > start transforming statements.
> > > > >
> > > > > Codegen:
> > > > >
> > > > > for e.g.
> > > > >
> > > > > #define N 803
> > > > > unsigned vect_a[N];
> > > > > unsigned vect_b[N];
> > > > >
> > > > > unsigned test4(unsigned x)
> > > > > {
> > > > >  unsigned ret = 0;
> > > > >  for (int i = 0; i < N; i++)
> > > > >  {
> > > > >    vect_b[i] = x + i;
> > > > >    if (vect_a[i] > x)
> > > > >      break;
> > > > >    vect_a[i] = x;
> > > > >
> > > > >  }
> > > > >  return ret;
> > > > > }
> > > > >
> > > > > We generate for Adv. SIMD:
> > > > >
> > > > > test4:
> > > > >         adrp    x2, .LC0
> > > > >         adrp    x3, .LANCHOR0
> > > > >         dup     v2.4s, w0
> > > > >         add     x3, x3, :lo12:.LANCHOR0
> > > > >         movi    v4.4s, 0x4
> > > > >         add     x4, x3, 3216
> > > > >         ldr     q1, [x2, #:lo12:.LC0]
> > > > >         mov     x1, 0
> > > > >         mov     w2, 0
> > > > >         .p2align 3,,7
> > > > > .L3:
> > > > >         ldr     q0, [x3, x1]
> > > > >         add     v3.4s, v1.4s, v2.4s
> > > > >         add     v1.4s, v1.4s, v4.4s
> > > > >         cmhi    v0.4s, v0.4s, v2.4s
> > > > >         umaxp   v0.4s, v0.4s, v0.4s
> > > > >         fmov    x5, d0
> > > > >         cbnz    x5, .L6
> > > > >         add     w2, w2, 1
> > > > >         str     q3, [x1, x4]
> > > > >         str     q2, [x3, x1]
> > > > >         add     x1, x1, 16
> > > > >         cmp     w2, 200
> > > > >         bne     .L3
> > > > >         mov     w7, 3
> > > > > .L2:
> > > > >         lsl     w2, w2, 2
> > > > >         add     x5, x3, 3216
> > > > >         add     w6, w2, w0
> > > > >         sxtw    x4, w2
> > > > >         ldr     w1, [x3, x4, lsl 2]
> > > > >         str     w6, [x5, x4, lsl 2]
> > > > >         cmp     w0, w1
> > > > >         bcc     .L4
> > > > >         add     w1, w2, 1
> > > > >         str     w0, [x3, x4, lsl 2]
> > > > >         add     w6, w1, w0
> > > > >         sxtw    x1, w1
> > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > >         str     w6, [x5, x1, lsl 2]
> > > > >         cmp     w0, w4
> > > > >         bcc     .L4
> > > > >         add     w4, w2, 2
> > > > >         str     w0, [x3, x1, lsl 2]
> > > > >         sxtw    x1, w4
> > > > >         add     w6, w1, w0
> > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > >         str     w6, [x5, x1, lsl 2]
> > > > >         cmp     w0, w4
> > > > >         bcc     .L4
> > > > >         str     w0, [x3, x1, lsl 2]
> > > > >         add     w2, w2, 3
> > > > >         cmp     w7, 3
> > > > >         beq     .L4
> > > > >         sxtw    x1, w2
> > > > >         add     w2, w2, w0
> > > > >         ldr     w4, [x3, x1, lsl 2]
> > > > >         str     w2, [x5, x1, lsl 2]
> > > > >         cmp     w0, w4
> > > > >         bcc     .L4
> > > > >         str     w0, [x3, x1, lsl 2]
> > > > > .L4:
> > > > >         mov     w0, 0
> > > > >         ret
> > > > >         .p2align 2,,3
> > > > > .L6:
> > > > >         mov     w7, 4
> > > > >         b       .L2
> > > > >
> > > > > and for SVE:
> > > > >
> > > > > test4:
> > > > >         adrp    x2, .LANCHOR0
> > > > >         add     x2, x2, :lo12:.LANCHOR0
> > > > >         add     x5, x2, 3216
> > > > >         mov     x3, 0
> > > > >         mov     w1, 0
> > > > >         cntw    x4
> > > > >         mov     z1.s, w0
> > > > >         index   z0.s, #0, #1
> > > > >         ptrue   p1.b, all
> > > > >         ptrue   p0.s, all
> > > > >         .p2align 3,,7
> > > > > .L3:
> > > > >         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
> > > > >         add     z3.s, z0.s, z1.s
> > > > >         cmplo   p2.s, p0/z, z1.s, z2.s
> > > > >         b.any   .L2
> > > > >         st1w    z3.s, p1, [x5, x3, lsl 2]
> > > > >         add     w1, w1, 1
> > > > >         st1w    z1.s, p1, [x2, x3, lsl 2]
> > > > >         add     x3, x3, x4
> > > > >         incw    z0.s
> > > > >         cmp     w3, 803
> > > > >         bls     .L3
> > > > > .L5:
> > > > >         mov     w0, 0
> > > > >         ret
> > > > >         .p2align 2,,3
> > > > > .L2:
> > > > >         cntw    x5
> > > > >         mul     w1, w1, w5
> > > > >         cbz     w5, .L5
> > > > >         sxtw    x1, w1
> > > > >         sub     w5, w5, #1
> > > > >         add     x5, x5, x1
> > > > >         add     x6, x2, 3216
> > > > >         b       .L6
> > > > >         .p2align 2,,3
> > > > > .L14:
> > > > >         str     w0, [x2, x1, lsl 2]
> > > > >         cmp     x1, x5
> > > > >         beq     .L5
> > > > >         mov     x1, x4
> > > > > .L6:
> > > > >         ldr     w3, [x2, x1, lsl 2]
> > > > >         add     w4, w0, w1
> > > > >         str     w4, [x6, x1, lsl 2]
> > > > >         add     x4, x1, 1
> > > > >         cmp     w0, w3
> > > > >         bcs     .L14
> > > > >         mov     w0, 0
> > > > >         ret
> > > > >
> > > > > On the workloads this work is based on we see between 2-3x
> > > > > performance uplift using this patch.
> > > > >
> > > > > Follow up plan:
> > > > >  - Boolean vectorization has several shortcomings.  I've filed
> > > > > PR110223 with
> > > > the
> > > > >    bigger ones that cause vectorization to fail with this patch.
> > > > >  - SLP support.  This is planned for GCC 15 as for majority of the cases
> > build
> > > > >    SLP itself fails.
> > > >
> > > > It would be nice to get at least single-lane SLP support working.  I
> > > > think you need to treat the gcond as SLP root stmt and basically do
> > > > discovery on the condition as to as if it were a mask generating condition.
> > >
> > > Hmm ok, will give it  a try.
> > >
> > > >
> > > > Code generation would then simply schedule the gcond root instances
> > > > first (that would get you the code motion automagically).
> > >
> > > Right, so you're saying treat the gcond's as the seed, and stores as a sink.
> > > And then schedule only the instances without a gcond around such that
> > > we can still vectorize in place to get the branches.  Ok, makes sense.
> > >
> > > >
> > > > So, add a new slp_instance_kind, for example
> > > > slp_inst_kind_early_break, and record the gcond as root stmt.
> > > > Possibly "pattern" recognizing
> > > >
> > > >  gcond <_1 != _2>
> > > >
> > > > as
> > > >
> > > >  _mask = _1 != _2;
> > > >  gcond <_mask != 0>
> > > >
> > > > makes the SLP discovery less fiddly (but in theory you can of course
> > > > handle gconds directly).
> > > >
> > > > Is there any part of the series that can be pushed independelty?  If
> > > > so I'll try to look at those parts first.
> > > >
> > >
> > > Aside from:
> > >
> > > [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA
> > > form for early breaks [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > The rest lie dormant and don't do anything or disrupt the tree until those
> > two are in.
> > > The rest all just touch up different parts piecewise.
> > >
> > > They do rely on the new field introduced in:
> > >
> > > [PATCH 3/21]middle-end: Implement code motion and dependency analysis
> > > for early breaks
> > >
> > > But can split them out.
> > >
> > > I'll start respinning no #4 and #7 with your latest changes now.
> > 
> > OK, I'll simply go 1-n then.
> > 
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > > Thanks,
> > > > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-11-07 11:34     ` Tamar Christina
@ 2023-11-07 14:23       ` Richard Biener
  2023-12-19 10:11         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-07 14:23 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Tue, 7 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, November 7, 2023 10:53 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 3/21]middle-end: Implement code motion and
> > dependency analysis for early breaks
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > When performing early break vectorization we need to be sure that the
> > > vector operations are safe to perform.  A simple example is e.g.
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i]*2 != x)
> > >      break;
> > >    vect_a[i] = x;
> > >  }
> > >
> > > where the store to vect_b is not allowed to be executed
> > > unconditionally since if we exit through the early break it wouldn't
> > > have been done for the full VF iteration.
> > >
> > > Effectively, the code motion determines:
> > >   - is it safe/possible to vectorize the function
> > >   - what updates to the VUSES should be performed if we do
> > >   - Which statements need to be moved
> > >   - Which statements can't be moved:
> > >     * values that are live must be reachable through all exits
> > >     * values that aren't single-use and are shared by the use/def chain of the cond
> > >   - The final insertion point of the instructions.  In the case we have
> > >     multiple early exit statements this should be the one closest to the loop
> > >     latch itself.
> > >
> > > After motion the loop above is:
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    ... y = x + i;
> > >    if (vect_a[i]*2 != x)
> > >      break;
> > >    vect_b[i] = y;
> > >    vect_a[i] = x;
> > >
> > >  }
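To see why the store cannot simply be executed unconditionally for the whole vector iteration, here is a small self-contained simulation of the loop above; the array contents, N, x and VF are made up for illustration, and the "vector" body is modeled with a plain inner loop over the lanes:

```cpp
#include <array>
#include <cassert>

// Hypothetical data: A[2]*2 != X triggers the early break at i == 2.
constexpr int N = 8, VF = 4, X = 10;
constexpr std::array<int, N> A{5, 5, 7, 5, 5, 5, 5, 5};

// Scalar reference: stores vect_b only up to and including the breaking
// iteration.
std::array<int, N> run_scalar ()
{
  std::array<int, N> b{};
  for (int i = 0; i < N; i++)
    {
      b[i] = X + i;
      if (A[i] * 2 != X)
	break;
    }
  return b;
}

// Naive "vector" body: all VF lanes store before the break is tested, so
// lanes past the breaking element get written too.
std::array<int, N> run_naive_vector ()
{
  std::array<int, N> b{};
  for (int i = 0; i < N; i += VF)
    {
      bool take_break = false;
      for (int l = 0; l < VF; l++)
	b[i + l] = X + i + l;		// unconditional store, whole VF
      for (int l = 0; l < VF; l++)
	if (A[i + l] * 2 != X)
	  take_break = true;
      if (take_break)
	break;
    }
  return b;
}
```

The scalar loop leaves b[3] untouched, while the naive vector body writes it; this is exactly the extra side effect the code motion has to avoid.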
> > >
> > > The operation is split into two, during data ref analysis we determine
> > > validity of the operation and generate a worklist of actions to
> > > perform if we vectorize.
> > >
> > > After peeling and just before statement transformation we replay this
> > > worklist, which moves the statements and updates bookkeeping only in
> > > the main loop that's to be vectorized.  This includes updating of USEs in
> > > exit blocks.
> > >
> > > At the moment we don't support this for non-masked epilogues since the
> > > additional vectorized epilogue's stmt UIDs are not found.
> > 
> > As for UIDs, note that UIDs are used for dominance checking in
> > vect_stmt_dominates_stmt_p and that at least is used during transform when
> > scheduling SLP.  Moving stmts around invalidates this UID order (I don't see
> > you "renumbering" UIDs).
> > 
> 
> Just some responses to questions while I process the rest.
> 
> I see, yeah I didn't encounter it because I punted on SLP support.  As you said,
> for SLP we indeed don't need this.
> 
> > More comments below.
> > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
> > > 	(vect_analyze_early_break_dependences): New.
> > > 	(vect_analyze_data_ref_dependences): Use them.
> > > 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > 	early_breaks.
> > > 	(move_early_exit_stmts): New.
> > > 	(vect_transform_loop): Use it.
> > > 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> > > 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > > 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> > > 	early_break_vuses.
> > > 	(LOOP_VINFO_EARLY_BREAKS): New.
> > > 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> > > 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> > > 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> > > index
> > >
> > d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c
> > 32b9ce
> > > 7be77f3e1d60 100644
> > > --- a/gcc/tree-vect-data-refs.cc
> > > +++ b/gcc/tree-vect-data-refs.cc
> > > @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct
> > data_dependence_relation *ddr,
> > >    return opt_result::success ();
> > >  }
> > >
> > > +/* This function tries to validate whether an early break vectorization
> > > +   is possible for the current instruction sequence.  Returns true if
> > > +   possible, otherwise false.
> > > +
> > > +   Requirements:
> > > +     - Any memory access must be to a fixed size buffer.
> > > +     - There must not be any loads and stores to the same object.
> > > +     - Multiple loads are allowed as long as they don't alias.
> > > +
> > > +   NOTE:
> > > +     This implementation is very conservative.  Any overlapping loads/stores
> > > +     that take place before the early break statement are rejected aside from
> > > +     WAR dependencies.
> > > +
> > > +     i.e.:
> > > +
> > > +	a[i] = 8
> > > +	c = a[i]
> > > +	if (b[i])
> > > +	  ...
> > > +
> > > +	is not allowed, but
> > > +
> > > +	c = a[i]
> > > +	a[i] = 8
> > > +	if (b[i])
> > > +	  ...
> > > +
> > > +	is allowed, which is the common case.
> > > +
> > > +   Arguments:
> > > +     - LOOP_VINFO: loop information for the current loop.
> > > +     - CHAIN: Currently detected sequence of instructions that need to be
> > > +	      moved if we are to vectorize this early break.
> > > +     - FIXED: Sequences of SSA_NAMEs that must not be moved; they are
> > > +	      reachable from one or more cond conditions.  If this set overlaps
> > > +	      with CHAIN then FIXED takes precedence.  This deals with
> > > +	      non-single-use cases.
> > > +     - LOADS: List of all loads found during traversal.
> > > +     - BASES: List of all load data references found during traversal.
> > > +     - GSTMT: Current position to inspect for validity.  The sequence
> > > +	      will be moved upwards from this point.
> > > +     - REACHING_VUSE: The dominating VUSE found so far.  */
> > > +
> > > +static bool
> > > +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree>
> > *chain,
> > > +			   hash_set<tree> *fixed, vec<tree> *loads,
> > > +			   vec<data_reference *> *bases, tree *reaching_vuse,
> > > +			   gimple_stmt_iterator *gstmt)
> > > +{
> > > +  if (gsi_end_p (*gstmt))
> > > +    return true;
> > > +
> > > +  gimple *stmt = gsi_stmt (*gstmt);
> > > +  /* ?? Do I need to move debug statements? not quite sure..  */
> > 
> > I think we reset them.
> > 
> > > +  if (gimple_has_ops (stmt)
> > > +      && !is_gimple_debug (stmt))
> > > +    {
> > > +      tree dest = NULL_TREE;
> > > +      /* Try to find the SSA_NAME being defined.  For statements with an LHS
> > > +	 use the LHS, if not, assume that the first argument of a call is the
> > > +	 value being defined.  e.g. MASKED_LOAD etc.  */
> > > +      if (gimple_has_lhs (stmt))
> > > +	dest = gimple_get_lhs (stmt);
> > > +      else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> > > +	dest = gimple_arg (call, 0);
> > > +      else if (const gcond *cond = dyn_cast <const gcond *> (stmt))
> > > +	{
> > > +	  /* Operands of conds are ones we can't move.  */
> > > +	  fixed->add (gimple_cond_lhs (cond));
> > > +	  fixed->add (gimple_cond_rhs (cond));
> > > +	}
> > > +
> > > +      bool move = false;
> > 
> > 
> > So this all looks a bit like spaghetti (sorry).  I think what you want to do is
> > perform this in two steps:
> > 
> >  a) mark (and check) the dependences of the early break conditions,
> >     aka populate 'fixed'
> >  b) walk stmts from the _last_ early break, verifying all earlier
> >     non-'fixed' stmts can be moved
> > 
> > for marking dependences you want to simply iterate over use
> > operands:
> > 
> >   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> >     USE_FROM_PTR (use_p) then is a SSA name that's used by 'stmt',
> >     the SSA_NAME_DEF_STMT of it is the next stmt to visit.  Use
> >     a worklist with a visited set to gather all of the relevant
> >     stmts/defs
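Stripped of the GIMPLE specifics, the marking step in a) is a plain worklist closure over use-def edges.  A minimal sketch, where the string map stands in for SSA_NAME_DEF_STMT plus iterating the defining statement's operands (hypothetical data structures, not the real vectorizer API):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Collect every SSA name that transitively feeds the early-break
// conditions.  'defs' maps an SSA name to the operands of its defining
// statement; 'cond_uses' are the names used directly by the gconds.
std::set<std::string>
mark_fixed (const std::map<std::string, std::vector<std::string>> &defs,
	    const std::vector<std::string> &cond_uses)
{
  std::set<std::string> fixed;
  std::vector<std::string> worklist (cond_uses.begin (), cond_uses.end ());
  while (!worklist.empty ())
    {
      std::string name = worklist.back ();
      worklist.pop_back ();
      if (!fixed.insert (name).second)
	continue;			// already visited
      auto it = defs.find (name);
      if (it != defs.end ())
	for (const auto &op : it->second)
	  worklist.push_back (op);	// operands of the defining stmt
    }
  return fixed;
}
```

With the 'fixed' set populated this way, step b) only has to reject the non-fixed statements that cannot be moved.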
> > 
> > > +      stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> > > +      if (!stmt_vinfo)
> > > +	{
> > > +	   if (dump_enabled_p ())
> > > +	     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			      "early breaks not supported. Unknown"
> > > +			      " statement: %G", stmt);
> > > +	   return false;
> > > +	}
> > > +
> > > +      auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> > > +      if (dr_ref)
> > > +	{
> > > +	   /* We currently only support statically allocated objects due to
> > > +	      not having first-faulting loads support or peeling for alignment
> > > +	      support.  Compute the size of the referenced object (it could be
> > > +	      dynamically allocated).  */
> > > +	   tree obj = DR_BASE_ADDRESS (dr_ref);
> > > +	   if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> > > +	     {
> > > +	       if (dump_enabled_p ())
> > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +				  "early breaks only supported on statically"
> > > +				  " allocated objects.\n");
> > > +	       return false;
> > > +	     }
> > > +
> > > +	   tree refop = TREE_OPERAND (obj, 0);
> > > +	   tree refbase = get_base_address (refop);
> > > +	   if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> > > +	       || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> > > +	     {
> > > +	       if (dump_enabled_p ())
> > > +		 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +				  "early breaks only supported on statically"
> > > +				  " allocated objects.\n");
> > > +	       return false;
> > > +	     }
> > 
> > Note this doesn't ensure in-bound accesses:
> > 
> > int a[4];
> > 
> > void foo ()
> > {
> >   for (unsigned int i = 0; i < 32; ++i)
> >    {
> >      if (a[i] == 0)
> >        break;
> >      /* ... */
> >    }
> > }
> > 
> > you'd happily load a V8SImode vector from 'a'.  If the caller ensures a[3] == 0
> > the code is fine but your transformed vector code is not.  You need to check
> > that DR_BASE_ADDRESS + DR_OFFSET + DR_INIT + niter * DR_STEP is
> > within the object instead.
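The suggested in-bounds test can be sketched as plain byte arithmetic.  This is a minimal sketch with made-up names and flat `long` byte offsets; the real check would operate on the data reference's (possibly poly_int) DR_OFFSET/DR_INIT/DR_STEP fields, not these parameters:

```cpp
#include <cassert>

// Hypothetical sketch: do all scalar accesses of the loop stay inside the
// underlying object?  base_off, init and step mirror DR_OFFSET, DR_INIT and
// DR_STEP in bytes; obj_size mirrors DECL_SIZE of the base object in bytes.
bool
accesses_within_object_p (long base_off, long init, long step,
			  unsigned long niters, unsigned long access_size,
			  unsigned long obj_size)
{
  if (niters == 0)
    return true;
  long first = base_off + init;
  long last = first + (long) (niters - 1) * step;
  long lo = first < last ? first : last;
  long hi = first < last ? last : first;
  // The lowest byte must not underflow the object and the highest access
  // (hi .. hi + access_size) must not overrun it.
  return lo >= 0 && (unsigned long) (hi + (long) access_size) <= obj_size;
}
```

For the `int a[4]` example above, 32 iterations of 4-byte loads with step 4 overrun the 16-byte object, while 4 iterations stay within it.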
> > 
> > > +
> > > +	   if (DR_IS_READ (dr_ref))
> > > +	     {
> > > +		loads->safe_push (dest);
> > > +		bases->safe_push (dr_ref);
> > > +	     }
> > > +	   else if (DR_IS_WRITE (dr_ref))
> > > +	     {
> > > +		for (auto dr : bases)
> > > +		  if (same_data_refs_base_objects (dr, dr_ref))
> > > +		    {
> > 
> > that looks quadratic to me.  So what's this actually?  You've gathered all loads
> > after this write and now you are checking that all those loads do not alias the
> > write?  But same_data_refs_base_objects is only verifying that the two refs are
> > suitable for classical dependence analysis, so it's not a conservative test here.  I
> > think you may want to use dr_may_alias_p instead?

So while thinking about the code motion it's also that when you move
the stores you invalidate all previous data dependence analysis checks
that involved them since a read-after-write dependence might now
become a write-after-read one.  So _maybe_ this check wanted to ensure
we didn't derive any affine dependence distance/direction for the
two refs before as we can only (well, not 100% true..) do this for
same_data_refs_base_objects?  As said a few more comments would really
help here.

Below you're using stmt_may_clobber_ref_p, but that's only valid to
use if you transform a scalar sequence of stmts - what you are doing
is altering stmt ordering across iterations since vectorization with
a VF > 1 involves unrolling.  I think you need to re-formulate this
in terms of data dependence analysis checks or use
dr_may_alias_p (..., loop_nest) instead of stmt_may_clobber_ref_p.
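To make the reordering concrete, here is an illustrative (made-up) example: a distance-1 read-after-write chain computes different results once the stores are sunk below both lanes' loads in a VF = 2 body.  (This particular dependence would of course be rejected by dependence analysis; the sketch only demonstrates why a check that is valid for a scalar statement sequence is not valid across unrolled iterations.)

```cpp
#include <array>
#include <cassert>

// Scalar order: iteration i+1 reads the value iteration i just stored.
std::array<int, 5> run_scalar_order ()
{
  std::array<int, 5> x{1, 0, 0, 0, 0};
  for (int i = 0; i + 1 < 5; i++)
    x[i + 1] = x[i] * 2;
  return x;
}

// VF = 2 body with the stores sunk below both lanes' loads: the second
// lane now loads the *old* value of x[i + 1], flipping the RAW dependence
// into a WAR one.
std::array<int, 5> run_sunk_store_vf2 ()
{
  std::array<int, 5> x{1, 0, 0, 0, 0};
  for (int i = 0; i + 2 < 5; i += 2)
    {
      int t0 = x[i];
      int t1 = x[i + 1];	// loaded before the sunk store below
      x[i + 1] = t0 * 2;
      x[i + 2] = t1 * 2;
    }
  return x;
}
```

The scalar loop propagates the value (x[2] becomes 4), while the sunk-store order sees the stale zero.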

> > 
> > I'm missing some overall idea of what you are doing, like what's the actual
> > transform and how do you validate its validity?
> > 
> 
> So the basic idea is that we should move everything with a side effect to after all
> the early exits.  I reasoned that most things with side effects would either block
> vectorization entirely or be stores.  This is why it essentially just looks at stores
> and the statements that create them.
> 
> > It looks like you only move stores?
> 
> Yeah, though an earlier version of the patch also moved, if possible, the statements
> creating the values for the stores.  And I think I'll have to go back to that version again.
> 
> The reason is that with this new BB layout and how we "layer" the BB for the early exits
> and main exit it seems like sched1 is no longer able to schedule instructions over the EBBs.
> 
> This leads to us extending the live ranges for the statements creating the values and causing
> reload to have to copy the values in some cases.
> 
> So 
> 
> x = a + i;
> y[i] = x;
> if (..) { }
> 
> and moving the store alone can end up making reload copy the value of x.   To fix this I should
> probably move x as well.  This code is also checking if that's possible, since you can't move x if it's
> used by something that can't be moved. Say, if the condition was `if (b[i] > x)`.

If you'd move the cbranch dependences up instead, simply by scheduling
their SLP instances first you would leave the order of the rest of the
stmts unperturbed.  The only adjustment would be needed to
vect_schedule_slp_node to make sure to schedule all other instances
stmts after the last cbranch.

> > 
> > > +		      if (dump_enabled_p ())
> > > +			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> > > +					   vect_location,
> > > +					   "early breaks only supported,"
> > > +					   " overlapping loads and stores found"
> > > +					   " before the break statement.\n");
> > > +		      return false;
> > > +		    }
> > > +		/* Any write starts a new chain.  */
> > > +		move = true;
> > > +	     }
> > > +	}
> > > +
> > > +      /* If a statement is live and escapes the loop through usage in the loop
> > > +	 epilogue then we can't move it since we need to maintain its
> > > +	 reachability through all exits.  */
> > > +      bool skip = false;
> > > +      if (STMT_VINFO_LIVE_P (stmt_vinfo)
> > > +	  && !(dr_ref && DR_IS_WRITE (dr_ref)))
> > > +	{
> > > +	  imm_use_iterator imm_iter;
> > > +	  use_operand_p use_p;
> > > +	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> > > +	    {
> > > +	      basic_block bb = gimple_bb (USE_STMT (use_p));
> > > +	      skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +	      if (skip)
> > > +		break;
> > > +	    }
> > > +	}
> > > +
> > > +      /* If we found the defining statement of something that's part of the
> > > +	 chain then expand the chain with the new SSA_VARs being used.  */
> > > +      if (!skip && (chain->contains (dest) || move))
> > > +	{
> > > +	  move = true;
> > > +	  for (unsigned x = 0; x < gimple_num_args (stmt); x++)
> > > +	    {
> > > +	      tree var = gimple_arg (stmt, x);
> > > +	      if (TREE_CODE (var) == SSA_NAME)
> > > +		{
> > > +		  if (fixed->contains (dest))
> > > +		    {
> > > +		      move = false;
> > > +		      fixed->add (var);
> > > +		    }
> > > +		  else
> > > +		    chain->add (var);
> > > +		}
> > > +	      else
> > > +		{
> > > +		  use_operand_p use_p;
> > > +		  ssa_op_iter iter;
> > > +		  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> > > +		    {
> > > +		      tree op = USE_FROM_PTR (use_p);
> > > +		      gcc_assert (TREE_CODE (op) == SSA_NAME);
> > > +		      if (fixed->contains (dest))
> > > +			{
> > > +			  move = false;
> > > +			  fixed->add (op);
> > > +			}
> > > +		      else
> > > +			chain->add (op);
> > > +		    }
> > > +		}
> > > +	    }
> > > +
> > > +	  if (dump_enabled_p ())
> > > +	    {
> > > +	      if (move)
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				"found chain %G", stmt);
> > > +	      else
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				"ignored chain %G, not single use", stmt);
> > > +	    }
> > > +	}
> > > +
> > > +      if (move)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > +			     "==> recording stmt %G", stmt);
> > > +
> > > +	  for (tree ref : loads)
> > > +	    if (stmt_may_clobber_ref_p (stmt, ref, true))
> > > +	      {
> > > +	        if (dump_enabled_p ())
> > > +		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +				   "early breaks not supported as memory used"
> > > +				   " may alias.\n");
> > > +	        return false;
> > > +	      }
> > 
> > Here you check aliasing again?!
> > 
> > I think it might be conceptually easier (and stronger) to instead think of the
> > 'fixed' set (and the gconds) to be moved earlier instead of the stores to be
> > sunk.
> > 
> > For example I fail to see how you check for, say
> > 
> >    for (..)
> >     {
> >       tem = a[i] / b[i];
> >       if (c[i]) break;
> >       d[i] = tem;
> >     }
> > 
> > where the division might trap.  For this the validation wouldn't identify
> > anything to move, right?
> > 
> 
> Hmm yes I ignored it because I figured we wouldn't vectorize anyway with -ftrapping-math?
> I guess I should call gimple_has_side_effects on the stmt but figured we wouldn't get here.
> 
> > I'll note that doing the actual movement will be easier with SLP and it would be
> > a possibility to implement early break with just SLP support - as we need to
> > start discovery from the gconds explicitly anyway there's no problem forcing a
> > single-lane SLP discovery there.
> > 
> 
> Possibly, but I think we'd still have a problem with figuring out what to do with the live
> range issue.  I guess long term the issue should be fixed in sched1?
> 
> > > +
> > > +	  /* If we've moved a VDEF, extract the defining MEM and update
> > > +	     usages of it.   */
> > > +	  tree vdef;
> > > +	  if ((vdef = gimple_vdef (stmt)))
> > > +	    {
> > > +	      /* This statement is to be moved.  */
> > > +	      LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt);
> > > +	      *reaching_vuse = gimple_vuse (stmt);
> > > +	    }
> > > +	}
> > > +    }
> > > +
> > > +  gsi_prev (gstmt);
> > > +
> > > +  if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases,
> > > +				  reaching_vuse, gstmt))
> > > +    return false;
> > 
> > Please use a loop instead of recursion.  I suggest to do the loop at the single
> > caller.
> > 
> > > +  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> > > +    {
> > > +      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt);
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > +			   "marked statement for vUSE update: %G", stmt);
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Function vect_analyze_early_break_dependences.
> > > +
> > > +   Examine all the data references in the loop and make sure that if we have
> > > +   multiple exits that we are able to safely move stores such that they become
> > > +   safe for vectorization.  The function also calculates the place where to
> > > +   move the instructions to and computes what the new vUSE chain should be.
> > > +
> > > +   This works in tandem with the CFG that will be produced by
> > > +   slpeel_tree_duplicate_loop_to_edge_cfg later on.  */
> > > +
> > > +static opt_result
> > > +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> > > +{
> > > +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> > > +
> > > +  hash_set<tree> chain, fixed;
> > > +  auto_vec<tree> loads;
> > > +  auto_vec<data_reference *> bases;
> > > +  basic_block dest_bb = NULL;
> > > +  tree vuse = NULL;
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > +		     "loop contains multiple exits, analyzing"
> > > +		     " statement dependencies.\n");
> > > +
> > > +  for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> > > +    {
> > > +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> > > +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> > > +	continue;
> > > +
> > > +      gimple *stmt = STMT_VINFO_STMT (loop_cond_info);
> > 
> > isn't that 'c' already?
> > 
> > > +      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> > > +
> > > +      /* Initialize the vuse chain with the one at the early break.  */
> > > +      if (!vuse)
> > > +	vuse = gimple_vuse (c);
> > 
> > gconds do not have virtual operands.
> > 
> > > +
> > > +      if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads,
> > > +				     &bases, &vuse, &gsi))
> > > +	return opt_result::failure_at (stmt,
> > > +				       "can't safely apply code motion to "
> > > +				       "dependencies of %G to vectorize "
> > > +				       "the early exit.\n", stmt);
> > > +
> > > +      /* Save destination as we go, BBs are visited in order and the last one
> > > +	is where statements should be moved to.  */
> > > +      if (!dest_bb)
> > > +	dest_bb = gimple_bb (c);
> > > +      else
> > > +	{
> > > +	  basic_block curr_bb = gimple_bb (c);
> > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > +	    dest_bb = curr_bb;
> > > +	}
> > > +    }
> > > +
> > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > 
> > no edge is the fallthru edge out of a condition, so this always selects
> > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > think you instead want
> > 
> >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > >dest;
> > 
> > more nicely written, of course.
> > 
> > > +  gcc_assert (dest_bb);
> > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > 
> > Sorting the vector of early breaks as we gather them might be nicer than this -
> > you'd then simply use the first or last.
> > 
> > > +
> > > +  /* TODO: Remove?  It's a useful debug statement but may be too much.  */
> > > +  for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "updated use: %T, mem_ref: %G",
> > > +			 vuse, g);
> > > +    }
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location,
> > > +		     "recorded statements to be moved to BB %d\n",
> > > +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> > > +
> > > +  return opt_result::success ();
> > > +}
> > > +
> > >  /* Function vect_analyze_data_ref_dependences.
> > >
> > >     Examine all the data references in the loop, and make sure there
> > > do not @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences
> > (loop_vec_info loop_vinfo,
> > >  	  return res;
> > >        }
> > >
> > > +  /* If we have early break statements in the loop, check to see if they
> > > +     are of a form we can vectorize.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    return vect_analyze_early_break_dependences (loop_vinfo);
> > > +
> > >    return opt_result::success ();
> > >  }
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a
> > 2079c523
> > > 4033c3d825ea 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop
> > *loop_in, vec_info_shared *shared)
> > >      partial_load_store_bias (0),
> > >      peeling_for_gaps (false),
> > >      peeling_for_niter (false),
> > > +    early_breaks (false),
> > >      no_data_dependencies (false),
> > >      has_mask_store (false),
> > >      scalar_loop_scaling (profile_probability::uninitialized ()), @@
> > > -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop
> > *epilogue, tree advance)
> > >    epilogue_vinfo->shared->save_datarefs ();  }
> > >
> > > +/*  When vectorizing early breaks, statements that happen before the
> > > +    early break in the current BB need to be moved to after the early
> > > +    break.  This function deals with that and assumes that any validity
> > > +    checks have already been performed.
> > > +
> > > +    While moving the instructions, if it encounters a VUSE or VDEF it then
> > > +    corrects the VUSEs as it moves the statements along.  GDEST is the
> > > +    location in which to insert the new statements.  */
> > > +
> > > +static void
> > > +move_early_exit_stmts (loop_vec_info loop_vinfo)
> > > +{
> > > +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> > > +    return;
> > > +
> > > +  /* Move all stmts that need moving.  */
> > > +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> > 
> > I suppose dest_bb is the in-loop block following the last early exit?  I suppose
> > we do not support an "early" exit after the main IV exit, right?  Instead we'd
> > require loop rotation?
> 
> Indeed, this is also keeping in mind that when we add general control flow
> we don't want to move it past the control flow inside the loop.  This would
> extend the live ranges too much.
> 
> > 
> > > +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> > > +
> > > +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > (loop_vinfo))
> > > +    {
> > > +      /* Check to see if statement is still required for vect or has been
> > > +	 elided.  */
> > > +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> > > +      if (!stmt_info)
> > > +	continue;
> > > +
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G",
> > stmt);
> > > +
> > > +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> > > +      gsi_move_before (&stmt_gsi, &dest_gsi);
> > > +      gsi_prev (&dest_gsi);
> > > +      update_stmt (stmt);
> > 
> > You shouldn't need to update_stmt here I think.
> > 
> > > +    }
> > > +
> > > +  /* Update all the stmts with their new reaching VUSES.  */
> > > +  tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > > +(loop_vinfo).last ());
> > > +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > +			   "updating vuse to %T for stmt %G", vuse, p);
> > > +      unlink_stmt_vdef (p);
> > 
> > it's odd to first move the stmts and then propagate out their defs (which you
> > forget to release?)
> > 
> > > +      gimple_set_vuse (p, vuse);
> > 
> > and now every store gets the same vuse?  I'm quite sure you'll end up with
> > broken virtual SSA form here.
> > 
> No, not every store, but every load.   Since we've moved everything that can
> introduce a new vDEF, then all the uses of memory before the last early exit
> must be using the same vUSE.  The loop never has to update stores because
> they are moved in order.

Ah, I was confused about the unlink_stmt_vdef - if 
LOOP_VINFO_EARLY_BRK_VUSES are only loads then there are no VDEFs
involved and unlink_stmt_vdef does nothing.

Richard.

> Regards,
> Tamar
> 
> > > +      update_stmt (p);
> > > +    }
> > > +}
> > > +
> > >  /* Function vect_transform_loop.
> > >
> > >     The analysis phase has determined that the loop is vectorizable.
> > > @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info
> > loop_vinfo, gimple *loop_vectorized_call)
> > >        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES
> > (loop_vinfo));
> > >      }
> > >
> > > +  /* Handle any code motion that we need to for early-break vectorization
> > after
> > > +     we've done peeling but just before we start vectorizing.  */  if
> > > + (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    move_early_exit_stmts (loop_vinfo);
> > > +
> > >    /* FORNOW: the vectorizer supports only loops which body consist
> > >       of one basic block (header + empty latch). When the vectorizer will
> > >       support more involved loop forms, the order by which the BBs are
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > >
> > 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a9
> > 1cb8e74c0
> > > 557e75d1ea2c 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info
> > *vinfo, enum vect_def_type *dt,
> > >  	case vect_first_order_recurrence:
> > >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> > >  	  break;
> > > +       case vect_early_exit_def:
> > > +	  dump_printf (MSG_NOTE, "early exit\n");
> > > +	  break;
> > >  	case vect_unknown_def_type:
> > >  	  dump_printf (MSG_NOTE, "unknown\n");
> > >  	  break;
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> > a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf7835
> > 2e29dc9
> > > 958746fb9c94 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -66,6 +66,7 @@ enum vect_def_type {
> > >    vect_double_reduction_def,
> > >    vect_nested_cycle,
> > >    vect_first_order_recurrence,
> > > +  vect_early_exit_def,
> > >    vect_unknown_def_type
> > >  };
> > >
> > > @@ -888,6 +889,10 @@ public:
> > >       we need to peel off iterations at the end to form an epilogue loop.  */
> > >    bool peeling_for_niter;
> > >
> > > +  /* When the loop has early breaks that we can vectorize we need to peel
> > > +     the loop for the break finding loop.  */
> > > +  bool early_breaks;
> > > +
> > >    /* List of loop additional IV conditionals found in the loop.  */
> > >    auto_vec<gcond *> conds;
> > >
> > > @@ -942,6 +947,20 @@ public:
> > >    /* The controlling loop IV for the scalar loop being vectorized.  This IV
> > >       controls the natural exits of the loop.  */
> > >    edge scalar_loop_iv_exit;
> > > +
> > > +  /* Used to store the list of statements needing to be moved if doing early
> > > +     break vectorization as they would violate the scalar loop semantics if
> > > +     vectorized in their current location.  These are stored in the order
> > > +     that they need to be moved.  */
> > > +  auto_vec<gimple *> early_break_conflict;
> > > +
> > > +  /* The final basic block where to move statements to.  In the case of
> > > +     multiple exits this could be pretty far away.  */
> > > +  basic_block early_break_dest_bb;
> > > +
> > > +  /* Statements whose VUSEs need updating if early break vectorization is
> > > +     to happen.  */
> > > +  auto_vec<gimple*> early_break_vuses;
> > >  } *loop_vec_info;
> > >
> > >  /* Access Functions.  */
> > > @@ -996,6 +1015,10 @@ public:
> > >  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
> > >  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
> > >  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> > > +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> > > +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)-
> > >early_break_conflict
> > > +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> > > +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> > >  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> > >  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
> > >  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)-
> > >no_data_dependencies
> > >
> > >
> > >
> > >
> > >
> > 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 6/21]middle-end: support multiple exits in loop versioning
  2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
@ 2023-11-07 14:54   ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-07 14:54 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This has loop versioning use the vectorizer's IV exit edge when it's available,
> since single_exit (..) fails with multiple exits.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_loop_versioning): Support multiple
> 	exits.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 3d59119787d6afdc5a6465a547d1ea2d3d940373..58b4b9c11d8b844ee86156cdfcba7f838030a7c2 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -4180,12 +4180,24 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>  	 If loop versioning wasn't done from loop, but scalar_loop instead,
>  	 merge_bb will have already just a single successor.  */
>  
> -      merge_bb = single_exit (loop_to_version)->dest;
> +      /* Due to the single_exit check above we should only get here when
> +	 loop == loop_to_version, which means we can use loop_vinfo to get the
> +	 exits.  */

You mean LOOP_VINFO_EARLY_BREAKS can only ever version the loop
itself?  That's correct.  All inner loops of loop_to_version have
a single exit unless it's loop itself.

Please reword a bit and instead do

       edge exit_edge;
       if (loop_to_version == loop)
         exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
       else
         exit_edge = single_exit (loop_to_version);

OK with that change.

Richard.

> +      edge exit_edge = single_exit (loop_to_version);
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  /* With early exits the main exit will fall into the merge block of the
> +	     alternative exits.  So we need the single successor of the main
> +	     exit here to find the merge block.  */
> +	  exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +	}
> +      gcc_assert (exit_edge);
> +      merge_bb = exit_edge->dest;
>        if (EDGE_COUNT (merge_bb->preds) >= 2)
>  	{
>  	  gcc_assert (EDGE_COUNT (merge_bb->preds) >= 2);
> -	  new_exit_bb = split_edge (single_exit (loop_to_version));
> -	  new_exit_e = single_exit (loop_to_version);
> +	  new_exit_bb = split_edge (exit_edge);
> +	  new_exit_e = exit_edge;
>  	  e = EDGE_SUCC (new_exit_bb, 0);
>  
>  	  for (gsi = gsi_start_phis (merge_bb); !gsi_end_p (gsi);

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-06  7:38 ` [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch Tamar Christina
@ 2023-11-07 15:04   ` Richard Biener
  2023-11-07 23:10     ` Tamar Christina
  2023-11-13 20:11     ` Tamar Christina
  0 siblings, 2 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-07 15:04 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> As requested, the vectorizer is now free to pick its own exit which can be
> different than what the loop CFG infrastructure uses.  The vectorizer makes use
> of this to vectorize loops that it previously could not.
> 
> But this means that loop control must be materialized in the block that needs
> it, lest we corrupt the SSA chain.  This makes it so we use the vectorizer's
> main IV block instead of the loop infra.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-manip.cc (standard_iv_increment_position): Conditionally
> 	take dest BB.
> 	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> 	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> 	(vect_set_loop_condition_normal): Likewise.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> index bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00301423df111cbe7bf7ba8 100644
> --- a/gcc/tree-ssa-loop-manip.h
> +++ b/gcc/tree-ssa-loop-manip.h
> @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool = false);
>  extern basic_block ip_end_pos (class loop *);
>  extern basic_block ip_normal_pos (class loop *);
>  extern void standard_iv_increment_position (class loop *,
> -					    gimple_stmt_iterator *, bool *);
> +					    gimple_stmt_iterator *, bool *,
> +					    basic_block = NULL);
>  extern bool
>  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned int,
>  					   sbitmap, edge, vec<edge> *, int);
> diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
> index e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5dde6c29492ce4d4e4550 100644
> --- a/gcc/tree-ssa-loop-manip.cc
> +++ b/gcc/tree-ssa-loop-manip.cc
> @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
>  
>  /* Stores the standard position for induction variable increment in LOOP
>     (just before the exit condition if it is available and latch block is empty,
> -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> -   the increment should be inserted after *BSI.  */
> +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
> +   basic block is used as the destination instead of the loop latch source
> +   block.  INSERT_AFTER is set to true if the increment should be inserted after
> +   *BSI.  */
>  
>  void
>  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator *bsi,
> -				bool *insert_after)
> +				bool *insert_after, basic_block dest_bb)
>  {
> -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> +  basic_block bb = dest_bb;
> +  if (!bb)
> +    bb = ip_normal_pos (loop);
> +  basic_block latch = ip_end_pos (loop);

I don't think that's a good API extension.  Given that we don't support
an early exit after the main IV exit, doesn't this code already work
fine as-is?  It chooses the last exit.  The position is also not
semantically relevant; we just try to keep the latch empty here
(that is, it's a bit of a "bad" API).

So, do you really need this change?

Maybe we're really using standard_iv_increment_position wrong here,
the result is supposed to _only_ feed the PHI latch argument.

Richard.

>    gimple *last = last_nondebug_stmt (latch);
>  
>    if (!bb
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a6465a547d1ea2d3d940373 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>    tree index_before_incr, index_after_incr;
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after, exit_e->src);
>    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
>      {
>        /* Create an IV that counts down from niters_total and whose step
> @@ -1017,7 +1018,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>    tree index_before_incr, index_after_incr;
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> +				  exit_edge->src);
>    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
>  	     &incr_gsi, insert_after, &index_before_incr,
>  	     &index_after_incr);
> @@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>  				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
> @@ -1278,7 +1280,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>  	}
>      }
>  
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> +				  exit_edge->src);
>    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
>               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
>    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-07 15:04   ` Richard Biener
@ 2023-11-07 23:10     ` Tamar Christina
  2023-11-13 20:11     ` Tamar Christina
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-07 23:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 7, 2023 3:04 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 5/21]middle-end: update vectorizer's control update to
> support picking an exit other than loop latch
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > As requested, the vectorizer is now free to pick its own exit which
> > can be different than what the loop CFG infrastructure uses.  The
> > vectorizer makes use of this to vectorize loops that it previously could not.
> >
> > But this means that loop control must be materialized in the block
> > that needs it, lest we corrupt the SSA chain.  This makes it so we use
> > the vectorizer's main IV block instead of the loop infra.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-ssa-loop-manip.cc (standard_iv_increment_position):
> Conditionally
> > 	take dest BB.
> > 	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> > 	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> > 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> > 	(vect_set_loop_condition_normal): Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> > index
> >
> bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00
> 301423df1
> > 11cbe7bf7ba8 100644
> > --- a/gcc/tree-ssa-loop-manip.h
> > +++ b/gcc/tree-ssa-loop-manip.h
> > @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool
> > = false);  extern basic_block ip_end_pos (class loop *);  extern
> > basic_block ip_normal_pos (class loop *);  extern void
> > standard_iv_increment_position (class loop *,
> > -					    gimple_stmt_iterator *, bool *);
> > +					    gimple_stmt_iterator *, bool *,
> > +					    basic_block = NULL);
> >  extern bool
> >  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned
> int,
> >  					   sbitmap, edge, vec<edge> *, int); diff
> --git
> > a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index
> >
> e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5d
> de6c294
> > 92ce4d4e4550 100644
> > --- a/gcc/tree-ssa-loop-manip.cc
> > +++ b/gcc/tree-ssa-loop-manip.cc
> > @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
> >
> >  /* Stores the standard position for induction variable increment in LOOP
> >     (just before the exit condition if it is available and latch block is empty,
> > -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> > -   the increment should be inserted after *BSI.  */
> > +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
> > +   basic block is used as the destination instead of the loop latch source
> > +   block.  INSERT_AFTER is set to true if the increment should be inserted
> after
> > +   *BSI.  */
> >
> >  void
> >  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator
> *bsi,
> > -				bool *insert_after)
> > +				bool *insert_after, basic_block dest_bb)
> >  {
> > -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> > +  basic_block bb = dest_bb;
> > +  if (!bb)
> > +    bb = ip_normal_pos (loop);
> > +  basic_block latch = ip_end_pos (loop);
> 
> I don't think that's a good API extension.  Given that we don't support an early
> exit after the main IV exit, doesn't this code already work fine as-is?  It chooses
> the last exit.  The position is also not semantically relevant; we just try to keep
> the latch empty here (that is, it's a bit of a "bad" API).
> 
> So, do you really need this change?

I'll double check.  I remember needing it to fix an ICE before, but also re-did the
way the alternative main exits were handled later.  At the end of the series, as I
was writing the cover letter, this change also seemed off to me; I should have
checked it again before submitting it.

> 
> Maybe we're really using standard_iv_increment_position wrong here, the
> result is supposed to _only_ feed the PHI latch argument.

Could be; this was needed before I changed the way I handled the IV updates for
the alternate exit loops.  I'll double check and drop it if not needed.

Thanks,
Tamar

> 
> Richard.
> 
> >    gimple *last = last_nondebug_stmt (latch);
> >
> >    if (!bb
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a64
> 65a547d1
> > ea2d3d940373 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop,
> loop_vec_info loop_vinfo,
> >    tree index_before_incr, index_after_incr;
> >    gimple_stmt_iterator incr_gsi;
> >    bool insert_after;
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > + standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > + exit_e->src);
> >    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> >      {
> >        /* Create an IV that counts down from niters_total and whose
> > step @@ -1017,7 +1018,8 @@
> vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> >    tree index_before_incr, index_after_incr;
> >    gimple_stmt_iterator incr_gsi;
> >    bool insert_after;
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > +				  exit_edge->src);
> >    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
> >  	     &incr_gsi, insert_after, &index_before_incr,
> >  	     &index_after_incr);
> > @@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> >     loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > +exit_edge,
> >  				class loop *loop, tree niters, tree step,
> >  				tree final_iv, bool niters_maybe_zero,
> >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> 1278,7 +1280,8 @@
> > vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> exit_edge,
> >  	}
> >      }
> >
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > +				  exit_edge->src);
> >    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
> >               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
> >    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi,
> > indx_after_incr,
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-07 15:04   ` Richard Biener
  2023-11-07 23:10     ` Tamar Christina
@ 2023-11-13 20:11     ` Tamar Christina
  2023-11-14  7:56       ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-13 20:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 7, 2023 3:04 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 5/21]middle-end: update vectorizer's control update to
> support picking an exit other than loop latch
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > As requested, the vectorizer is now free to pick its own exit which
> > can be different than what the loop CFG infrastructure uses.  The
> > vectorizer makes use of this to vectorize loops that it previously could not.
> >
> > But this means that loop control must be materialized in the block
> > that needs it, lest we corrupt the SSA chain.  This makes it so we use
> > the vectorizer's main IV block instead of the loop infra.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-ssa-loop-manip.cc (standard_iv_increment_position):
> Conditionally
> > 	take dest BB.
> > 	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> > 	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> > 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> > 	(vect_set_loop_condition_normal): Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> > index
> >
> bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00
> 301423df1
> > 11cbe7bf7ba8 100644
> > --- a/gcc/tree-ssa-loop-manip.h
> > +++ b/gcc/tree-ssa-loop-manip.h
> > @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool
> > = false);  extern basic_block ip_end_pos (class loop *);  extern
> > basic_block ip_normal_pos (class loop *);  extern void
> > standard_iv_increment_position (class loop *,
> > -					    gimple_stmt_iterator *, bool *);
> > +					    gimple_stmt_iterator *, bool *,
> > +					    basic_block = NULL);
> >  extern bool
> >  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned
> int,
> >  					   sbitmap, edge, vec<edge> *, int); diff
> --git
> > a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index
> >
> e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5d
> de6c294
> > 92ce4d4e4550 100644
> > --- a/gcc/tree-ssa-loop-manip.cc
> > +++ b/gcc/tree-ssa-loop-manip.cc
> > @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
> >
> >  /* Stores the standard position for induction variable increment in LOOP
> >     (just before the exit condition if it is available and latch block is empty,
> > -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> > -   the increment should be inserted after *BSI.  */
> > +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
> > +   basic block is used as the destination instead of the loop latch source
> > +   block.  INSERT_AFTER is set to true if the increment should be inserted
> after
> > +   *BSI.  */
> >
> >  void
> >  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator
> *bsi,
> > -				bool *insert_after)
> > +				bool *insert_after, basic_block dest_bb)
> >  {
> > -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> > +  basic_block bb = dest_bb;
> > +  if (!bb)
> > +    bb = ip_normal_pos (loop);
> > +  basic_block latch = ip_end_pos (loop);
> 
> > I don't think that's a good API extension.  Given that we don't support an early
> > exit after the main IV exit, doesn't this code already work fine as-is?  It chooses
> > the last exit.  The position is also not semantically relevant; we just try to keep
> > the latch empty here (that is, it's a bit of a "bad" API).
> 
> So, do you really need this change?

Yes I do.  If you look at these kinds of loops https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f

You'll see that the main exit, i.e. the one attached to the latch block, is the early break.
Because SCEV can't analyze it, it picks the main exit to be the one in BB4.

This means that the loop control must be placed in BB4.  If we place ivtmp_10 = ivtmp_9 - 1
in BB3 then that's broken SSA.  If we use `ivtmp_9` in BB4 then we'll have an off-by-one issue.

You could have reached the end of the valid range for the loop when you re-enter BB4; since
loads are still allowed, you'll then read out of bounds before checking that you exit.

This is also annoyingly hard to get correct, which is what took me a long time.  Such loops mean
you need to restart the scalar loop at i_7 if you take the main exit.
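
As an illustrative sketch (a minimal made-up example of the shape involved, not
the exact testcase from the gist), such a loop looks like:

```c
#include <assert.h>

/* Sketch of a loop where the exit adjacent to the latch is the early
   break on the data, while the counted exit sits earlier in the body,
   so the exit the vectorizer must use for its IV is not the one the
   loop CFG infrastructure would pick.  */
int
find42 (int *a, int n)
{
  int i = 0;
  for (;;)
    {
      if (i == n)       /* Counted exit, in the middle of the body.  */
        return -1;
      if (a[i] == 42)   /* Early break, the exit next to the latch.  */
        break;
      i++;
    }
  return i;
}
```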

Regards,
Tamar

> 
> Maybe we're really using standard_iv_increment_position wrong here, the
> result is supposed to _only_ feed the PHI latch argument.
> Richard.
> 
> >    gimple *last = last_nondebug_stmt (latch);
> >
> >    if (!bb
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a64
> 65a547d1
> > ea2d3d940373 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop,
> loop_vec_info loop_vinfo,
> >    tree index_before_incr, index_after_incr;
> >    gimple_stmt_iterator incr_gsi;
> >    bool insert_after;
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > + standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > + exit_e->src);
> >    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> >      {
> >        /* Create an IV that counts down from niters_total and whose
> > step @@ -1017,7 +1018,8 @@
> vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> >    tree index_before_incr, index_after_incr;
> >    gimple_stmt_iterator incr_gsi;
> >    bool insert_after;
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > +				  exit_edge->src);
> >    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
> >  	     &incr_gsi, insert_after, &index_before_incr,
> >  	     &index_after_incr);
> > @@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> >     loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > +exit_edge,
> >  				class loop *loop, tree niters, tree step,
> >  				tree final_iv, bool niters_maybe_zero,
> >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> 1278,7 +1280,8 @@
> > vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> exit_edge,
> >  	}
> >      }
> >
> > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > +				  exit_edge->src);
> >    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
> >               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
> >    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi,
> > indx_after_incr,
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-13 20:11     ` Tamar Christina
@ 2023-11-14  7:56       ` Richard Biener
  2023-11-14  8:07         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-14  7:56 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 13 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Tuesday, November 7, 2023 3:04 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 5/21]middle-end: update vectorizer's control update to
> > support picking an exit other than loop latch
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > As requested, the vectorizer is now free to pick its own exit which
> > > can be different than what the loop CFG infrastructure uses.  The
> > > vectorizer makes use of this to vectorize loops that it previously could not.
> > >
> > > But this means that loop control must be materialized in the block
> > > that needs it, lest we corrupt the SSA chain.  This makes it so we use
> > > the vectorizer's main IV block instead of the loop infra.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-ssa-loop-manip.cc (standard_iv_increment_position):
> > Conditionally
> > > 	take dest BB.
> > > 	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> > > 	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> > > 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> > > 	(vect_set_loop_condition_normal): Likewise.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> > > index
> > >
> > bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00
> > 301423df1
> > > 11cbe7bf7ba8 100644
> > > --- a/gcc/tree-ssa-loop-manip.h
> > > +++ b/gcc/tree-ssa-loop-manip.h
> > > @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge, bool
> > > = false);  extern basic_block ip_end_pos (class loop *);  extern
> > > basic_block ip_normal_pos (class loop *);  extern void
> > > standard_iv_increment_position (class loop *,
> > > -					    gimple_stmt_iterator *, bool *);
> > > +					    gimple_stmt_iterator *, bool *,
> > > +					    basic_block = NULL);
> > >  extern bool
> > >  gimple_duplicate_loop_body_to_header_edge (class loop *, edge, unsigned
> > int,
> > >  					   sbitmap, edge, vec<edge> *, int); diff
> > --git
> > > a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index
> > >
> > e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5d
> > de6c294
> > > 92ce4d4e4550 100644
> > > --- a/gcc/tree-ssa-loop-manip.cc
> > > +++ b/gcc/tree-ssa-loop-manip.cc
> > > @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
> > >
> > >  /* Stores the standard position for induction variable increment in LOOP
> > >     (just before the exit condition if it is available and latch block is empty,
> > > -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> > > -   the increment should be inserted after *BSI.  */
> > > +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then that
> > > +   basic block is used as the destination instead of the loop latch source
> > > +   block.  INSERT_AFTER is set to true if the increment should be inserted
> > after
> > > +   *BSI.  */
> > >
> > >  void
> > >  standard_iv_increment_position (class loop *loop, gimple_stmt_iterator
> > *bsi,
> > > -				bool *insert_after)
> > > +				bool *insert_after, basic_block dest_bb)
> > >  {
> > > -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos (loop);
> > > +  basic_block bb = dest_bb;
> > > +  if (!bb)
> > > +    bb = ip_normal_pos (loop);
> > > +  basic_block latch = ip_end_pos (loop);
> > 
> > I don't think that's a good API extension.  Given that we don't support an early
> > exit after the main IV exit, doesn't this code already work fine as-is?  It chooses
> > the last exit.  The position is also not semantically relevant; we just try to keep
> > the latch empty here (that is, it's a bit of a "bad" API).
> > 
> > So, do you really need this change?
> 
> Yes I do.  If you look at these kinds of loops https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> 
> You'll see that the main exit, i.e. the one attached to the latch block, is the early break.
> Because SCEV can't analyze it, it picks the main exit to be the one in BB4.
> 
> This means that the loop control must be placed in BB4.  If we place ivtmp_10 = ivtmp_9 - 1
> in BB3 then that's broken SSA.  If we use `ivtmp_9` in BB4 then we'll have an off-by-one issue.

OK, but then I think the fix is to not use standard_iv_increment_position
(it's a weird API anyway).  Instead insert before the main exit condition.

Btw, I assumed this order of main / early exit cannot happen.  But I
didn't re-review the main exit identification code yet.
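
Something like the following untested sketch is what I have in mind (reusing the
names from the patch; that exit_edge->src always ends in the exit gcond at this
point is an assumption):

```cpp
/* Anchor the IV increment directly before the main exit's condition
   instead of going through standard_iv_increment_position.  */
gimple_stmt_iterator incr_gsi = gsi_last_bb (exit_edge->src);
/* gsi_last_bb points at the exit gcond, so inserting before it keeps
   the increment ahead of the loop-control compare.  */
bool insert_after = false;
```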

Richard.

> You could have reached the end of the valid range for the loop when you re-enter BB4; since
> loads are still allowed, you'll then read out of bounds before checking that you exit.
> 
> This is also annoyingly hard to get correct, which is what took me a long time.  Such loops mean
> you need to restart the scalar loop at i_7 if you take the main exit.
> 
> Regards,
> Tamar
> 
> > 
> > Maybe we're really using standard_iv_increment_position wrong here, the
> > result is supposed to _only_ feed the PHI latch argument.
> > Richard.
> > 
> > >    gimple *last = last_nondebug_stmt (latch);
> > >
> > >    if (!bb
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a64
> > 65a547d1
> > > ea2d3d940373 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop *loop,
> > loop_vec_info loop_vinfo,
> > >    tree index_before_incr, index_after_incr;
> > >    gimple_stmt_iterator incr_gsi;
> > >    bool insert_after;
> > > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > + standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > + exit_e->src);
> > >    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> > >      {
> > >        /* Create an IV that counts down from niters_total and whose
> > > step @@ -1017,7 +1018,8 @@
> > vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> > >    tree index_before_incr, index_after_incr;
> > >    gimple_stmt_iterator incr_gsi;
> > >    bool insert_after;
> > > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > +				  exit_edge->src);
> > >    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
> > >  	     &incr_gsi, insert_after, &index_before_incr,
> > >  	     &index_after_incr);
> > > @@ -1185,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > >     loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > +exit_edge,
> > >  				class loop *loop, tree niters, tree step,
> > >  				tree final_iv, bool niters_maybe_zero,
> > >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> > 1278,7 +1280,8 @@
> > > vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > exit_edge,
> > >  	}
> > >      }
> > >
> > > -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > +				  exit_edge->src);
> > >    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
> > >               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
> > >    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi,
> > > indx_after_incr,
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-14  7:56       ` Richard Biener
@ 2023-11-14  8:07         ` Tamar Christina
  2023-11-14 23:59           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-14  8:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, November 14, 2023 7:56 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 5/21]middle-end: update vectorizer's control update to
> support picking an exit other than loop latch
> 
> On Mon, 13 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Tuesday, November 7, 2023 3:04 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: Re: [PATCH 5/21]middle-end: update vectorizer's control
> > > update to support picking an exit other than loop latch
> > >
> > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > As requested, the vectorizer is now free to pick its own exit
> > > > which can be different from what the loop CFG infrastructure uses.
> > > > The vectorizer makes use of this to vectorize loops that it previously could
> not.
> > > >
> > > > But this means that loop control must be materialized in the block
> > > > that needs it lest we corrupt the SSA chain.  This makes it so we
> > > > use the vectorizer's main IV block instead of the loop infra.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-ssa-loop-manip.cc (standard_iv_increment_position):
> > > Conditionally
> > > > 	take dest BB.
> > > > 	* tree-ssa-loop-manip.h (standard_iv_increment_position): Likewise.
> > > > 	* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Use it.
> > > > 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> > > > 	(vect_set_loop_condition_normal): Likewise.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/tree-ssa-loop-manip.h b/gcc/tree-ssa-loop-manip.h
> > > > index
> > > >
> > >
> bda09f51d5619420331c513a9906831c779fd2b4..5938588c8882d842b00
> > > 301423df1
> > > > 11cbe7bf7ba8 100644
> > > > --- a/gcc/tree-ssa-loop-manip.h
> > > > +++ b/gcc/tree-ssa-loop-manip.h
> > > > @@ -38,7 +38,8 @@ extern basic_block split_loop_exit_edge (edge,
> > > > bool = false);  extern basic_block ip_end_pos (class loop *);
> > > > extern basic_block ip_normal_pos (class loop *);  extern void
> > > > standard_iv_increment_position (class loop *,
> > > > -					    gimple_stmt_iterator *, bool *);
> > > > +					    gimple_stmt_iterator *, bool *,
> > > > +					    basic_block = NULL);
> > > >  extern bool
> > > >  gimple_duplicate_loop_body_to_header_edge (class loop *, edge,
> > > > unsigned
> > > int,
> > > >  					   sbitmap, edge, vec<edge> *, int); diff
> > > --git
> > > > a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index
> > > >
> > >
> e7436915e01297e7af2a3bcf1afd01e014de6f32..bdc7a3d74a788f450ca5d
> > > de6c294
> > > > 92ce4d4e4550 100644
> > > > --- a/gcc/tree-ssa-loop-manip.cc
> > > > +++ b/gcc/tree-ssa-loop-manip.cc
> > > > @@ -792,14 +792,19 @@ ip_normal_pos (class loop *loop)
> > > >
> > > >  /* Stores the standard position for induction variable increment in LOOP
> > > >     (just before the exit condition if it is available and latch block is empty,
> > > > -   end of the latch block otherwise) to BSI.  INSERT_AFTER is set to true if
> > > > -   the increment should be inserted after *BSI.  */
> > > > +   end of the latch block otherwise) to BSI.  If DEST_BB is specified then
> that
> > > > +   basic block is used as the destination instead of the loop latch source
> > > > +   block.  INSERT_AFTER is set to true if the increment should be
> > > > + inserted
> > > after
> > > > +   *BSI.  */
> > > >
> > > >  void
> > > >  standard_iv_increment_position (class loop *loop,
> > > > gimple_stmt_iterator
> > > *bsi,
> > > > -				bool *insert_after)
> > > > +				bool *insert_after, basic_block dest_bb)
> > > >  {
> > > > -  basic_block bb = ip_normal_pos (loop), latch = ip_end_pos
> > > > (loop);
> > > > +  basic_block bb = dest_bb;
> > > > +  if (!bb)
> > > > +    bb = ip_normal_pos (loop);
> > > > +  basic_block latch = ip_end_pos (loop);
> > >
> > > I don't think that's a good API extension.  Given that we don't
> > > support an early exit after the main IV exit doesn't this code
> > > already work fine as-is?  It chooses the last exit.  The position is
> > > also not semantically relevant, we just try to keep the latch empty here
> (that is, it's a bit of a "bad" API).
> > >
> > > So, do you really need this change?
> >
> > Yes I do, If you look at these kinds of loops
> > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> >
> > You'll see that the main exit, i.e. the one attached to the latch block, is the
> early break.
> > Because SCEV can't analyze it, the vectorizer picks the main exit to be the one in BB4.
> >
> > This means that the loop control must be placed in BB4.  If we place
> > ivtmp_10 = ivtmp_9 - 1 in BB 3 then that's broken SSA.  If we use `ivtmp_9`
> in BB4 then we'll have an off-by-one issue.
> 
> OK, but then I think the fix is to not use standard_iv_increment_position (it's a
> weird API anyway).  Instead insert before the main exit condition.

I figured as much.  I'm almost done respinning it with the vectorizer's own simpler copy.
Should be out today with the rest.

> 
> Btw, I assumed this order of main / early exit cannot happen.  But I didn't re-
> review the main exit identification code yet.
> 

It can happen because we allowed vec_init_loop_exit_info to pick the last
analyzable exit.  In cases like these it happens because the final exit has no
information from SCEV.  It then picks the last exit it could analyze, which by
default is an early exit.
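
The selection policy can be sketched outside of GCC like this; `exit_info` and `pick_vect_exit` are illustrative stand-ins (not the real `vec_init_loop_exit_info` API), with `analyzable` modelling "SCEV produced a niter expression for this exit":

```cpp
#include <cassert>
#include <optional>
#include <vector>

/* Stand-in for an exit edge; 'analyzable' models whether SCEV could
   compute a number-of-iterations expression for the exit.  */
struct exit_info
{
  int bb;           /* Source block of the exit, for identification.  */
  bool analyzable;  /* Did SCEV analysis succeed for this exit?  */
};

/* Sketch of the policy: walk the exits in order and keep the last one
   that was analyzable.  When the natural (latch) exit is not analyzable
   this ends up selecting an early exit as the "main" IV exit.  */
static std::optional<int>
pick_vect_exit (const std::vector<exit_info> &exits)
{
  std::optional<int> chosen;
  for (const exit_info &e : exits)
    if (e.analyzable)
      chosen = e.bb;  /* A later analyzable exit overrides earlier ones.  */
  return chosen;
}
```

If the final (natural) exit is not analyzable, this falls back to an earlier break exit, which is exactly the situation that forces the loop control into that exit's block.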

It's very tricky to deal with, and I have just finished cleaning up the IV update
code to make it easier to follow... but it does seem to add about 970 more
vectorized cases (most of which are execution tests).

Regards,
Tamar

> Richard.
> 
> > You could have reached the end of the valid range for the loop when
> > you re-enter BB4, since loads are still allowed you'll then read out of bounds
> before checking that you exit.
> >
> > This is also annoyingly hard to get correct, which is what took me a
> > long time.  Such loops mean you need to restart the scalar loop at i_7 if you
> take the main exit.
> >
> > Regards,
> > Tamar
> >
> > >
> > > Maybe we're really using standard_iv_increment_position wrong here,
> > > the result is supposed to _only_ feed the PHI latch argument.
> > > Richard.
> > >
> > > >    gimple *last = last_nondebug_stmt (latch);
> > > >
> > > >    if (!bb
> > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > b/gcc/tree-vect-loop-manip.cc index
> > > >
> > >
> 6fbb5b80986fd657814b48eb009b52b094f331e6..3d59119787d6afdc5a64
> > > 65a547d1
> > > > ea2d3d940373 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -531,7 +531,8 @@ vect_set_loop_controls_directly (class loop
> > > > *loop,
> > > loop_vec_info loop_vinfo,
> > > >    tree index_before_incr, index_after_incr;
> > > >    gimple_stmt_iterator incr_gsi;
> > > >    bool insert_after;
> > > > -  standard_iv_increment_position (loop, &incr_gsi,
> > > > &insert_after);
> > > > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > + standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > > + exit_e->src);
> > > >    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> > > >      {
> > > >        /* Create an IV that counts down from niters_total and
> > > > whose step @@ -1017,7 +1018,8 @@
> > > vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> > > >    tree index_before_incr, index_after_incr;
> > > >    gimple_stmt_iterator incr_gsi;
> > > >    bool insert_after;
> > > > -  standard_iv_increment_position (loop, &incr_gsi,
> > > > &insert_after);
> > > > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > > +				  exit_edge->src);
> > > >    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
> > > >  	     &incr_gsi, insert_after, &index_before_incr,
> > > >  	     &index_after_incr);
> > > > @@ -1185,7 +1187,7 @@
> > > > vect_set_loop_condition_partial_vectors_avx512
> > > (class loop *loop,
> > > >     loop handles exactly VF scalars per iteration.  */
> > > >
> > > >  static gcond *
> > > > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > edge exit_edge,
> > > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > > +exit_edge,
> > > >  				class loop *loop, tree niters, tree step,
> > > >  				tree final_iv, bool niters_maybe_zero,
> > > >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> > > 1278,7 +1280,8 @@
> > > > vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > edge
> > > exit_edge,
> > > >  	}
> > > >      }
> > > >
> > > > -  standard_iv_increment_position (loop, &incr_gsi,
> > > > &insert_after);
> > > > +  standard_iv_increment_position (loop, &incr_gsi, &insert_after,
> > > > +				  exit_edge->src);
> > > >    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
> > > >               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
> > > >    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi,
> > > > indx_after_incr,
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-14  8:07         ` Tamar Christina
@ 2023-11-14 23:59           ` Tamar Christina
  2023-11-15 12:14             ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-14 23:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 5211 bytes --]

> > OK, but then I think the fix is to not use
> > standard_iv_increment_position (it's a weird API anyway).  Instead insert
> before the main exit condition.
> 
> I figured as much.  I'm almost done respinning it with the vectorizer's own simpler
> copy.
> Should be out today with the rest.
> 
> >
> > Btw, I assumed this order of main / early exit cannot happen.  But I
> > didn't re- review the main exit identification code yet.
> >
> 
> It can happen because we allowed vec_init_loop_exit_info to pick the last
> analyzable exit.  In cases like these it happens because the final exit has no
> information from SCEV.  It then picks the last exit it could analyze, which by
> default is an early exit.
> 
> It's very tricky to deal with, and I have just finished cleaning up the IV update
> code to make it easier to follow... but it does seem to add about 970 more
> vectorized cases (most of which are execution tests).
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_iv_increment_position): New.
	(vect_set_loop_controls_directly): Use it.
	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
	(vect_set_loop_condition_normal): Likewise.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index fafbf924e8db18eb4eec7a4a1906d10f6ce9812f..a5a612dc6b47436730592469176623685a7a413f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -448,6 +448,20 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
     }
 }
 
+/* Stores the standard position for the induction variable increment belonging
+   to LOOP_EXIT (just before the exit condition of the given exit) to BSI.
+   INSERT_AFTER is set to true if the increment should be inserted after
+   *BSI.  */
+
+static void
+vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
+			    bool *insert_after)
+{
+  basic_block bb = loop_exit->src;
+  *bsi = gsi_last_bb (bb);
+  *insert_after = false;
+}
+
 /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
    for all the rgroup controls in RGC and return a control that is nonzero
    when the loop needs to iterate.  Add any new preheader statements to
@@ -531,7 +545,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  vect_iv_increment_position (exit_e, &incr_gsi, &insert_after);
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
     {
       /* Create an IV that counts down from niters_total and whose step
@@ -1017,7 +1032,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
   create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
 	     &incr_gsi, insert_after, &index_before_incr,
 	     &index_after_incr);
@@ -1185,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1278,7 +1293,7 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 	}
     }
 
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
   create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
              &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
   indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
@@ -1446,7 +1461,7 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
       redirect_edge_and_branch (exit, dest);
     }
 
-  /* Only fush the main exit, the remaining exits we need to match the order
+  /* Only flush the main exit, the remaining exits we need to match the order
      in the loop->header which with multiple exits may not be the same.  */
   flush_pending_stmts (loop_exit);
 
@@ -1519,7 +1534,9 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
 		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
 		}
 	    new_arg = alt_res; /* Push it down to the new_loop header.  */
-	  } else if (!res) {
+	  }
+	else if (!res)
+	  {
 	    /* For non-early break we need to keep the possibly live values in
 	       the exit block.  For early break these are kept in the merge
 	       block in the code above.  */

[-- Attachment #2: rb17965.patch --]
[-- Type: application/octet-stream, Size: 3814 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index fafbf924e8db18eb4eec7a4a1906d10f6ce9812f..a5a612dc6b47436730592469176623685a7a413f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -448,6 +448,20 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
     }
 }
 
+/* Stores the standard position for the induction variable increment belonging
+   to LOOP_EXIT (just before the exit condition of the given exit) to BSI.
+   INSERT_AFTER is set to true if the increment should be inserted after
+   *BSI.  */
+
+static void
+vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
+			    bool *insert_after)
+{
+  basic_block bb = loop_exit->src;
+  *bsi = gsi_last_bb (bb);
+  *insert_after = false;
+}
+
 /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
    for all the rgroup controls in RGC and return a control that is nonzero
    when the loop needs to iterate.  Add any new preheader statements to
@@ -531,7 +545,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  vect_iv_increment_position (exit_e, &incr_gsi, &insert_after);
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
     {
       /* Create an IV that counts down from niters_total and whose step
@@ -1017,7 +1032,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
   create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
 	     &incr_gsi, insert_after, &index_before_incr,
 	     &index_after_incr);
@@ -1185,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1278,7 +1293,7 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 	}
     }
 
-  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
   create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
              &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
   indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
@@ -1446,7 +1461,7 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
       redirect_edge_and_branch (exit, dest);
     }
 
-  /* Only fush the main exit, the remaining exits we need to match the order
+  /* Only flush the main exit, the remaining exits we need to match the order
      in the loop->header which with multiple exits may not be the same.  */
   flush_pending_stmts (loop_exit);
 
@@ -1519,7 +1534,9 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
 		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
 		}
 	    new_arg = alt_res; /* Push it down to the new_loop header.  */
-	  } else if (!res) {
+	  }
+	else if (!res)
+	  {
 	    /* For non-early break we need to keep the possibly live values in
 	       the exit block.  For early break these are kept in the merge
 	       block in the code above.  */

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-11-06  7:38 ` [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form " Tamar Christina
@ 2023-11-15  0:00   ` Tamar Christina
  2023-11-15 12:40     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15  0:00 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 16458 bytes --]

Patch updated to latest trunk,

This splits the part of the function that does peeling for loops at exits into
a separate function.  In this new function we also peel for early breaks.

Peeling for early breaks works by redirecting all early break exits to a
single "early break" block and combining them with the normal exit edge
later in a different block which then goes into the epilog preheader.
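
As a rough stand-alone model of that block layout (names such as `early_break_block` and the string-based edge map are made up for illustration; the real code redirects GIMPLE CFG edges):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

/* Toy model of the peeling CFG rewrite: every early-break exit is
   redirected to one shared block, which is then combined with the
   normal exit in a merge block that feeds the epilog preheader.  */
static std::map<std::string, std::string>
redirect_exits (const std::vector<std::string> &exit_srcs,
                const std::string &main_exit_src)
{
  std::map<std::string, std::string> dest;
  for (const std::string &src : exit_srcs)
    /* The main exit keeps its own block so the existing IV-update code
       can be reused unchanged; early breaks share one destination.  */
    dest[src] = (src == main_exit_src ? "main_exit_block"
                                      : "early_break_block");
  /* Both paths meet before the epilog.  */
  dest["main_exit_block"] = "merge_block";
  dest["early_break_block"] = "merge_block";
  dest["merge_block"] = "epilog_preheader";
  return dest;
}
```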

This allows us to re-use all the existing code for IV updates.  Additionally,
this also enables correct linking for multiple vector epilogues.

flush_pending_stmts cannot be used in this scenario since it updates the PHI
nodes in the order that they are in the exit destination blocks.  This means
they are in CFG visit order.  With a single exit this doesn't matter but with
multiple exits with different live values through the different exits the order
usually does not line up.

Additionally the vectorizer helper functions expect to be able to iterate over
the nodes in the order that they occur in the loop header blocks.  This is an
invariant we must maintain.  To do this we just inline the work of
flush_pending_stmts but maintain the order by using the header blocks to guide
the work.
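
The ordering problem can be illustrated with a small stand-alone model (purely illustrative; the real code iterates the header's PHI nodes with `gsi_start_phis` and consults the redirect-edge var map):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

/* Toy model: the exit blocks record live values in CFG-visit order
   (here an unordered map), but consumers expect them in the order of
   the loop-header PHI nodes.  Walking the header order and looking
   each value up reproduces by hand what flush_pending_stmts cannot
   guarantee with multiple exits.  */
static std::vector<int>
order_by_header (const std::vector<std::string> &header_phis,
                 const std::unordered_map<std::string, int> &exit_values)
{
  std::vector<int> ordered;
  for (const std::string &name : header_phis)
    {
      auto it = exit_values.find (name);
      if (it != exit_values.end ())
        ordered.push_back (it->second);  /* Header order, not exit order.  */
    }
  return ordered;
}
```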

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
	(slpeel_tree_duplicate_loop_for_vectorization): New.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Use it.
	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
	(vect_is_loop_exit_latch_pred): New.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index b9161274ce401a7307f3e61ad23aa036701190d7..fafbf924e8db18eb4eec7a4a1906d10f6ce9812f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1392,6 +1392,153 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
 		     (gimple *) cond_stmt);
 }
 
+/* Determine if the exit chosen by the loop vectorizer differs from the
+   natural loop exit, i.e. if the exit leads to the loop latch or not.
+   When this happens the peeling and IV-update code needs to flip its
+   notion of which exit is the main one and which are the others.  */
+
+bool inline
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
+/* Perform peeling for when the peeled loop is placed after the original loop.
+   This maintains LCSSA and creates the appropriate blocks for multiple exit
+   vectorization.   */
+
+void static
+slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
+					      vec<edge> &loop_exits,
+					      class loop *new_loop,
+					      bool flow_loops,
+					      basic_block new_preheader)
+{
+  bool multiple_exits_p = loop_exits.length () > 1;
+  basic_block main_loop_exit_block = new_preheader;
+  if (multiple_exits_p)
+    {
+      edge loop_entry = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_entry);
+    }
+
+  auto_vec <gimple *> new_phis;
+  hash_map <tree, tree> new_phi_args;
+  /* First create the empty phi nodes so that when we flush the
+     statements they can be filled in.  However, because there is no order
+     between the PHI nodes in the exits and the loop headers we need to
+     order them based on the order of the two headers.  First record the new
+     phi nodes. Then redirect the edges and flush the changes.  This writes out
+     the new SSA names.  */
+  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
+       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
+    {
+      gimple *from_phi = gsi_stmt (gsi_from);
+      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+      gphi *res = create_phi_node (new_res, main_loop_exit_block);
+      new_phis.safe_push (res);
+    }
+
+  for (auto exit : loop_exits)
+    {
+      basic_block dest
+	= exit == loop_exit ? main_loop_exit_block : new_preheader;
+      redirect_edge_and_branch (exit, dest);
+    }
+
+  /* Only fush the main exit, the remaining exits we need to match the order
+     in the loop->header which with multiple exits may not be the same.  */
+  flush_pending_stmts (loop_exit);
+
+  /* Record the new SSA names in the cache so that we can skip materializing
+     them again when we fill in the rest of the LCSSA variables.  */
+  for (auto phi : new_phis)
+    {
+      tree new_arg = gimple_phi_arg (phi, 0)->def;
+
+      if (!SSA_VAR_P (new_arg))
+	continue;
+
+      /* If the PHI MEM node dominates the loop then we shouldn't create
+	 a new LC-SSA PHI for it in the intermediate block.  */
+      /* A MEM phi that constitutes a new DEF for the vUSE chain can either
+	 be a .VDEF or a PHI that operates on MEM. And said definition
+	 must not be inside the main loop.  Or we must be a parameter.
+	 In the last two cases we may remove a non-MEM PHI node, but since
+	 they dominate both loops the removal is unlikely to cause trouble
+	 as the exits must already be using them.  */
+      if (virtual_operand_p (new_arg)
+	  && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
+	      || !flow_bb_inside_loop_p (loop,
+				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
+	{
+	  auto gsi = gsi_for_stmt (phi);
+	  remove_phi_node (&gsi, true);
+	  continue;
+	}
+
+      /* If we decide to remove the PHI node we should also not
+	 rematerialize it later on.  */
+      new_phi_args.put (new_arg, gimple_phi_result (phi));
+
+      if (TREE_CODE (new_arg) != SSA_NAME)
+	continue;
+    }
+
+  /* Copy the current loop LC PHI nodes between the original loop exit
+     block and the new loop header.  This allows us to later split the
+     preheader block and still find the right LC nodes.  */
+  edge loop_entry = single_succ_edge (new_preheader);
+  if (flow_loops)
+    for (auto gsi_from = gsi_start_phis (loop->header),
+	 gsi_to = gsi_start_phis (new_loop->header);
+	 !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	 gsi_next (&gsi_from), gsi_next (&gsi_to))
+      {
+	gimple *from_phi = gsi_stmt (gsi_from);
+	gimple *to_phi = gsi_stmt (gsi_to);
+	tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
+	tree *res = NULL;
+
+	/* Check if we've already created a new phi node during edge
+	   redirection.  If we have, only propagate the value downwards.  */
+	if ((res = new_phi_args.get (new_arg)))
+	  new_arg = *res;
+
+	/* All other exits use the previous iters.  */
+	if (multiple_exits_p)
+	  {
+	    tree alt_arg = gimple_phi_result (from_phi);
+	    tree alt_res = copy_ssa_name (alt_arg);
+	    gphi *alt_lcssa_phi = create_phi_node (alt_res, new_preheader);
+	    edge main_e = single_succ_edge (main_loop_exit_block);
+	    for (edge e : loop_exits)
+	      if (e != loop_exit)
+		{
+		  add_phi_arg (alt_lcssa_phi, alt_arg, e, UNKNOWN_LOCATION);
+		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
+		}
+	    new_arg = alt_res; /* Push it down to the new_loop header.  */
+	  } else if (!res) {
+	    /* For non-early break we need to keep the possibly live values in
+	       the exit block.  For early break these are kept in the merge
+	       block in the code above.  */
+	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+
+	    /* Main loop exit should use the final iter value.  */
+	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	    new_arg = new_res;
+	  }
+
+	adjust_phi_and_debug_stmts (to_phi, loop_entry, new_arg);
+    }
+
+  /* Now clear all the redirect maps.  */
+  for (auto exit : loop_exits)
+    redirect_edge_var_map_clear (exit);
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1403,13 +1550,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break vectorization,
+   LOOP_VINFO needs to be provided and is used to keep track of any newly created
+   memory references that need to be updated should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1526,7 +1676,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1536,91 +1688,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
-      auto_vec <gimple *> new_phis;
-      hash_map <tree, tree> new_phi_args;
-      /* First create the empty phi nodes so that when we flush the
-	 statements they can be filled in.   However because there is no order
-	 between the PHI nodes in the exits and the loop headers we need to
-	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
-	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
-	{
-	  gimple *from_phi = gsi_stmt (gsi_from);
-	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
-	  new_phis.safe_push (res);
-	}
-
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
-	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
-	}
-      /* Record the new SSA names in the cache so that we can skip materializing
-	 them again when we fill in the rest of the LCSSA variables.  */
-      for (auto phi : new_phis)
-	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
-
-	  if (!SSA_VAR_P (new_arg))
-	    continue;
-	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
-	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
-	     be a .VDEF or a PHI that operates on MEM. And said definition
-	     must not be inside the main loop.  Or we must be a parameter.
-	     In the last two cases we may remove a non-MEM PHI node, but since
-	     they dominate both loops the removal is unlikely to cause trouble
-	     as the exits must already be using them.  */
-	  if (virtual_operand_p (new_arg)
-	      && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
-		  || !flow_bb_inside_loop_p (loop,
-				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	      continue;
-	    }
-	  new_phi_args.put (new_arg, gimple_phi_result (phi));
-
-	  if (TREE_CODE (new_arg) != SSA_NAME)
-	    continue;
-	}
-
-      /* Copy the current loop LC PHI nodes between the original loop exit
-	 block and the new loop header.  This allows us to later split the
-	 preheader block and still find the right LC nodes.  */
-      edge loop_entry = single_succ_edge (new_preheader);
-      if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
-
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
-
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
-
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
-
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+      slpeel_tree_duplicate_loop_for_vectorization (loop, loop_exit, loop_exits,
+						    new_loop, flow_loops,
+						    new_preheader);
 
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
 
@@ -1634,6 +1704,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally, after wiring the new epilogue, we need to update its main
+	 exit to the original function exit we recorded.  Other exits are
+	 already correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1766,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 76451a7fefe6ff966cecfa2cbc7b11336b038565..b9a71a0b5f5407417e8366b0df132df20c7f60aa 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2223,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,

[-- Attachment #2: rb17964.patch --]
[-- Type: application/octet-stream, Size: 14379 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index b9161274ce401a7307f3e61ad23aa036701190d7..fafbf924e8db18eb4eec7a4a1906d10f6ce9812f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1392,6 +1392,153 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
 		     (gimple *) cond_stmt);
 }
 
+/* Determine if the exit chosen by the loop vectorizer differs from the
+   natural loop exit, i.e. whether the exit leads to the loop latch or not.
+   When this happens we need to flip the understanding of main and other
+   exits by peeling and IV updates.  */
+
+bool inline
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
+/* Perform peeling when the peeled loop is placed after the original loop.
+   This maintains LCSSA and creates the appropriate blocks for multiple exit
+   vectorization.  */
+
+static void
+slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
+					      vec<edge> &loop_exits,
+					      class loop *new_loop,
+					      bool flow_loops,
+					      basic_block new_preheader)
+{
+  bool multiple_exits_p = loop_exits.length () > 1;
+  basic_block main_loop_exit_block = new_preheader;
+  if (multiple_exits_p)
+    {
+      edge loop_entry = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_entry);
+    }
+
+  auto_vec <gimple *> new_phis;
+  hash_map <tree, tree> new_phi_args;
+  /* First create the empty phi nodes so that when we flush the
+     statements they can be filled in.  However, because there is no order
+     between the PHI nodes in the exits and the loop headers we need to
+     order them based on the order of the two headers.  First record the new
+     phi nodes.  Then redirect the edges and flush the changes.  This writes
+     out the new SSA names.  */
+  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
+       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
+    {
+      gimple *from_phi = gsi_stmt (gsi_from);
+      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+      gphi *res = create_phi_node (new_res, main_loop_exit_block);
+      new_phis.safe_push (res);
+    }
+
+  for (auto exit : loop_exits)
+    {
+      basic_block dest
+	= exit == loop_exit ? main_loop_exit_block : new_preheader;
+      redirect_edge_and_branch (exit, dest);
+    }
+
+  /* Only flush the main exit; for the remaining exits we need to match the
+     order in the loop->header, which with multiple exits may not be the
+     same.  */
+  flush_pending_stmts (loop_exit);
+
+  /* Record the new SSA names in the cache so that we can skip materializing
+     them again when we fill in the rest of the LCSSA variables.  */
+  for (auto phi : new_phis)
+    {
+      tree new_arg = gimple_phi_arg (phi, 0)->def;
+
+      if (!SSA_VAR_P (new_arg))
+	continue;
+
+      /* If the PHI MEM node dominates the loop then we shouldn't create
+	 a new LC-SSA PHI for it in the intermediate block.  */
+      /* A MEM phi that constitutes a new DEF for the vUSE chain can either
+	 be a .VDEF or a PHI that operates on MEM, and said definition
+	 must not be inside the main loop; or we must be a parameter.
+	 In the last two cases we may remove a non-MEM PHI node, but since
+	 they dominate both loops the removal is unlikely to cause trouble
+	 as the exits must already be using them.  */
+      if (virtual_operand_p (new_arg)
+	  && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
+	      || !flow_bb_inside_loop_p (loop,
+				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
+	{
+	  auto gsi = gsi_for_stmt (phi);
+	  remove_phi_node (&gsi, true);
+	  continue;
+	}
+
+      /* If we decide to remove the PHI node we should also not
+	 rematerialize it later on.  */
+      new_phi_args.put (new_arg, gimple_phi_result (phi));
+
+      if (TREE_CODE (new_arg) != SSA_NAME)
+	continue;
+    }
+
+  /* Copy the current loop LC PHI nodes between the original loop exit
+     block and the new loop header.  This allows us to later split the
+     preheader block and still find the right LC nodes.  */
+  edge loop_entry = single_succ_edge (new_preheader);
+  if (flow_loops)
+    for (auto gsi_from = gsi_start_phis (loop->header),
+	 gsi_to = gsi_start_phis (new_loop->header);
+	 !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	 gsi_next (&gsi_from), gsi_next (&gsi_to))
+      {
+	gimple *from_phi = gsi_stmt (gsi_from);
+	gimple *to_phi = gsi_stmt (gsi_to);
+	tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
+	tree *res = NULL;
+
+	/* Check if we've already created a new phi node during edge
+	   redirection.  If we have, only propagate the value downwards.  */
+	if ((res = new_phi_args.get (new_arg)))
+	  new_arg = *res;
+
+	/* All other exits use the previous iters.  */
+	if (multiple_exits_p)
+	  {
+	    tree alt_arg = gimple_phi_result (from_phi);
+	    tree alt_res = copy_ssa_name (alt_arg);
+	    gphi *alt_lcssa_phi = create_phi_node (alt_res, new_preheader);
+	    edge main_e = single_succ_edge (main_loop_exit_block);
+	    for (edge e : loop_exits)
+	      if (e != loop_exit)
+		{
+		  add_phi_arg (alt_lcssa_phi, alt_arg, e, UNKNOWN_LOCATION);
+		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
+		}
+	    new_arg = alt_res; /* Push it down to the new_loop header.  */
+	  }
+	else if (!res)
+	  {
+	    /* For non-early break we need to keep the possibly live values in
+	       the exit block.  For early break these are kept in the merge
+	       block in the code above.  */
+	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+
+	    /* Main loop exit should use the final iter value.  */
+	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	    new_arg = new_res;
+	  }
+
+	adjust_phi_and_debug_stmts (to_phi, loop_entry, new_arg);
+    }
+
+  /* Now clear all the redirect maps.  */
+  for (auto exit : loop_exits)
+    redirect_edge_var_map_clear (exit);
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1403,13 +1550,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break
+   vectorization, LOOP_VINFO needs to be provided and is used to keep track
+   of any newly created memory references that need to be updated should we
+   decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1526,7 +1676,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1536,91 +1688,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
-      auto_vec <gimple *> new_phis;
-      hash_map <tree, tree> new_phi_args;
-      /* First create the empty phi nodes so that when we flush the
-	 statements they can be filled in.   However because there is no order
-	 between the PHI nodes in the exits and the loop headers we need to
-	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
-	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
-	{
-	  gimple *from_phi = gsi_stmt (gsi_from);
-	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
-	  new_phis.safe_push (res);
-	}
-
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
-	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
-	}
-      /* Record the new SSA names in the cache so that we can skip materializing
-	 them again when we fill in the rest of the LCSSA variables.  */
-      for (auto phi : new_phis)
-	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
-
-	  if (!SSA_VAR_P (new_arg))
-	    continue;
-	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
-	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
-	     be a .VDEF or a PHI that operates on MEM. And said definition
-	     must not be inside the main loop.  Or we must be a parameter.
-	     In the last two cases we may remove a non-MEM PHI node, but since
-	     they dominate both loops the removal is unlikely to cause trouble
-	     as the exits must already be using them.  */
-	  if (virtual_operand_p (new_arg)
-	      && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
-		  || !flow_bb_inside_loop_p (loop,
-				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
-	    {
-	      auto gsi = gsi_for_stmt (phi);
-	      remove_phi_node (&gsi, true);
-	      continue;
-	    }
-	  new_phi_args.put (new_arg, gimple_phi_result (phi));
-
-	  if (TREE_CODE (new_arg) != SSA_NAME)
-	    continue;
-	}
-
-      /* Copy the current loop LC PHI nodes between the original loop exit
-	 block and the new loop header.  This allows us to later split the
-	 preheader block and still find the right LC nodes.  */
-      edge loop_entry = single_succ_edge (new_preheader);
-      if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
-
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
-
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
-
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
-
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+      slpeel_tree_duplicate_loop_for_vectorization (loop, loop_exit, loop_exits,
+						    new_loop, flow_loops,
+						    new_preheader);
 
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
 
@@ -1634,6 +1704,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally, after wiring the new epilogue, we need to update its main
+	 exit to the original function exit we recorded.  Other exits are
+	 already correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1766,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 76451a7fefe6ff966cecfa2cbc7b11336b038565..b9a71a0b5f5407417e8366b0df132df20c7f60aa 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,
@@ -2223,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
@ 2023-11-15  0:03   ` Tamar Christina
  2023-11-15 13:01     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15  0:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 9209 bytes --]

Patch updated to latest trunk:

Hi All,

This changes the PHI node updates to support early breaks.
It has to support both the case where the exit chosen by the vectorizer matches
the normal loop exit and the case where it is "inverted", i.e. the chosen exit
is an early exit edge.

In the latter case we must always restart the loop for VF iterations.  For an
early exit the reason is obvious, but there are cases where the "normal" exit
is located before the early one.  This exit then does a check on ivtmp resulting
in us leaving the loop since it thinks we're done.
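
To make the restart rule concrete, here is a small sketch (hypothetical helper
names, not GCC code) of where the scalar code resumes for each kind of exit,
assuming a simple counted IV:

```c
#include <assert.h>

/* For the main (counted) exit, all completed vector iterations are
   final, so the scalar code resumes after the last full vector chunk.
   For an early exit we only know which vector iteration broke out, so
   the scalar loop restarts at the beginning of that vector iteration
   and redoes up to VF scalar iterations.  */
static unsigned
scalar_resume_main_exit (unsigned niters, unsigned vf)
{
  return niters - niters % vf;
}

static unsigned
scalar_resume_early_exit (unsigned vec_iter, unsigned vf)
{
  return vec_iter * vf;
}
```

E.g. with niters = 10 and VF = 4 the main exit resumes at iteration 8, while
an early break taken in vector iteration 2 also restarts at 8 but re-executes
those iterations in scalar code.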

In these cases we may still have side-effects to perform, so we also go to the
scalar loop.

For the "normal" exit niters has already been adjusted for peeling; for the
early exits we must find out how many iterations we actually did.  So we have
to recalculate the new position for each exit.
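
As a hypothetical illustration of the kind of loop involved:

```c
#include <assert.h>

#define N 100
static int a[N];

/* Two exits: the counted ("normal") exit at i == N and an early break.
   When the early break is taken, the number of iterations executed is
   not the peel-adjusted niters, so IV values live on that exit edge
   must be recomputed from the position the loop actually reached.  */
static int
first_negative (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] < 0)
      return i;  /* early exit: position unknown from niters */
  return N;      /* normal exit: all iterations done */
}
```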

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
	(vect_update_ivs_after_vectorizer): Support early break.
	(vect_do_peeling): Use it.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    When this happens we need to flip the understanding of main and other
    exits by peeling and IV updates.  */
 
-bool inline
+bool
 vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
 {
   return single_pred (loop->latch) == loop_exit->src;
@@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
      Input:
      - LOOP - a loop that is going to be vectorized. The last few iterations
               of LOOP were peeled.
+     - VF   - The chosen vectorization factor for LOOP.
      - NITERS - the number of iterations that LOOP executes (before it is
                 vectorized). i.e, the number of times the ivs should be bumped.
      - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
@@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
                   The phi args associated with the edge UPDATE_E in the bb
                   UPDATE_E->dest are updated accordingly.
 
+     - restart_loop - Indicates whether the scalar loop needs to restart the
+		      iteration count where the vector loop began.
+
      Assumption 1: Like the rest of the vectorizer, this function assumes
      a single loop exit that has a single predecessor.
 
@@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
-				  tree niters, edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
+				  tree niters, edge update_e, bool restart_loop)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+			    && flow_bb_inside_loop_p (loop, update_e->src);
+  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  gcond *cond = get_loop_exit_condition (loop_e);
+  basic_block exit_bb = loop_e->dest;
+  basic_block iv_block = NULL;
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2190,7 +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       enum vect_induction_op_type induction_type
 	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
 
-      if (induction_type == vect_step_op_add)
+      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      /* create_iv always places it on the LHS.  Alternatively we can set a
+	 property during create_iv to identify it.  */
+      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
+      if (restart_loop && ivtemp)
 	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  ni = build_int_cst (type, vf);
+	  if (inversed_iv)
+	    ni = fold_build2 (MINUS_EXPR, type, ni,
+			      fold_convert (type, step_expr));
+	}
+      else if (induction_type == vect_step_op_add)
+	{
+
 	  tree stype = TREE_TYPE (step_expr);
-	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+
+	  /* Early exits always use the last iter value, not niters.  */
+	  if (restart_loop)
+	    {
+	      /* Live statements in the non-main exit shouldn't be adjusted.  We
+		 normally didn't have this problem with a single exit as live
+		 values would be in the exit block.  However, when dealing with
+		 multiple exits all exits are redirected to the merge block
+		 and we restart the iteration.  */
+	      if (STMT_VINFO_LIVE_P (phi_info))
+		continue;
+
+	      /* For early break the final loop IV is:
+		 init + (final - init) * vf which takes into account peeling
+		 values and non-single steps.  The main exit can use niters
+		 since if you exit from the main exit you've done all vector
+		 iterations.  For an early exit we don't know when we exit so we
+		 must re-calculate this on the exit.  */
+	      tree start_expr = gimple_phi_result (phi);
+	      off = fold_build2 (MINUS_EXPR, stype,
+				 fold_convert (stype, start_expr),
+				 fold_convert (stype, init_expr));
+	      /* Now adjust for VF to get the final iteration value.  */
+	      off = fold_build2 (MULT_EXPR, stype, off,
+				 build_int_cst (stype, vf));
+	    }
+	  else
+	    off = fold_build2 (MULT_EXPR, stype,
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Don't bother call vect_peel_nonlinear_iv_init.  */
       else if (induction_type == vect_step_op_neg)
 	ni = init_expr;
+      else if (restart_loop)
+	continue;
       else
 	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
 					  niters, step_expr,
@@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
+      /* For the non-main exits create an intermediate edge to get any updated
+	 iv calculations.  */
+      if (needs_interm_block
+	  && !iv_block
+	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
+	{
+	  iv_block = split_edge (update_e);
+	  update_e = single_succ_edge (update_e->dest);
+	  last_gsi = gsi_last_bb (iv_block);
+	}
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
-      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-					update_e);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	update_e = single_succ_edge (e->dest);
+      bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+
+      /* Update the main exit first.  */
+      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
+					update_e, inversed_iv);
+
+      /* And then update the early exits.  */
+      for (auto exit : get_loop_exit_edges (loop))
+	{
+	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
+	    continue;
+
+	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
+					    niters_vector_mult_vf,
+					    exit, true);
+	}
 
       if (skip_epilog)
 	{

[-- Attachment #2: rb17967.patch --]
[-- Type: application/octet-stream, Size: 7921 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    When this happens we need to flip the understanding of main and other
    exits by peeling and IV updates.  */
 
-bool inline
+bool
 vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
 {
   return single_pred (loop->latch) == loop_exit->src;
@@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
      Input:
      - LOOP - a loop that is going to be vectorized. The last few iterations
               of LOOP were peeled.
+     - VF   - The chosen vectorization factor for LOOP.
      - NITERS - the number of iterations that LOOP executes (before it is
                 vectorized). i.e, the number of times the ivs should be bumped.
      - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
@@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
                   The phi args associated with the edge UPDATE_E in the bb
                   UPDATE_E->dest are updated accordingly.
 
+     - restart_loop - Indicates whether the scalar loop needs to restart the
+		      iteration count where the vector loop began.
+
      Assumption 1: Like the rest of the vectorizer, this function assumes
      a single loop exit that has a single predecessor.
 
@@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
  */
 
 static void
-vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
-				  tree niters, edge update_e)
+vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
+				  tree niters, edge update_e, bool restart_loop)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+			    && flow_bb_inside_loop_p (loop, update_e->src);
+  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+  gcond *cond = get_loop_exit_condition (loop_e);
+  basic_block exit_bb = loop_e->dest;
+  basic_block iv_block = NULL;
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2190,7 +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       enum vect_induction_op_type induction_type
 	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
 
-      if (induction_type == vect_step_op_add)
+      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      /* create_iv always places it on the LHS.  Alternatively we can set a
+	 property during create_iv to identify it.  */
+      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
+      if (restart_loop && ivtemp)
 	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  ni = build_int_cst (type, vf);
+	  if (inversed_iv)
+	    ni = fold_build2 (MINUS_EXPR, type, ni,
+			      fold_convert (type, step_expr));
+	}
+      else if (induction_type == vect_step_op_add)
+	{
+
 	  tree stype = TREE_TYPE (step_expr);
-	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+
+	  /* Early exits always use last iter value not niters. */
+	  if (restart_loop)
+	    {
+	      /* Live statements in the non-main exit shouldn't be adjusted.  We
+		 normally didn't have this problem with a single exit as live
+		 values would be in the exit block.  However when dealing with
+		 multiple exits all exits are redirected to the merge block
+		 and we restart the iteration.  */
+	      if (STMT_VINFO_LIVE_P (phi_info))
+		continue;
+
+	      /* For early break the final loop IV is:
+		 init + (final - init) * vf which takes into account peeling
+		 values and non-single steps.  The main exit can use niters
+		 since if you exit from the main exit you've done all vector
+		 iterations.  For an early exit we don't know when we exit so we
+		 must re-calculate this on the exit.  */
+	      tree start_expr = gimple_phi_result (phi);
+	      off = fold_build2 (MINUS_EXPR, stype,
+				 fold_convert (stype, start_expr),
+				 fold_convert (stype, init_expr));
+	      /* Now adjust for VF to get the final iteration value.  */
+	      off = fold_build2 (MULT_EXPR, stype, off,
+				 build_int_cst (stype, vf));
+	    }
+	  else
+	    off = fold_build2 (MULT_EXPR, stype,
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       /* Don't bother call vect_peel_nonlinear_iv_init.  */
       else if (induction_type == vect_step_op_neg)
 	ni = init_expr;
+      else if (restart_loop)
+	continue;
       else
 	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
 					  niters, step_expr,
@@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
+      /* For the non-main exit create an intermediate edge to get any updated iv
+	 calculations.  */
+      if (needs_interm_block
+	  && !iv_block
+	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
+	{
+	  iv_block = split_edge (update_e);
+	  update_e = single_succ_edge (update_e->dest);
+	  last_gsi = gsi_last_bb (iv_block);
+	}
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
-      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-					update_e);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	update_e = single_succ_edge (e->dest);
+      bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+
+      /* Update the main exit first.  */
+      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
+					update_e, inversed_iv);
+
+      /* And then update the early exits.  */
+      for (auto exit : get_loop_exit_edges (loop))
+	{
+	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
+	    continue;
+
+	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
+					    niters_vector_mult_vf,
+					    exit, true);
+	}
 
       if (skip_epilog)
 	{

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
@ 2023-11-15  0:05   ` Tamar Christina
  2023-11-15 13:41     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15  0:05 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 9170 bytes --]

Patch updated to trunk.

This adds support to vectorizable_live_reduction to handle multiple exits by
doing a search for which exit the live value should be materialized in.

Additionally, which value in the index we're after depends on whether the exit
it's materialized in is an early exit or whether the loop's main exit is
different from the loop's natural one (i.e. the one with the same src block as
the latch).

In those two cases we want the first rather than the last value as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and vector since there's only a way to get the last
active element and not the first.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296ced7b0d4d3e76a3634f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+      /* A value can only be live in one exit.  So figure out which one.  */
+      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      /* Check if we have a loop where the chosen exit is not the main exit,
+	 in these cases for an early break we restart the iteration the vector code
+	 did.  For the live values we want the value at the start of the iteration
+	 rather than at the end.  */
+      bool restart_loop = false;
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	    if (!is_gimple_debug (use_stmt)
+		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	      {
+		basic_block use_bb = gimple_bb (use_stmt);
+		for (auto edge : get_loop_exit_edges (loop))
+		  {
+		    /* Alternative exits can have an intermediate BB in
+		       between to update the IV.  In those cases we need to
+		       look one block further.  */
+		    if (use_bb == edge->dest
+			|| (single_succ_p (edge->dest)
+			    && use_bb == single_succ (edge->dest)))
+		      {
+			exit_e = edge;
+			goto found;
+		      }
+		  }
+	      }
+found:
+	  /* If the edge isn't a single pred then split the edge so we have a
+	     location to place the live operations.  Perhaps we should always
+	     split during IV updating.  But this way the CFG is cleaner to
+	     follow.  */
+	  restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
+	  if (!single_pred_p (exit_e->dest))
+	    exit_e = single_pred_edge (split_edge (exit_e));
+
+	  /* For early exit where the exit is not in the BB that leads to the
+	     latch then we're restarting the iteration in the scalar loop. So
+	     get the first live value.  */
+	  if (restart_loop)
+	    {
+	      vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+	      vec_lhs = gimple_get_lhs (vec_stmt);
+	      bitstart = build_zero_cst (TREE_TYPE (bitstart));
+	    }
+	}
+
+      basic_block exit_bb = exit_e->dest;
       gcc_assert (single_pred_p (exit_bb));
 
       tree vec_lhs_phi = copy_ssa_name (vec_lhs);
       gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
+      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
 
       gimple_seq stmts = NULL;
       tree new_tree;
@@ -10663,6 +10711,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 					  len, bias_minus_one);
 
+	  /* This needs to implement extraction of the first index, but not sure
+	     how the LEN stuff works.  At the moment we shouldn't get here since
+	     there's no LEN support for early breaks.  But guard this so there's
+	     no incorrect codegen.  */
+	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
 	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
 	  tree scalar_res
 	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
@@ -10687,8 +10741,37 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 					  &LOOP_VINFO_MASKS (loop_vinfo),
 					  1, vectype, 0);
 	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
+	  tree scalar_res;
+
+	  /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	     instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+	  if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	    {
+	      auto gsi_stmt = gsi_last (stmts);
+
+	       /* First create the permuted mask.  */
+	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	      tree perm_dest = copy_ssa_name (mask);
+	      gimple *perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+					   mask, perm_mask);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      mask = perm_dest;
+
+	       /* Then permute the vector contents.  */
+	      tree perm_elem = perm_mask_for_reverse (vectype);
+	      perm_dest = copy_ssa_name (vec_lhs_phi);
+	      perm_stmt
+		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+					   vec_lhs_phi, perm_elem);
+	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+					   &gsi_stmt);
+	      vec_lhs_phi = perm_dest;
+	    }
+
+	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				     mask, vec_lhs_phi);
 
 	  /* Convert the extracted vector element to the scalar type.  */
 	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
@@ -10708,26 +10791,36 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (stmts)
 	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
+      /* There are no further out-of-loop uses of lhs by LC-SSA construction.  */
+      bool single_use = true;
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
 	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
+	  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	    continue;
+
+	  gcc_assert (single_use);
+	  if (is_a <gphi *> (use_stmt)
+	      && gimple_phi_arg_def (as_a <gphi *> (use_stmt), 0) == lhs)
 	    {
+	      /* Remove existing phis that copy from lhs and create copies
+		 from new_tree.  */
+	      gphi *phi = as_a <gphi *> (use_stmt);
+	      auto gsi = gsi_for_phi (phi);
 	      remove_phi_node (&gsi, false);
 	      tree lhs_phi = gimple_phi_result (phi);
 	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
 	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
 	    }
 	  else
-	    gsi_next (&gsi);
+	    {
+	      /* Or just update the use in place if not a phi.  */
+	      use_operand_p use_p;
+	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+		SET_USE (use_p, new_tree);
+	      update_stmt (use_stmt);
+	    }
+	  single_use = false;
 	}
-
-      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
-      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
-	gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
     }
   else
     {
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3a22bf02f5ab16ded0af61cd1d719a98b8982144..7c3d6d196e122d67f750dfef6d615aabc6c28281 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1774,7 +1774,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b9a71a0b5f5407417e8366b0df132df20c7f60aa..f261fc74b8795b4516b17155441d25baaf8c22ae 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2246,6 +2246,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

[-- Attachment #2: rb17968.patch --]
[-- Type: application/octet-stream, Size: 7941 bytes --]


^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch
  2023-11-14 23:59           ` Tamar Christina
@ 2023-11-15 12:14             ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-15 12:14 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Tue, 14 Nov 2023, Tamar Christina wrote:

> > > OK, but then I think the fix is to not use
> > > standard_iv_increment_position (it's a weird API anyway).  Instead insert
> > before the main exit condition.
> > 
> > I figured as much.  Almost done respinning it with the vectorizer's own simpler
> > copy.
> > Should be out today with the rest.
> > 
> > >
> > > Btw, I assumed this order of main / early exit cannot happen.  But I
> > > didn't re- review the main exit identification code yet.
> > >
> > 
> > It can happen because we allowed vec_init_loop_exit_info to pick the last
> > analyzable exit.  In cases like these it happens because the final exit has no
> > information from SCEV.  It then picks the last exit it could analyze, which by
> > default is an early exit.
> > 
> > It's very tricky to deal with and have just finished cleaning up the IV update
> > code to make it easier to follow... but it does seem to add about 970 more
> > vectorized cases (most of which are execution tests).
> > 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_iv_increment_position): New.
> 	(vect_set_loop_controls_directly): Use it.
> 	(vect_set_loop_condition_partial_vectors_avx512): Likewise.
> 	(vect_set_loop_condition_normal): Likewise.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index fafbf924e8db18eb4eec7a4a1906d10f6ce9812f..a5a612dc6b47436730592469176623685a7a413f 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -448,6 +448,20 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
>      }
>  }
>  
> +/* Stores the standard position for the induction variable increment belonging
> +   to LOOP_EXIT (just before the exit condition of the given exit) in BSI.
> +   INSERT_AFTER is set to true if the increment should be inserted after
> +   *BSI.  */
> +
> +static void
> +vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> +			    bool *insert_after)
> +{
> +  basic_block bb = loop_exit->src;
> +  *bsi = gsi_last_bb (bb);
> +  *insert_after = false;
> +}
> +
>  /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
>     for all the rgroup controls in RGC and return a control that is nonzero
>     when the loop needs to iterate.  Add any new preheader statements to
> @@ -531,7 +545,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>    tree index_before_incr, index_after_incr;
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  vect_iv_increment_position (exit_e, &incr_gsi, &insert_after);
>    if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
>      {
>        /* Create an IV that counts down from niters_total and whose step
> @@ -1017,7 +1032,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>    tree index_before_incr, index_after_incr;
>    gimple_stmt_iterator incr_gsi;
>    bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
>    create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
>  	     &incr_gsi, insert_after, &index_before_incr,
>  	     &index_after_incr);
> @@ -1185,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>  				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
> @@ -1278,7 +1293,7 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>  	}
>      }
>  
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
>    create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
>               &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
>    indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
> @@ -1446,7 +1461,7 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
>        redirect_edge_and_branch (exit, dest);
>      }
>  
> -  /* Only fush the main exit, the remaining exits we need to match the order
> +  /* Only flush the main exit, the remaining exits we need to match the order
>       in the loop->header which with multiple exits may not be the same.  */
>    flush_pending_stmts (loop_exit);
>  
> @@ -1519,7 +1534,9 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
>  		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
>  		}
>  	    new_arg = alt_res; /* Push it down to the new_loop header.  */
> -	  } else if (!res) {
> +	  }
> +	else if (!res)
> +	  {
>  	    /* For non-early break we need to keep the possibly live values in
>  	       the exit block.  For early break these are kept in the merge
>  	       block in the code above.  */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-11-15  0:00   ` Tamar Christina
@ 2023-11-15 12:40     ` Richard Biener
  2023-11-20 21:51       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-15 12:40 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk,
> 
> This splits the part of the function that does peeling for loops at exits to
> a different function.  In this new function we also peel for early breaks.
> 
> Peeling for early breaks works by redirecting all early break exits to a
> single "early break" block and combine them and the normal exit edge together
> later in a different block which then goes into the epilog preheader.
> 
> This allows us to re-use all the existing code for IV updates.  Additionally
> this also enables correct linking for multiple vector epilogues.
> 
> flush_pending_stmts cannot be used in this scenario since it updates the PHI
> nodes in the order that they are in the exit destination blocks.  This means
> they are in CFG visit order.  With a single exit this doesn't matter but with
> multiple exits with different live values through the different exits the order
> usually does not line up.
> 
> Additionally the vectorizer helper functions expect to be able to iterate over
> the nodes in the order that they occur in the loop header blocks.  This is an
> invariant we must maintain.  To do this we inline the work of
> flush_pending_stmts but maintain the order by using the header blocks to guide
> the work.
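[Editor's note: the ordering constraint described above can be sketched outside of GCC.  This is a conceptual model only, with hypothetical names, not the patch's API: the merge-block values are filled in by walking the loop-header order and looking each live value up, instead of taking whatever order the exit block lists them in.]

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Map from an IV/live-value name to the value it carries on one exit edge.
using phi_map = std::map<std::string, int>;

// Fill the merge-block "PHIs" by walking the loop-header order and looking
// each value up by name, rather than consuming the exit block's values in
// CFG visit order (which, with multiple exits, need not line up).
std::vector<int>
merge_in_header_order (const std::vector<std::string> &header_order,
                       const phi_map &exit_values)
{
  std::vector<int> merged;
  for (const std::string &name : header_order)
    merged.push_back (exit_values.at (name));
  return merged;
}
```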
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
> 	(slpeel_tree_duplicate_loop_for_vectorization): New.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): use it.
> 	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
> 	(vect_is_loop_exit_latch_pred): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index b9161274ce401a7307f3e61ad23aa036701190d7..fafbf924e8db18eb4eec7a4a1906d10f6ce9812f 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1392,6 +1392,153 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
>  		     (gimple *) cond_stmt);
>  }
>  
> +/* Determine if the exit chosen by the loop vectorizer differs from the
> +   natural loop exit, i.e. if the exit leads to the loop latch or not.
> +   When this happens we need to flip the understanding of main and other
> +   exits by peeling and IV updates.  */
> +
> +bool inline
> +vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)

Ick, bad name - didn't see its use(s) in this patch?


> +{
> +  return single_pred (loop->latch) == loop_exit->src;
> +}
> +
> +/* Perform peeling for when the peeled loop is placed after the original loop.
> +   This maintains LCSSA and creates the appropriate blocks for multiple exit
> +   vectorization.   */
> +
> +void static
> +slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
> +					      vec<edge> &loop_exits,
> +					      class loop *new_loop,
> +					      bool flow_loops,
> +					      basic_block new_preheader)

also bad name ;)  I don't see a strong reason to factor this out.

> +{
> +  bool multiple_exits_p = loop_exits.length () > 1;
> +  basic_block main_loop_exit_block = new_preheader;
> +  if (multiple_exits_p)
> +    {
> +      edge loop_entry = single_succ_edge (new_preheader);
> +      new_preheader = split_edge (loop_entry);
> +    }
> +
> +  auto_vec <gimple *> new_phis;
> +  hash_map <tree, tree> new_phi_args;
> +  /* First create the empty phi nodes so that when we flush the
> +     statements they can be filled in.   However because there is no order
> +     between the PHI nodes in the exits and the loop headers we need to
> +     order them based on the order of the two headers.  First record the new
> +     phi nodes. Then redirect the edges and flush the changes.  This writes out
> +     the new SSA names.  */
> +  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
> +       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> +    {
> +      gimple *from_phi = gsi_stmt (gsi_from);
> +      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +      gphi *res = create_phi_node (new_res, main_loop_exit_block);
> +      new_phis.safe_push (res);
> +    }
> +
> +  for (auto exit : loop_exits)
> +    {
> +      basic_block dest
> +	= exit == loop_exit ? main_loop_exit_block : new_preheader;
> +      redirect_edge_and_branch (exit, dest);
> +    }
> +
> +  /* Only flush the main exit; the remaining exits we need to match the order
> +     in the loop->header which with multiple exits may not be the same.  */
> +  flush_pending_stmts (loop_exit);
> +
> +  /* Record the new SSA names in the cache so that we can skip materializing
> +     them again when we fill in the rest of the LCSSA variables.  */
> +  for (auto phi : new_phis)
> +    {
> +      tree new_arg = gimple_phi_arg (phi, 0)->def;
> +
> +      if (!SSA_VAR_P (new_arg))
> +	continue;
> +
> +      /* If the PHI MEM node dominates the loop then we shouldn't create
> +	 a new LC-SSA PHI for it in the intermediate block.   */
> +      /* A MEM phi that constitutes a new DEF for the vUSE chain can either
> +	 be a .VDEF or a PHI that operates on MEM. And said definition
> +	 must not be inside the main loop.  Or we must be a parameter.
> +	 In the last two cases we may remove a non-MEM PHI node, but since
> +	 they dominate both loops the removal is unlikely to cause trouble
> +	 as the exits must already be using them.  */
> +      if (virtual_operand_p (new_arg)
> +	  && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
> +	      || !flow_bb_inside_loop_p (loop,
> +				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
> +	{
> +	  auto gsi = gsi_for_stmt (phi);
> +	  remove_phi_node (&gsi, true);
> +	  continue;
> +	}
> +
> +      /* If we decide to remove the PHI node we should also not
> +	 rematerialize it later on.  */
> +      new_phi_args.put (new_arg, gimple_phi_result (phi));
> +
> +      if (TREE_CODE (new_arg) != SSA_NAME)
> +	continue;
> +    }
> +
> +  /* Copy the current loop LC PHI nodes between the original loop exit
> +     block and the new loop header.  This allows us to later split the
> +     preheader block and still find the right LC nodes.  */
> +  edge loop_entry = single_succ_edge (new_preheader);
> +  if (flow_loops)
> +    for (auto gsi_from = gsi_start_phis (loop->header),
> +	 gsi_to = gsi_start_phis (new_loop->header);
> +	 !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +	 gsi_next (&gsi_from), gsi_next (&gsi_to))
> +      {
> +	gimple *from_phi = gsi_stmt (gsi_from);
> +	gimple *to_phi = gsi_stmt (gsi_to);
> +	tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, loop_latch_edge (loop));
> +	tree *res = NULL;
> +
> +	/* Check if we've already created a new phi node during edge
> +	   redirection.  If we have, only propagate the value downwards.  */
> +	if ((res = new_phi_args.get (new_arg)))
> +	  new_arg = *res;
> +
> +	/* All other exits use the previous iters.  */
> +	if (multiple_exits_p)
> +	  {
> +	    tree alt_arg = gimple_phi_result (from_phi);
> +	    tree alt_res = copy_ssa_name (alt_arg);
> +	    gphi *alt_lcssa_phi = create_phi_node (alt_res, new_preheader);
> +	    edge main_e = single_succ_edge (main_loop_exit_block);
> +	    for (edge e : loop_exits)
> +	      if (e != loop_exit)
> +		{
> +		  add_phi_arg (alt_lcssa_phi, alt_arg, e, UNKNOWN_LOCATION);
> +		  SET_PHI_ARG_DEF (alt_lcssa_phi, main_e->dest_idx, new_arg);
> +		}
> +	    new_arg = alt_res; /* Push it down to the new_loop header.  */

I think it would be clearer to separate alternate exit from main exit
handling more completely - we don't have the new_phi_args map for
the alternate exits.

Thus first only redirect and fixup the main exit and then redirect
the alternate exits, immediately wiping the edge_var_map, and
manually create the only relevant PHIs.

In principle this early-break handling could be fully within the
if (flow_loops) condition (including populating the new_phi_args
map for the main exit).

The code itself looks fine to me.

Richard.

> +	  } else if (!res) {
> +	    /* For non-early break we need to keep the possibly live values in
> +	       the exit block.  For early break these are kept in the merge
> +	       block in the code above.  */
> +	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> +
> +	    /* Main loop exit should use the final iter value.  */
> +	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> +	    new_arg = new_res;
> +	  }
> +
> +	adjust_phi_and_debug_stmts (to_phi, loop_entry, new_arg);
> +    }
> +
> +  /* Now clear all the redirect maps.  */
> +  for (auto exit : loop_exits)
> +    redirect_edge_var_map_clear (exit);
> +}
> +
>  /* Given LOOP this function generates a new copy of it and puts it
>     on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
>     non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
> @@ -1403,13 +1550,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
>     copies remains the same.
>  
>     If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
> -   dominators were updated during the peeling.  */
> +   dominators were updated during the peeling.  When doing early break vectorization
> +   then LOOP_VINFO needs to be provided and is used to keep track of any newly created
> +   memory references that need to be updated should we decide to vectorize.  */
>  
>  class loop *
>  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  					class loop *scalar_loop,
>  					edge scalar_exit, edge e, edge *new_e,
> -					bool flow_loops)
> +					bool flow_loops,
> +					vec<basic_block> *updated_doms)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1526,7 +1676,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        }
>  
>    auto loop_exits = get_loop_exit_edges (loop);
> +  bool multiple_exits_p = loop_exits.length () > 1;
>    auto_vec<basic_block> doms;
> +  class loop *update_loop = NULL;
>  
>    if (at_exit) /* Add the loop copy at exit.  */
>      {
> @@ -1536,91 +1688,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	  flush_pending_stmts (new_exit);
>  	}
>  
> -      auto_vec <gimple *> new_phis;
> -      hash_map <tree, tree> new_phi_args;
> -      /* First create the empty phi nodes so that when we flush the
> -	 statements they can be filled in.   However because there is no order
> -	 between the PHI nodes in the exits and the loop headers we need to
> -	 order them base on the order of the two headers.  First record the new
> -	 phi nodes.  */
> -      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> -	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> -	{
> -	  gimple *from_phi = gsi_stmt (gsi_from);
> -	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -	  gphi *res = create_phi_node (new_res, new_preheader);
> -	  new_phis.safe_push (res);
> -	}
> -
> -      /* Then redirect the edges and flush the changes.  This writes out the new
> -	 SSA names.  */
> -      for (edge exit : loop_exits)
> -	{
> -	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
> -	  flush_pending_stmts (temp_e);
> -	}
> -      /* Record the new SSA names in the cache so that we can skip materializing
> -	 them again when we fill in the rest of the LCSSA variables.  */
> -      for (auto phi : new_phis)
> -	{
> -	  tree new_arg = gimple_phi_arg (phi, 0)->def;
> -
> -	  if (!SSA_VAR_P (new_arg))
> -	    continue;
> -	  /* If the PHI MEM node dominates the loop then we shouldn't create
> -	      a new LC-SSSA PHI for it in the intermediate block.   */
> -	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
> -	     be a .VDEF or a PHI that operates on MEM. And said definition
> -	     must not be inside the main loop.  Or we must be a parameter.
> -	     In the last two cases we may remove a non-MEM PHI node, but since
> -	     they dominate both loops the removal is unlikely to cause trouble
> -	     as the exits must already be using them.  */
> -	  if (virtual_operand_p (new_arg)
> -	      && (SSA_NAME_IS_DEFAULT_DEF (new_arg)
> -		  || !flow_bb_inside_loop_p (loop,
> -				gimple_bb (SSA_NAME_DEF_STMT (new_arg)))))
> -	    {
> -	      auto gsi = gsi_for_stmt (phi);
> -	      remove_phi_node (&gsi, true);
> -	      continue;
> -	    }
> -	  new_phi_args.put (new_arg, gimple_phi_result (phi));
> -
> -	  if (TREE_CODE (new_arg) != SSA_NAME)
> -	    continue;
> -	}
> -
> -      /* Copy the current loop LC PHI nodes between the original loop exit
> -	 block and the new loop header.  This allows us to later split the
> -	 preheader block and still find the right LC nodes.  */
> -      edge loop_entry = single_succ_edge (new_preheader);
> -      if (flow_loops)
> -	for (auto gsi_from = gsi_start_phis (loop->header),
> -	     gsi_to = gsi_start_phis (new_loop->header);
> -	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> -	     gsi_next (&gsi_from), gsi_next (&gsi_to))
> -	  {
> -	    gimple *from_phi = gsi_stmt (gsi_from);
> -	    gimple *to_phi = gsi_stmt (gsi_to);
> -	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> -						  loop_latch_edge (loop));
> -
> -	    /* Check if we've already created a new phi node during edge
> -	       redirection.  If we have, only propagate the value downwards.  */
> -	    if (tree *res = new_phi_args.get (new_arg))
> -	      {
> -		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
> -		continue;
> -	      }
> -
> -	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> -
> -	    /* Main loop exit should use the final iter value.  */
> -	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> -
> -	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
> -	  }
> +      slpeel_tree_duplicate_loop_for_vectorization (loop, loop_exit, loop_exits,
> +						    new_loop, flow_loops,
> +						    new_preheader);
>  
>        set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
>  
> @@ -1634,6 +1704,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        delete_basic_block (preheader);
>        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
>  			       loop_preheader_edge (scalar_loop)->src);
> +
> +      /* Finally after wiring the new epilogue we need to update its main exit
> +	 to the original function exit we recorded.  Other exits are already
> +	 correct.  */
> +      if (multiple_exits_p)
> +	{
> +	  update_loop = new_loop;
> +	  for (edge e : get_loop_exit_edges (loop))
> +	    doms.safe_push (e->dest);
> +	  doms.safe_push (exit_dest);
> +
> +	  /* Likely a fall-through edge, so update if needed.  */
> +	  if (single_succ_p (exit_dest))
> +	    doms.safe_push (single_succ (exit_dest));
> +	}
>      }
>    else /* Add the copy at entry.  */
>      {
> @@ -1681,6 +1766,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        delete_basic_block (new_preheader);
>        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
>  			       loop_preheader_edge (new_loop)->src);
> +
> +      if (multiple_exits_p)
> +	update_loop = loop;
> +    }
> +
> +  if (multiple_exits_p)
> +    {
> +      for (edge e : get_loop_exit_edges (update_loop))
> +	{
> +	  edge ex;
> +	  edge_iterator ei;
> +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> +	    {
> +	      /* Find the first non-fallthrough block as fall-throughs can't
> +		 dominate other blocks.  */
> +	      if (single_succ_p (ex->dest))
> +		{
> +		  doms.safe_push (ex->dest);
> +		  ex = single_succ_edge (ex->dest);
> +		}
> +	      doms.safe_push (ex->dest);
> +	    }
> +	  doms.safe_push (e->dest);
> +	}
> +
> +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> +      if (updated_doms)
> +	updated_doms->safe_splice (doms);
>      }
>  
>    free (new_bbs);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 76451a7fefe6ff966cecfa2cbc7b11336b038565..b9a71a0b5f5407417e8366b0df132df20c7f60aa 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
>  {
>    if (bb == (bb->loop_father)->header)
>      return true;
> -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> +
>    return false;
>  }
>  
> @@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
>  					 const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
>  						    class loop *, edge,
> -						    edge, edge *, bool = true);
> +						    edge, edge *, bool = true,
> +						    vec<basic_block> * = NULL);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>  				    tree *, tree *, tree *, int, bool, bool,
> @@ -2223,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
>  extern edge vec_init_loop_exit_info (class loop *);
> +extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-15  0:03   ` Tamar Christina
@ 2023-11-15 13:01     ` Richard Biener
  2023-11-15 13:09       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-15 13:01 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk:
> 
> Hi All,
> 
> This changes the PHI node updates to support early breaks.
> It has to support both the case where the loop's exit matches the normal loop
> exit and one where the early exit is "inverted", i.e. it's an early exit edge.
> 
> In the latter case we must always restart the loop for VF iterations.  For an
> early exit the reason is obvious, but there are cases where the "normal" exit
> is located before the early one.  This exit then does a check on ivtmp resulting
> in us leaving the loop since it thinks we're done.
> 
> In these cases we may still have side-effects to perform so we also go to the
> scalar loop.
> 
> For the "normal" exit niters has already been adjusted for peeling; for the
> early exits we must find out how many iterations we actually did.  So we have
> to recalculate the new position for each exit.
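[Editor's note: a scalar model of the two recalculations, with hypothetical helper names — the patch itself builds these as gimple trees.  Through the main exit all vector iterations completed, so the IV is bumped by niters * step; through an early exit only the IV's current PHI value tells us where we stopped, and the patch's comment gives the rebuilt value as init + (final - init) * vf.]

```cpp
#include <cassert>

// Main exit: every vector iteration completed, so simply advance the IV
// by the full scalar iteration count times the step.
long
iv_after_main_exit (long init, long niters, long step)
{
  return init + niters * step;
}

// Early exit: we do not know statically how far we got, so recompute from
// the IV's current value, as in the patch: init + (cur - init) * vf.
long
iv_after_early_exit (long init, long cur, long vf)
{
  return init + (cur - init) * vf;
}
```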
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
> 	(vect_update_ivs_after_vectorizer): Support early break.
> 	(vect_do_peeling): Use it.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>  				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
> @@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
>     When this happens we need to flip the understanding of main and other
>     exits by peeling and IV updates.  */
>  
> -bool inline
> +bool
>  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
>  {
>    return single_pred (loop->latch) == loop_exit->src;
> @@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>       Input:
>       - LOOP - a loop that is going to be vectorized. The last few iterations
>                of LOOP were peeled.
> +     - VF   - The chosen vectorization factor for LOOP.
>       - NITERS - the number of iterations that LOOP executes (before it is
>                  vectorized). i.e, the number of times the ivs should be bumped.
>       - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path

the comment on this is now a bit misleading, can you try to update it
and/or move the comment bits to the docs on EARLY_EXIT?

> @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>                    The phi args associated with the edge UPDATE_E in the bb
>                    UPDATE_E->dest are updated accordingly.
>  
> +     - restart_loop - Indicates whether the scalar loop needs to restart the

params are ALL_CAPS

> +		      iteration count where the vector loop began.
> +
>       Assumption 1: Like the rest of the vectorizer, this function assumes
>       a single loop exit that has a single predecessor.
>  
> @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>   */
>  
>  static void
> -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> -				  tree niters, edge update_e)
> +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,

LOOP_VINFO_VECT_FACTOR?

> +				  tree niters, edge update_e, bool restart_loop)

I think 'bool early_exit' is better here?  I wonder if we have an "early"
exit after the main exit we are probably sure there are no side-effects
to re-execute and could avoid this restarting?

>  {
>    gphi_iterator gsi, gsi1;
>    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    basic_block update_bb = update_e->dest;
> -
> -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -
> -  /* Make sure there exists a single-predecessor exit bb:  */
> -  gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> +  bool inversed_iv
> +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> +					 LOOP_VINFO_LOOP (loop_vinfo));
> +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +			    && flow_bb_inside_loop_p (loop, update_e->src);
> +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  gcond *cond = get_loop_exit_condition (loop_e);
> +  basic_block exit_bb = loop_e->dest;
> +  basic_block iv_block = NULL;
> +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
>  
>    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
>         !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> @@ -2190,7 +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        tree step_expr, off;
>        tree type;
>        tree var, ni, ni_name;
> -      gimple_stmt_iterator last_gsi;
>  
>        gphi *phi = gsi.phi ();
>        gphi *phi1 = gsi1.phi ();
> @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        enum vect_induction_op_type induction_type
>  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
>  
> -      if (induction_type == vect_step_op_add)
> +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> +      /* create_iv always places it on the LHS.  Alternatively we can set a
> +	 property during create_iv to identify it.  */
> +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> +      if (restart_loop && ivtemp)
>  	{
> +	  type = TREE_TYPE (gimple_phi_result (phi));
> +	  ni = build_int_cst (type, vf);
> +	  if (inversed_iv)
> +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> +			      fold_convert (type, step_expr));
> +	}
> +      else if (induction_type == vect_step_op_add)
> +	{
> +
>  	  tree stype = TREE_TYPE (step_expr);
> -	  off = fold_build2 (MULT_EXPR, stype,
> -			     fold_convert (stype, niters), step_expr);
> +
> +	  /* Early exits always use last iter value not niters. */
> +	  if (restart_loop)
> +	    {
> +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> +		 normally didn't have this problem with a single exit as live
> +		 values would be in the exit block.  However when dealing with
> +		 multiple exits all exits are redirected to the merge block
> +		 and we restart the iteration.  */

Hmm, I fail to see how this works - we're either using the value to 
continue the induction or not, independent of STMT_VINFO_LIVE_P.

> +	      if (STMT_VINFO_LIVE_P (phi_info))
> +		continue;
> +
> +	      /* For early break the final loop IV is:
> +		 init + (final - init) * vf which takes into account peeling
> +		 values and non-single steps.  The main exit can use niters
> +		 since if you exit from the main exit you've done all vector
> +		 iterations.  For an early exit we don't know when we exit so we
> +		 must re-calculate this on the exit.  */
> +	      tree start_expr = gimple_phi_result (phi);
> +	      off = fold_build2 (MINUS_EXPR, stype,
> +				 fold_convert (stype, start_expr),
> +				 fold_convert (stype, init_expr));
> +	      /* Now adjust for VF to get the final iteration value.  */
> +	      off = fold_build2 (MULT_EXPR, stype, off,
> +				 build_int_cst (stype, vf));
> +	    }
> +	  else
> +	    off = fold_build2 (MULT_EXPR, stype,
> +			       fold_convert (stype, niters), step_expr);
> +
>  	  if (POINTER_TYPE_P (type))
>  	    ni = fold_build_pointer_plus (init_expr, off);
>  	  else
> @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        /* Don't bother call vect_peel_nonlinear_iv_init.  */
>        else if (induction_type == vect_step_op_neg)
>  	ni = init_expr;
> +      else if (restart_loop)
> +	continue;

This looks all a bit complicated - why wouldn't we simply always
use the PHI result when 'restart_loop'?  Isn't that the correct old start
value in all cases?

>        else
>  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
>  					  niters, step_expr,
> @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>  
>        var = create_tmp_var (type, "tmp");
>  
> -      last_gsi = gsi_last_bb (exit_bb);
>        gimple_seq new_stmts = NULL;
>        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> +
> +      /* For non-main exit create an intermediate edge to get any updated iv
> +	 calculations.  */
> +      if (needs_interm_block
> +	  && !iv_block
> +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> +	{
> +	  iv_block = split_edge (update_e);
> +	  update_e = single_succ_edge (update_e->dest);
> +	  last_gsi = gsi_last_bb (iv_block);
> +	}
> +
>        /* Exit_bb shouldn't be empty.  */
>        if (!gsi_end_p (last_gsi))
>  	{
> @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	 niters_vector_mult_vf steps.  */
>        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
>        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> -					update_e);
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	update_e = single_succ_edge (e->dest);
> +      bool inversed_iv
> +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> +					 LOOP_VINFO_LOOP (loop_vinfo));

You are computing this here and in vect_update_ivs_after_vectorizer?

> +
> +      /* Update the main exit first.  */
> +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> +					update_e, inversed_iv);
> +
> +      /* And then update the early exits.  */
> +      for (auto exit : get_loop_exit_edges (loop))
> +	{
> +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> +	    continue;
> +
> +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> +					    niters_vector_mult_vf,
> +					    exit, true);

... why does the same not work here?  Wouldn't the proper condition
be !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT 
(loop_vinfo)->src) or similar?  That is, whether the exit is at or
after the main IV exit?  (consider having two)

> +	}
>  
>        if (skip_epilog)
>  	{
> 


* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-15 13:01     ` Richard Biener
@ 2023-11-15 13:09       ` Tamar Christina
  2023-11-15 13:22         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15 13:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, November 15, 2023 1:01 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to latest trunk:
> >
> > Hi All,
> >
> > This changes the PHI node updates to support early breaks.
> > It has to support both the case where the loop's exit matches the
> > normal loop exit and one where the early exit is "inverted", i.e. it's an early
> exit edge.
> >
> > In the latter case we must always restart the loop for VF iterations.
> > For an early exit the reason is obvious, but there are cases where the
> > "normal" exit is located before the early one.  This exit then does a
> > check on ivtmp resulting in us leaving the loop since it thinks we're done.
> >
> > In these case we may still have side-effects to perform so we also go
> > to the scalar loop.
> >
> > For the "normal" exit niters has already been adjusted for peeling,
> > for the early exits we must find out how many iterations we actually
> > did.  So we have to recalculate the new position for each exit.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> unused.
> > 	(vect_update_ivs_after_vectorizer): Support early break.
> > 	(vect_do_peeling): Use it.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> d2654cf1
> > c842baac58f5 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> >     loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > +exit_edge,
> >  				class loop *loop, tree niters, tree step,
> >  				tree final_iv, bool niters_maybe_zero,
> >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> 1412,7 +1412,7 @@
> > vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info
> loop_vinfo
> >     When this happens we need to flip the understanding of main and other
> >     exits by peeling and IV updates.  */
> >
> > -bool inline
> > +bool
> >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> >    return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> >       Input:
> >       - LOOP - a loop that is going to be vectorized. The last few iterations
> >                of LOOP were peeled.
> > +     - VF   - The chosen vectorization factor for LOOP.
> >       - NITERS - the number of iterations that LOOP executes (before it is
> >                  vectorized). i.e, the number of times the ivs should be bumped.
> >       - UPDATE_E - a successor edge of LOOP->exit that is on the
> > (only) path
> 
> the comment on this is now a bit misleading, can you try to update it and/or
> move the comment bits to the docs on EARLY_EXIT?
> 
> > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> loop_vinfo)
> >                    The phi args associated with the edge UPDATE_E in the bb
> >                    UPDATE_E->dest are updated accordingly.
> >
> > +     - restart_loop - Indicates whether the scalar loop needs to
> > + restart the
> 
> params are ALL_CAPS
> 
> > +		      iteration count where the vector loop began.
> > +
> >       Assumption 1: Like the rest of the vectorizer, this function assumes
> >       a single loop exit that has a single predecessor.
> >
> > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> loop_vinfo)
> >   */
> >
> >  static void
> > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > -				  tree niters, edge update_e)
> > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > +poly_uint64 vf,
> 
> LOOP_VINFO_VECT_FACTOR?
> 
> > +				  tree niters, edge update_e, bool
> restart_loop)
> 
> I think 'bool early_exit' is better here?  I wonder: if we have an "early"
> exit after the main exit, we are probably sure there are no side-effects to
> re-execute and could avoid this restarting?

Side effects yes, but the actual check may not have been performed yet.
If you remember https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f,
there in the clz loop, even when leaving through the "main" exit, you still have
to check whether that iteration contained the entry.  This is because the loop
counter is incremented before you iterate.
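The restart scheme can be illustrated with a much-simplified scalar model (the
VF = 4 blocking and all names here are hypothetical illustrations, not
vectorizer code): on the early exit the scalar loop restarts at the iteration
count where the vector loop began, so the side effects of the preceding
non-matching lanes are still performed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of an early-break "vector" loop with VF = 4.
// If any lane of a block matches, take the early exit and let the scalar
// loop restart at the first element of that block, so the side effects
// (here: computing the exact index) of the non-matching lanes are still
// performed.
constexpr std::size_t VF = 4;

std::size_t find_first (const std::vector<int> &a, int key)
{
  std::size_t n = a.size (), i = 0;
  while (i + VF <= n)
    {
      bool any = false;
      for (std::size_t l = 0; l < VF; l++)
	any |= (a[i + l] == key);
      if (any)
	break;		// early exit: restart this block in scalar code
      i += VF;		// latch: only reached when no lane matched
    }
  // Scalar loop: re-runs the matched block and handles the tail.
  for (; i < n; i++)
    if (a[i] == key)
      return i;
  return n;
}
```

The model leaves out the inverted-IV subtlety discussed above: when the
counter is bumped before the exit test, even the counter exit may require the
same restart.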

> 
> >  {
> >    gphi_iterator gsi, gsi1;
> >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >    basic_block update_bb = update_e->dest;
> > -
> > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > -
> > -  /* Make sure there exists a single-predecessor exit bb:  */
> > -  gcc_assert (single_pred_p (exit_bb));
> > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > +  bool inversed_iv
> > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +			    && flow_bb_inside_loop_p (loop, update_e->src);
> > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +  gcond *cond = get_loop_exit_condition (loop_e);
> > +  basic_block exit_bb = loop_e->dest;
> > +  basic_block iv_block = NULL;
> > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> >
> >    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7 +2198,6 @@
> > vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> >        tree step_expr, off;
> >        tree type;
> >        tree var, ni, ni_name;
> > -      gimple_stmt_iterator last_gsi;
> >
> >        gphi *phi = gsi.phi ();
> >        gphi *phi1 = gsi1.phi ();
> > @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer
> (loop_vec_info loop_vinfo,
> >        enum vect_induction_op_type induction_type
> >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> >
> > -      if (induction_type == vect_step_op_add)
> > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > +	 property during create_iv to identify it.  */
> > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > +      if (restart_loop && ivtemp)
> >  	{
> > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > +	  ni = build_int_cst (type, vf);
> > +	  if (inversed_iv)
> > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > +			      fold_convert (type, step_expr));
> > +	}
> > +      else if (induction_type == vect_step_op_add)
> > +	{
> > +
> >  	  tree stype = TREE_TYPE (step_expr);
> > -	  off = fold_build2 (MULT_EXPR, stype,
> > -			     fold_convert (stype, niters), step_expr);
> > +
> > +	  /* Early exits always use last iter value not niters. */
> > +	  if (restart_loop)
> > +	    {
> > +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> > +		 normally didn't have this problem with a single exit as live
> > +		 values would be in the exit block.  However when dealing with
> > +		 multiple exits all exits are redirected to the merge block
> > +		 and we restart the iteration.  */
> 
> Hmm, I fail to see how this works - we're either using the value to continue the
> induction or not, independent of STMT_VINFO_LIVE_P.

That becomes clear in the patch to update live reductions.  Essentially any live
reduction inside an alternative exit will reduce to the first element
rather than the last and use that as the seed for the scalar loop.

It has to do this since the scalar loop still has to perform the side effects
for the non-matching elements.

Regards,
Tamar

> 
> > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > +		continue;
> > +
> > +	      /* For early break the final loop IV is:
> > +		 init + (final - init) * vf which takes into account peeling
> > +		 values and non-single steps.  The main exit can use niters
> > +		 since if you exit from the main exit you've done all vector
> > +		 iterations.  For an early exit we don't know when we exit so
> we
> > +		 must re-calculate this on the exit.  */
> > +	      tree start_expr = gimple_phi_result (phi);
> > +	      off = fold_build2 (MINUS_EXPR, stype,
> > +				 fold_convert (stype, start_expr),
> > +				 fold_convert (stype, init_expr));
> > +	      /* Now adjust for VF to get the final iteration value.  */
> > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > +				 build_int_cst (stype, vf));
> > +	    }
> > +	  else
> > +	    off = fold_build2 (MULT_EXPR, stype,
> > +			       fold_convert (stype, niters), step_expr);
> > +
> >  	  if (POINTER_TYPE_P (type))
> >  	    ni = fold_build_pointer_plus (init_expr, off);
> >  	  else
> > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
> >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> >        else if (induction_type == vect_step_op_neg)
> >  	ni = init_expr;
> > +      else if (restart_loop)
> > +	continue;
> 
> This looks all a bit complicated - why wouldn't we simply always use the PHI
> result when 'restart_loop'?  Isn't that the correct old start value in all cases?
> 
> >        else
> >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> >  					  niters, step_expr,
> > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> (loop_vec_info
> > loop_vinfo,
> >
> >        var = create_tmp_var (type, "tmp");
> >
> > -      last_gsi = gsi_last_bb (exit_bb);
> >        gimple_seq new_stmts = NULL;
> >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > +
> > +      /* For non-main exit create an intermediat edge to get any updated iv
> > +	 calculations.  */
> > +      if (needs_interm_block
> > +	  && !iv_block
> > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> (new_stmts)))
> > +	{
> > +	  iv_block = split_edge (update_e);
> > +	  update_e = single_succ_edge (update_e->dest);
> > +	  last_gsi = gsi_last_bb (iv_block);
> > +	}
> > +
> >        /* Exit_bb shouldn't be empty.  */
> >        if (!gsi_end_p (last_gsi))
> >  	{
> > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  	 niters_vector_mult_vf steps.  */
> >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > -					update_e);
> > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	update_e = single_succ_edge (e->dest);
> > +      bool inversed_iv
> > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > +					 LOOP_VINFO_LOOP (loop_vinfo));
> 
> You are computing this here and in vect_update_ivs_after_vectorizer?
> 
> > +
> > +      /* Update the main exit first.  */
> > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > +					update_e, inversed_iv);
> > +
> > +      /* And then update the early exits.  */
> > +      for (auto exit : get_loop_exit_edges (loop))
> > +	{
> > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > +	    continue;
> > +
> > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > +					    niters_vector_mult_vf,
> > +					    exit, true);
> 
> ... why does the same not work here?  Wouldn't the proper condition be
> !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT
> (loop_vinfo)->src) or similar?  That is, whether the exit is at or after the main IV
> exit?  (consider having two)
> 
> > +	}
> >
> >        if (skip_epilog)
> >  	{
> >

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-15 13:09       ` Tamar Christina
@ 2023-11-15 13:22         ` Richard Biener
  2023-11-15 14:14           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-15 13:22 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, November 15, 2023 1:01 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > Patch updated to latest trunk:
> > >
> > > Hi All,
> > >
> > > This changes the PHI node updates to support early breaks.
> > > It has to support both the case where the loop's exit matches the
> > > normal loop exit and one where the early exit is "inverted", i.e. it's an early
> > exit edge.
> > >
> > > In the latter case we must always restart the loop for VF iterations.
> > > For an early exit the reason is obvious, but there are cases where the
> > > "normal" exit is located before the early one.  This exit then does a
> > > check on ivtmp resulting in us leaving the loop since it thinks we're done.
> > >
> > > In these case we may still have side-effects to perform so we also go
> > > to the scalar loop.
> > >
> > > For the "normal" exit niters has already been adjusted for peeling,
> > > for the early exits we must find out how many iterations we actually
> > > did.  So we have to recalculate the new position for each exit.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > unused.
> > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > 	(vect_do_peeling): Use it.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > d2654cf1
> > > c842baac58f5 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > >     loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > > +exit_edge,
> > >  				class loop *loop, tree niters, tree step,
> > >  				tree final_iv, bool niters_maybe_zero,
> > >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> > 1412,7 +1412,7 @@
> > > vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info
> > loop_vinfo
> > >     When this happens we need to flip the understanding of main and other
> > >     exits by peeling and IV updates.  */
> > >
> > > -bool inline
> > > +bool
> > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > >    return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > >       Input:
> > >       - LOOP - a loop that is going to be vectorized. The last few iterations
> > >                of LOOP were peeled.
> > > +     - VF   - The chosen vectorization factor for LOOP.
> > >       - NITERS - the number of iterations that LOOP executes (before it is
> > >                  vectorized). i.e, the number of times the ivs should be bumped.
> > >       - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > (only) path
> > 
> > the comment on this is now a bit misleading, can you try to update it and/or
> > move the comment bits to the docs on EARLY_EXIT?
> > 
> > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >                    The phi args associated with the edge UPDATE_E in the bb
> > >                    UPDATE_E->dest are updated accordingly.
> > >
> > > +     - restart_loop - Indicates whether the scalar loop needs to
> > > + restart the
> > 
> > params are ALL_CAPS
> > 
> > > +		      iteration count where the vector loop began.
> > > +
> > >       Assumption 1: Like the rest of the vectorizer, this function assumes
> > >       a single loop exit that has a single predecessor.
> > >
> > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >   */
> > >
> > >  static void
> > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > -				  tree niters, edge update_e)
> > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > +poly_uint64 vf,
> > 
> > LOOP_VINFO_VECT_FACTOR?
> > 
> > > +				  tree niters, edge update_e, bool
> > restart_loop)
> > 
> > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > exit after the main exit we are probably sure there are no side-effects to re-
> > execute and could avoid this restarting?
> 
> Side effects yes, but the actual check may not have been performed yet.
> If you remember https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> There in the clz loop through the "main" exit you still have to see if that iteration
> did not contain the entry.  This is because the loop counter is incremented
> before you iterate.
> 
> > 
> > >  {
> > >    gphi_iterator gsi, gsi1;
> > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > >    basic_block update_bb = update_e->dest;
> > > -
> > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > -
> > > -  /* Make sure there exists a single-predecessor exit bb:  */
> > > -  gcc_assert (single_pred_p (exit_bb));
> > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > +  bool inversed_iv
> > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +			    && flow_bb_inside_loop_p (loop, update_e->src);
> > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > +  basic_block exit_bb = loop_e->dest;
> > > +  basic_block iv_block = NULL;
> > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > >
> > >    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7 +2198,6 @@
> > > vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > >        tree step_expr, off;
> > >        tree type;
> > >        tree var, ni, ni_name;
> > > -      gimple_stmt_iterator last_gsi;
> > >
> > >        gphi *phi = gsi.phi ();
> > >        gphi *phi1 = gsi1.phi ();
> > > @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info loop_vinfo,
> > >        enum vect_induction_op_type induction_type
> > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > >
> > > -      if (induction_type == vect_step_op_add)
> > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > +	 property during create_iv to identify it.  */
> > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > +      if (restart_loop && ivtemp)
> > >  	{
> > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > +	  ni = build_int_cst (type, vf);
> > > +	  if (inversed_iv)
> > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > +			      fold_convert (type, step_expr));
> > > +	}
> > > +      else if (induction_type == vect_step_op_add)
> > > +	{
> > > +
> > >  	  tree stype = TREE_TYPE (step_expr);
> > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > -			     fold_convert (stype, niters), step_expr);
> > > +
> > > +	  /* Early exits always use last iter value not niters. */
> > > +	  if (restart_loop)
> > > +	    {
> > > +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> > > +		 normally didn't have this problem with a single exit as live
> > > +		 values would be in the exit block.  However when dealing with
> > > +		 multiple exits all exits are redirected to the merge block
> > > +		 and we restart the iteration.  */
> > 
> > Hmm, I fail to see how this works - we're either using the value to continue the
> > induction or not, independent of STMT_VINFO_LIVE_P.
> 
> That becomes clear in the patch to update live reductions.  Essentially any live
> Reductions inside an alternative exit will reduce to the first element
> rather than the last and use that as the seed for the scalar loop.

Hum.  Reductions are vectorized as N separate reductions.  I don't think
you can simply change the reduction between the lanes to "skip"
part of the vector iteration.  But you can use the value of the vector
from before the vector iteration - the loop header PHI result, and
fully reduce that to get at the proper value.
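The suggested scheme, reducing the loop-header PHI value rather than the
partially updated vector, can be sketched with a hypothetical scalar model
(per-lane partial sums stand in for the vectorized reduction; this is not the
vectorizer's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;

// Sum a[0..] until the sentinel, keeping VF partial sums the way a
// vectorized reduction would.  On the early break, fully reduce the
// partials as they were BEFORE the breaking vector iteration (the
// loop-header PHI value) and let the scalar loop re-run that iteration
// from its start.
long sum_until (const std::vector<int> &a, int sentinel)
{
  std::size_t n = a.size (), i = 0;
  long partial[VF] = {0, 0, 0, 0};
  while (i + VF <= n)
    {
      long phi[VF];			// value on entry to this iteration
      for (std::size_t l = 0; l < VF; l++)
	phi[l] = partial[l];
      bool any = false;
      for (std::size_t l = 0; l < VF; l++)
	{
	  any |= (a[i + l] == sentinel);
	  partial[l] += a[i + l];
	}
      if (any)
	{
	  // Early exit: seed the scalar loop with the pre-iteration value.
	  long seed = phi[0] + phi[1] + phi[2] + phi[3];
	  for (; i < n && a[i] != sentinel; i++)
	    seed += a[i];
	  return seed;
	}
      i += VF;
    }
  long seed = partial[0] + partial[1] + partial[2] + partial[3];
  for (; i < n && a[i] != sentinel; i++)
    seed += a[i];
  return seed;
}
```

The scalar re-run of the breaking iteration also performs the side effects for
its non-matching lanes, which is why the partially updated vector value cannot
be used directly.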

> It has to do this since you have to perform the side effects for the non-matching
> elements still.
> 
> Regards,
> Tamar
> 
> > 
> > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > +		continue;
> > > +
> > > +	      /* For early break the final loop IV is:
> > > +		 init + (final - init) * vf which takes into account peeling
> > > +		 values and non-single steps.  The main exit can use niters
> > > +		 since if you exit from the main exit you've done all vector
> > > +		 iterations.  For an early exit we don't know when we exit so
> > we
> > > +		 must re-calculate this on the exit.  */
> > > +	      tree start_expr = gimple_phi_result (phi);
> > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > +				 fold_convert (stype, start_expr),
> > > +				 fold_convert (stype, init_expr));
> > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > +				 build_int_cst (stype, vf));
> > > +	    }
> > > +	  else
> > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > +			       fold_convert (stype, niters), step_expr);
> > > +
> > >  	  if (POINTER_TYPE_P (type))
> > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > >  	  else
> > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> > loop_vinfo,
> > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > >        else if (induction_type == vect_step_op_neg)
> > >  	ni = init_expr;
> > > +      else if (restart_loop)
> > > +	continue;
> > 
> > This looks all a bit complicated - why wouldn't we simply always use the PHI
> > result when 'restart_loop'?  Isn't that the correct old start value in all cases?
> > 
> > >        else
> > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > >  					  niters, step_expr,
> > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> > (loop_vec_info
> > > loop_vinfo,
> > >
> > >        var = create_tmp_var (type, "tmp");
> > >
> > > -      last_gsi = gsi_last_bb (exit_bb);
> > >        gimple_seq new_stmts = NULL;
> > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > +
> > > +      /* For non-main exit create an intermediat edge to get any updated iv
> > > +	 calculations.  */
> > > +      if (needs_interm_block
> > > +	  && !iv_block
> > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > (new_stmts)))
> > > +	{
> > > +	  iv_block = split_edge (update_e);
> > > +	  update_e = single_succ_edge (update_e->dest);
> > > +	  last_gsi = gsi_last_bb (iv_block);
> > > +	}
> > > +
> > >        /* Exit_bb shouldn't be empty.  */
> > >        if (!gsi_end_p (last_gsi))
> > >  	{
> > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  	 niters_vector_mult_vf steps.  */
> > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > -					update_e);
> > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	update_e = single_succ_edge (e->dest);
> > > +      bool inversed_iv
> > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > 
> > You are computing this here and in vect_update_ivs_after_vectorizer?
> > 
> > > +
> > > +      /* Update the main exit first.  */
> > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > +					update_e, inversed_iv);
> > > +
> > > +      /* And then update the early exits.  */
> > > +      for (auto exit : get_loop_exit_edges (loop))
> > > +	{
> > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > +	    continue;
> > > +
> > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > +					    niters_vector_mult_vf,
> > > +					    exit, true);
> > 
> > ... why does the same not work here?  Wouldn't the proper condition be
> > !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT
> > (loop_vinfo)->src) or similar?  That is, whether the exit is at or after the main IV
> > exit?  (consider having two)
> > 
> > > +	}
> > >
> > >        if (skip_epilog)
> > >  	{
> > >
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-15  0:05   ` Tamar Christina
@ 2023-11-15 13:41     ` Richard Biener
  2023-11-15 14:26       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-15 13:41 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to trunk.
> 
> This adds support to vectorizable_live_reduction to handle multiple exits by

vectorizable_live_operation, but I do wonder how you handle reductions?

> doing a search for which exit the live value should be materialized in.
> 
> Additionally, which value in the index we're after depends on whether the exit
> it's materialized in is an early exit or whether the loop's main exit is
> different from the loop's natural one (i.e. the one with the same src block as
> the latch).
> 
> In those two cases we want the first rather than the last value as we're going
> to restart the iteration in the scalar loop.  For VLA this means we need to
> reverse both the mask and vector since there's only a way to get the last
> active element and not the first.
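The reversal trick can be sketched outside the vectorizer like this
(EXTRACT_LAST is modeled as a plain scan and all names are hypothetical; the
real primitives operate on vector registers and predicate masks):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of the EXTRACT_LAST primitive: value of the highest-numbered
// active (mask-set) lane.
int extract_last (const std::vector<int> &v, const std::vector<bool> &mask)
{
  for (std::size_t i = v.size (); i-- > 0;)
    if (mask[i])
      return v[i];
  return -1;	// no active lane; not expected in this model
}

// EXTRACT_FIRST emulated by reversing both the vector and the mask and
// then applying EXTRACT_LAST, mirroring the VEC_PERM_EXPR pair in the
// patch.
int extract_first (const std::vector<int> &v, const std::vector<bool> &mask)
{
  std::vector<int> rv (v.rbegin (), v.rend ());
  std::vector<bool> rm (mask.rbegin (), mask.rend ());
  return extract_last (rv, rm);
}
```

Reversing both operands preserves the pairing of mask bits and lanes, so the
last active lane of the reversed pair is exactly the first active lane of the
original.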
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296ced7b0d4d3e76a3634f 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +      /* A value can only be live in one exit.  So figure out which one.  */

Well, a value can be live across multiple exits!

> +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      /* Check if we have a loop where the chosen exit is not the main exit,
> +	 in these cases for an early break we restart the iteration the vector code
> +	 did.  For the live values we want the value at the start of the iteration
> +	 rather than at the end.  */
> +      bool restart_loop = false;
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +	    if (!is_gimple_debug (use_stmt)
> +		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	      {

In fact when you get here you know the use is in a LC PHI.  Use
FOR_EACH_IMM_USE_FAST and you can get at the edge
via phi_arg_index_from_use and gimple_phi_arg_edge.

As said you have to process all exits the value is live on, not only
the first.

> +		basic_block use_bb = gimple_bb (use_stmt);
> +		for (auto edge : get_loop_exit_edges (loop))
> +		  {
> +		    /* Alternative exits can have an intermediate BB in
> +		       between to update the IV.  In those cases we need to
> +		       look one block further.  */
> +		    if (use_bb == edge->dest
> +			|| (single_succ_p (edge->dest)
> +			    && use_bb == single_succ (edge->dest)))
> +		      {
> +			exit_e = edge;
> +			goto found;
> +		      }
> +		  }
> +	      }
> +found:
> +	  /* If the edge isn't a single pred then split the edge so we have a
> +	     location to place the live operations.  Perhaps we should always
> +	     split during IV updating.  But this way the CFG is cleaner to
> +	     follow.  */
> +	  restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> +	  if (!single_pred_p (exit_e->dest))
> +	    exit_e = single_pred_edge (split_edge (exit_e));
> +
> +	  /* For early exit where the exit is not in the BB that leads to the
> +	     latch then we're restarting the iteration in the scalar loop. So
> +	     get the first live value.  */
> +	  if (restart_loop)
> +	    {
> +	      vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +	      vec_lhs = gimple_get_lhs (vec_stmt);
> +	      bitstart = build_zero_cst (TREE_TYPE (bitstart));

No, this doesn't work for SLP.  Note this also gets you the "first" live
value _after_ the vector iteration.  Btw, I fail to see why you need
to handle STMT_VINFO_LIVE at all for the early exits - these are
scalar values live _after_ all iterations of the loop, and thus are
provided by the scalar epilog that always runs when we exit the vector
loop early.

The story is different for reductions though (unless we fail to support
early breaks for those at the moment).

Richard.


> +	    }
> +	}
> +
> +      basic_block exit_bb = exit_e->dest;
>        gcc_assert (single_pred_p (exit_bb));
>  
>        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
>        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> +      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
>  
>        gimple_seq stmts = NULL;
>        tree new_tree;
> @@ -10663,6 +10711,12 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
>  					  len, bias_minus_one);
>  
> +	  /* This needs to implement extraction of the first index, but not sure
> +	     how the LEN stuff works.  At the moment we shouldn't get here since
> +	     there's no LEN support for early breaks.  But guard this so there's
> +	     no incorrect codegen.  */
> +	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> +
>  	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
>  	  tree scalar_res
>  	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> @@ -10687,8 +10741,37 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  					  &LOOP_VINFO_MASKS (loop_vinfo),
>  					  1, vectype, 0);
>  	  gimple_seq_add_seq (&stmts, tem);
> -	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -					  mask, vec_lhs_phi);
> +	  tree scalar_res;
> +
> +	  /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +	     instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +	  if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	    {
> +	      auto gsi_stmt = gsi_last (stmts);
> +
> +	       /* First create the permuted mask.  */
> +	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +	      tree perm_dest = copy_ssa_name (mask);
> +	      gimple *perm_stmt
> +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +					   mask, perm_mask);
> +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +					   &gsi_stmt);
> +	      mask = perm_dest;
> +
> +	       /* Then permute the vector contents.  */
> +	      tree perm_elem = perm_mask_for_reverse (vectype);
> +	      perm_dest = copy_ssa_name (vec_lhs_phi);
> +	      perm_stmt
> +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +					   vec_lhs_phi, perm_elem);
> +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> +					   &gsi_stmt);
> +	      vec_lhs_phi = perm_dest;
> +	    }
> +
> +	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +				     mask, vec_lhs_phi);
>  
>  	  /* Convert the extracted vector element to the scalar type.  */
>  	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> @@ -10708,26 +10791,36 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (stmts)
>  	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
>  
> -      /* Remove existing phis that copy from lhs and create copies
> -	 from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> +      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> +      bool single_use = true;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
>  	{
> -	  gimple *phi = gsi_stmt (gsi);
> -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> +	  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	    continue;
> +
> +	  gcc_assert (single_use);
> +	  if (is_a <gphi *> (use_stmt)
> +	      && gimple_phi_arg_def (as_a <gphi *> (use_stmt), 0) == lhs)
>  	    {
> +	      /* Remove existing phis that copy from lhs and create copies
> +		 from new_tree.  */
> +	      gphi *phi = as_a <gphi *> (use_stmt);
> +	      auto gsi = gsi_for_phi (phi);
>  	      remove_phi_node (&gsi, false);
>  	      tree lhs_phi = gimple_phi_result (phi);
>  	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
>  	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
>  	    }
>  	  else
> -	    gsi_next (&gsi);
> +	    {
> +	      /* Or just update the use in place if not a phi.  */
> +	      use_operand_p use_p;
> +	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> +		SET_USE (use_p, new_tree);
> +	      update_stmt (use_stmt);
> +	    }
> +	  single_use = false;
>  	}
> -
> -      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> -      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> -	gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
>      }
>    else
>      {
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 3a22bf02f5ab16ded0af61cd1d719a98b8982144..7c3d6d196e122d67f750dfef6d615aabc6c28281 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1774,7 +1774,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>  
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index b9a71a0b5f5407417e8366b0df132df20c7f60aa..f261fc74b8795b4516b17155441d25baaf8c22ae 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2246,6 +2246,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>  				enum vect_def_type *,
>  				tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>  					    stmt_vec_info, tree, tree,
>  					    code_helper*, code_helper*,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-15 13:22         ` Richard Biener
@ 2023-11-15 14:14           ` Tamar Christina
  2023-11-16 10:40             ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15 14:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, November 15, 2023 1:23 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > Patch updated to latest trunk:
> > > >
> > > > Hi All,
> > > >
> > > > This changes the PHI node updates to support early breaks.
> > > > It has to support both the case where the loop's exit matches the
> > > > normal loop exit and one where the early exit is "inverted", i.e.
> > > > it's an early
> > > exit edge.
> > > >
> > > > In the latter case we must always restart the loop for VF iterations.
> > > > For an early exit the reason is obvious, but there are cases where
> > > > the "normal" exit is located before the early one.  This exit then
> > > > does a check on ivtmp resulting in us leaving the loop since it thinks we're
> done.
> > > >
> > > > In these case we may still have side-effects to perform so we also
> > > > go to the scalar loop.
> > > >
> > > > For the "normal" exit niters has already been adjusted for
> > > > peeling, for the early exits we must find out how many iterations
> > > > we actually did.  So we have to recalculate the new position for each exit.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > > unused.
> > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > 	(vect_do_peeling): Use it.
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > b/gcc/tree-vect-loop-manip.cc index
> > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > d2654cf1
> > > > c842baac58f5 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -1200,7 +1200,7 @@
> > > > vect_set_loop_condition_partial_vectors_avx512
> > > (class loop *loop,
> > > >     loop handles exactly VF scalars per iteration.  */
> > > >
> > > >  static gcond *
> > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > > exit_edge,
> > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > +edge exit_edge,
> > > >  				class loop *loop, tree niters, tree step,
> > > >  				tree final_iv, bool niters_maybe_zero,
> > > >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> > > 1412,7 +1412,7 @@
> > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > loop_vec_info
> > > loop_vinfo
> > > >     When this happens we need to flip the understanding of main and
> other
> > > >     exits by peeling and IV updates.  */
> > > >
> > > > -bool inline
> > > > +bool
> > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > >    return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > >       Input:
> > > >       - LOOP - a loop that is going to be vectorized. The last few iterations
> > > >                of LOOP were peeled.
> > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > >       - NITERS - the number of iterations that LOOP executes (before it is
> > > >                  vectorized). i.e, the number of times the ivs should be bumped.
> > > >       - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > > (only) path
> > >
> > > the comment on this is now a bit misleading, can you try to update
> > > it and/or move the comment bits to the docs on EARLY_EXIT?
> > >
> > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >                    The phi args associated with the edge UPDATE_E in the bb
> > > >                    UPDATE_E->dest are updated accordingly.
> > > >
> > > > +     - restart_loop - Indicates whether the scalar loop needs to
> > > > + restart the
> > >
> > > params are ALL_CAPS
> > >
> > > > +		      iteration count where the vector loop began.
> > > > +
> > > >       Assumption 1: Like the rest of the vectorizer, this function assumes
> > > >       a single loop exit that has a single predecessor.
> > > >
> > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >   */
> > > >
> > > >  static void
> > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > -				  tree niters, edge update_e)
> > > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > +poly_uint64 vf,
> > >
> > > LOOP_VINFO_VECT_FACTOR?
> > >
> > > > +				  tree niters, edge update_e, bool
> > > restart_loop)
> > >
> > > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > > exit after the main exit we are probably sure there are no
> > > side-effects to re- execute and could avoid this restarting?
> >
> > Side effects yes, but the actual check may not have been performed yet.
> > If you remember
> > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > There in the clz loop through the "main" exit you still have to see if
> > that iteration did not contain the entry.  This is because the loop
> > counter is incremented before you iterate.
> >
> > >
> > > >  {
> > > >    gphi_iterator gsi, gsi1;
> > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > >    basic_block update_bb = update_e->dest;
> > > > -
> > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > -
> > > > -  /* Make sure there exists a single-predecessor exit bb:  */
> > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > +  bool inversed_iv
> > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +			    && flow_bb_inside_loop_p (loop, update_e->src);
> > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > > +  basic_block exit_bb = loop_e->dest;
> > > > +  basic_block iv_block = NULL;
> > > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > >
> > > >    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> (update_bb);
> > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7 +2198,6
> > > > @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > >        tree step_expr, off;
> > > >        tree type;
> > > >        tree var, ni, ni_name;
> > > > -      gimple_stmt_iterator last_gsi;
> > > >
> > > >        gphi *phi = gsi.phi ();
> > > >        gphi *phi1 = gsi1.phi ();
> > > > @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info loop_vinfo,
> > > >        enum vect_induction_op_type induction_type
> > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > >
> > > > -      if (induction_type == vect_step_op_add)
> > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge
> (loop));
> > > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > > +	 property during create_iv to identify it.  */
> > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > +      if (restart_loop && ivtemp)
> > > >  	{
> > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > +	  ni = build_int_cst (type, vf);
> > > > +	  if (inversed_iv)
> > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > +			      fold_convert (type, step_expr));
> > > > +	}
> > > > +      else if (induction_type == vect_step_op_add)
> > > > +	{
> > > > +
> > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > -			     fold_convert (stype, niters), step_expr);
> > > > +
> > > > +	  /* Early exits always use last iter value not niters. */
> > > > +	  if (restart_loop)
> > > > +	    {
> > > > +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> > > > +		 normally didn't have this problem with a single exit as live
> > > > +		 values would be in the exit block.  However when dealing with
> > > > +		 multiple exits all exits are redirected to the merge block
> > > > +		 and we restart the iteration.  */
> > >
> > > Hmm, I fail to see how this works - we're either using the value to
> > > continue the induction or not, independent of STMT_VINFO_LIVE_P.
> >
> > That becomes clear in the patch to update live reductions.
> > Essentially any live Reductions inside an alternative exit will reduce
> > to the first element rather than the last and use that as the seed for the
> scalar loop.
> 
> Hum.  Reductions are vectorized as N separate reductions.  I don't think you
> can simply change the reduction between the lanes to "skip"
> part of the vector iteration.  But you can use the value of the vector from
> before the vector iteration - the loop header PHI result, and fully reduce that
> to get at the proper value.

That's what it's supposed to be doing though.  The reason live operations
are skipped here is that if we don't, we'll re-adjust the IV even though the value
will already be correct after vectorization.

Remember that this code only gets this far for IV PHI nodes.

The loop header PHI result itself can be live; see testcases
vect-early-break_70.c to vect-early-break_75.c.

you have i_15 = PHI <i_14 (6), 1(2)>

we use i_15 in the early exit. This should not be adjusted because when it's
vectorized the value at 0[lane 0] is already correct.  This is why for any PHI
inside the early exits it uses the value 0[0] instead of N[lane_max].

Perhaps I'm missing something here?

Regards,
Tamar
> 
> > It has to do this since you have to perform the side effects for the
> > non-matching elements still.
> >
> > Regards,
> > Tamar
> >
> > >
> > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > +		continue;
> > > > +
> > > > +	      /* For early break the final loop IV is:
> > > > +		 init + (final - init) * vf which takes into account peeling
> > > > +		 values and non-single steps.  The main exit can use niters
> > > > +		 since if you exit from the main exit you've done all vector
> > > > +		 iterations.  For an early exit we don't know when we exit so
> > > we
> > > > +		 must re-calculate this on the exit.  */
> > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > +				 fold_convert (stype, start_expr),
> > > > +				 fold_convert (stype, init_expr));
> > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > +				 build_int_cst (stype, vf));
> > > > +	    }
> > > > +	  else
> > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > +			       fold_convert (stype, niters), step_expr);
> > > > +
> > > >  	  if (POINTER_TYPE_P (type))
> > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > >  	  else
> > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > (loop_vec_info
> > > loop_vinfo,
> > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > >        else if (induction_type == vect_step_op_neg)
> > > >  	ni = init_expr;
> > > > +      else if (restart_loop)
> > > > +	continue;
> > >
> > > This looks all a bit complicated - why wouldn't we simply always use
> > > the PHI result when 'restart_loop'?  Isn't that the correct old start value in
> all cases?
> > >
> > > >        else
> > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > >  					  niters, step_expr,
> > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> > > (loop_vec_info
> > > > loop_vinfo,
> > > >
> > > >        var = create_tmp_var (type, "tmp");
> > > >
> > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > >        gimple_seq new_stmts = NULL;
> > > >        ni_name = force_gimple_operand (ni, &new_stmts, false,
> > > > var);
> > > > +
> > > > +      /* For non-main exit create an intermediat edge to get any updated iv
> > > > +	 calculations.  */
> > > > +      if (needs_interm_block
> > > > +	  && !iv_block
> > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > (new_stmts)))
> > > > +	{
> > > > +	  iv_block = split_edge (update_e);
> > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > +	}
> > > > +
> > > >        /* Exit_bb shouldn't be empty.  */
> > > >        if (!gsi_end_p (last_gsi))
> > > >  	{
> > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > > > tree
> > > niters, tree nitersm1,
> > > >  	 niters_vector_mult_vf steps.  */
> > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > -					update_e);
> > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +	update_e = single_succ_edge (e->dest);
> > > > +      bool inversed_iv
> > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > >
> > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > >
> > > > +
> > > > +      /* Update the main exit first.  */
> > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> niters_vector_mult_vf,
> > > > +					update_e, inversed_iv);
> > > > +
> > > > +      /* And then update the early exits.  */
> > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > +	{
> > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > +	    continue;
> > > > +
> > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > +					    niters_vector_mult_vf,
> > > > +					    exit, true);
> > >
> > > ... why does the same not work here?  Wouldn't the proper condition
> > > be !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT
> > > (loop_vinfo)->src) or similar?  That is, whether the exit is at or
> > > after the main IV exit?  (consider having two)
> > >
> > > > +	}
> > > >
> > > >        if (skip_epilog)
> > > >  	{
> > > >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-15 13:41     ` Richard Biener
@ 2023-11-15 14:26       ` Tamar Christina
  2023-11-16 11:16         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-15 14:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw



> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, November 15, 2023 1:42 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> with support for multiple exits and different exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to trunk.
> >
> > This adds support to vectorizable_live_reduction to handle multiple
> > exits by
> 
> vectorizable_live_operation, but I do wonder how you handle reductions?

In the testcases I have, reductions all seem to work fine, since reductions are
placed in the merge block between the two loops and always have the
"value so far from full loop iterations".  These will just be used as the seed for the
scalar loop for any partial iterations.

> 
> > doing a search for which exit the live value should be materialized in.
> >
> > Additinally which value in the index we're after depends on whether
> > the exit it's materialized in is an early exit or whether the loop's
> > main exit is different from the loop's natural one (i.e. the one with
> > the same src block as the latch).
> >
> > In those two cases we want the first rather than the last value as
> > we're going to restart the iteration in the scalar loop.  For VLA this
> > means we need to reverse both the mask and vector since there's only a
> > way to get the last active element and not the first.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296c
> ed7b0d
> > 4d3e76a3634f 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  	   lhs' = new_tree;  */
> >
> >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +      /* A value can only be live in one exit.  So figure out which
> > + one.  */
> 
> Well, a value can be live across multiple exits!

The same value can only be live across multiple early exits, no?  In which
case they'll all still be in the same block, as all the early exits end in the same
merge block.

So this code is essentially just figuring out if you're an early or normal exit.
Perhaps the comment is unclear.

> 
> > +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +      /* Check if we have a loop where the chosen exit is not the main exit,
> > +	 in these cases for an early break we restart the iteration the vector
> code
> > +	 did.  For the live values we want the value at the start of the iteration
> > +	 rather than at the end.  */
> > +      bool restart_loop = false;
> > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	{
> > +	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > +	    if (!is_gimple_debug (use_stmt)
> > +		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > +	      {
> 
> In fact when you get here you know the use is in a LC PHI.  Use
> FOR_EACH_IMM_USE_FAST and you can get at the edge via
> phi_arg_index_from_use and gimple_phi_arg_edge.
> 
> As said you have to process all exits the value is live on, not only the first.
> 
> > +		basic_block use_bb = gimple_bb (use_stmt);
> > +		for (auto edge : get_loop_exit_edges (loop))
> > +		  {
> > +		    /* Alternative exits can have an intermediate BB in
> > +		       between to update the IV.  In those cases we need to
> > +		       look one block further.  */
> > +		    if (use_bb == edge->dest
> > +			|| (single_succ_p (edge->dest)
> > +			    && use_bb == single_succ (edge->dest)))
> > +		      {
> > +			exit_e = edge;
> > +			goto found;
> > +		      }
> > +		  }
> > +	      }
> > +found:
> > +	  /* If the edge isn't a single pred then split the edge so we have a
> > +	     location to place the live operations.  Perhaps we should always
> > +	     split during IV updating.  But this way the CFG is cleaner to
> > +	     follow.  */
> > +	  restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> > +	  if (!single_pred_p (exit_e->dest))
> > +	    exit_e = single_pred_edge (split_edge (exit_e));
> > +
> > +	  /* For early exit where the exit is not in the BB that leads to the
> > +	     latch then we're restarting the iteration in the scalar loop. So
> > +	     get the first live value.  */
> > +	  if (restart_loop)
> > +	    {
> > +	      vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > +	      vec_lhs = gimple_get_lhs (vec_stmt);
> > +	      bitstart = build_zero_cst (TREE_TYPE (bitstart));
> 
> No, this doesn't work for SLP.  Note this also gets you the "first" live value
> _after_ the vector iteration.

Yes, we're after the first value for a full vector iteration.  In the initial iteration
the seed vector is always started from the initial value of the PHI node, no?

> Btw, I fail to see why you need to handle
> STMT_VINFO_LIVE at all for the early exits - this is scalar values live _after_ all
> iterations of the loop, thus it's provided by the scalar epilog that always runs
> when we exit the vector loop early.

In last year's patch I basically exited here with return true, and did not bother
vectorizing them at all, instead adjusting them during vect_update_ivs_after_vectorizer
just like we normally would.  But you didn't seem to like that approach.

If we take that approach again then the only thing needing to be changed here is
to ignore the live operations inside an early exit block.

The reason they appear is that if you have something like

if (foo)
  return i;

when we redirect the edge, `i` ends up in the block between the two loops, and i is also
the loop counter.

Would you prefer I use last year's approach instead? i.e. just ignore them and recalculate
any loop IVs as normal?

Thanks,
Tamar

> 
> The story is different for reductions though (unless we fail to support early
> breaks for those at the moment).
> 
> Richard.
> 
> 
> > +	    }
> > +	}
> > +
> > +      basic_block exit_bb = exit_e->dest;
> >        gcc_assert (single_pred_p (exit_bb));
> >
> >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx,
> vec_lhs);
> > +      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
> >
> >        gimple_seq stmts = NULL;
> >        tree new_tree;
> > @@ -10663,6 +10711,12 @@ vectorizable_live_operation (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >  	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> >  					  len, bias_minus_one);
> >
> > +	  /* This needs to implement extraction of the first index, but not sure
> > +	     how the LEN stuff works.  At the moment we shouldn't get here
> since
> > +	     there's no LEN support for early breaks.  But guard this so there's
> > +	     no incorrect codegen.  */
> > +	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> > +
> >  	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> >  	  tree scalar_res
> >  	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> @@
> > -10687,8 +10741,37 @@ vectorizable_live_operation (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >  					  &LOOP_VINFO_MASKS (loop_vinfo),
> >  					  1, vectype, 0);
> >  	  gimple_seq_add_seq (&stmts, tem);
> > -	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST,
> scalar_type,
> > -					  mask, vec_lhs_phi);
> > +	  tree scalar_res;
> > +
> > +	  /* For an inverted control flow with early breaks we want
> EXTRACT_FIRST
> > +	     instead of EXTRACT_LAST.  Emulate by reversing the vector and
> mask. */
> > +	  if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	    {
> > +	      auto gsi_stmt = gsi_last (stmts);
> > +
> > +	       /* First create the permuted mask.  */
> > +	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> > +	      tree perm_dest = copy_ssa_name (mask);
> > +	      gimple *perm_stmt
> > +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> > +					   mask, perm_mask);
> > +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> > +					   &gsi_stmt);
> > +	      mask = perm_dest;
> > +
> > +	       /* Then permute the vector contents.  */
> > +	      tree perm_elem = perm_mask_for_reverse (vectype);
> > +	      perm_dest = copy_ssa_name (vec_lhs_phi);
> > +	      perm_stmt
> > +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> vec_lhs_phi,
> > +					   vec_lhs_phi, perm_elem);
> > +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> > +					   &gsi_stmt);
> > +	      vec_lhs_phi = perm_dest;
> > +	    }
> > +
> > +	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > +				     mask, vec_lhs_phi);
> >
> >  	  /* Convert the extracted vector element to the scalar type.  */
> >  	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res); @@
> > -10708,26 +10791,36 @@ vectorizable_live_operation (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >        if (stmts)
> >  	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> >
> > -      /* Remove existing phis that copy from lhs and create copies
> > -	 from new_tree.  */
> > -      gimple_stmt_iterator gsi;
> > -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> > +      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> > +      bool single_use = true;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> >  	{
> > -	  gimple *phi = gsi_stmt (gsi);
> > -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> > +	  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > +	    continue;
> > +
> > +	  gcc_assert (single_use);
> > +	  if (is_a <gphi *> (use_stmt)
> > +	      && gimple_phi_arg_def (as_a <gphi *> (use_stmt), 0) == lhs)
> >  	    {
> > +	      /* Remove existing phis that copy from lhs and create copies
> > +		 from new_tree.  */
> > +	      gphi *phi = as_a <gphi *> (use_stmt);
> > +	      auto gsi = gsi_for_phi (phi);
> >  	      remove_phi_node (&gsi, false);
> >  	      tree lhs_phi = gimple_phi_result (phi);
> >  	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> >  	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> >  	    }
> >  	  else
> > -	    gsi_next (&gsi);
> > +	    {
> > +	      /* Or just update the use in place if not a phi.  */
> > +	      use_operand_p use_p;
> > +	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> > +		SET_USE (use_p, new_tree);
> > +	      update_stmt (use_stmt);
> > +	    }
> > +	  single_use = false;
> >  	}
> > -
> > -      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> > -      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > -	gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
> >      }
> >    else
> >      {
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> >
> 3a22bf02f5ab16ded0af61cd1d719a98b8982144..7c3d6d196e122d67f750
> dfef6d61
> > 5aabc6c28281 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -1774,7 +1774,7 @@ compare_step_with_zero (vec_info *vinfo,
> > stmt_vec_info stmt_info)
> >  /* If the target supports a permute mask that reverses the elements in
> >     a vector of type VECTYPE, return that mask, otherwise return null.
> > */
> >
> > -static tree
> > +tree
> >  perm_mask_for_reverse (tree vectype)
> >  {
> >    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); diff --git
> > a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> >
> b9a71a0b5f5407417e8366b0df132df20c7f60aa..f261fc74b8795b4516b17
> 155441d
> > 25baaf8c22ae 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2246,6 +2246,7 @@ extern bool vect_is_simple_use (vec_info *,
> stmt_vec_info, slp_tree,
> >  				enum vect_def_type *,
> >  				tree *, stmt_vec_info * = NULL);
> >  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> > +extern tree perm_mask_for_reverse (tree);
> >  extern bool supportable_widening_operation (vec_info*, code_helper,
> >  					    stmt_vec_info, tree, tree,
> >  					    code_helper*, code_helper*,
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-15 14:14           ` Tamar Christina
@ 2023-11-16 10:40             ` Richard Biener
  2023-11-16 11:08               ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 10:40 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, November 15, 2023 1:23 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > jlaw@ventanamicro.com
> > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > support early breaks and arbitrary exits
> > > >
> > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > Patch updated to latest trunk:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > This changes the PHI node updates to support early breaks.
> > > > > It has to support both the case where the loop's exit matches the
> > > > > normal loop exit and one where the early exit is "inverted", i.e.
> > > > > it's an early
> > > > exit edge.
> > > > >
> > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > For an early exit the reason is obvious, but there are cases where
> > > > > the "normal" exit is located before the early one.  This exit then
> > > > > does a check on ivtmp resulting in us leaving the loop since it thinks we're
> > done.
> > > > >
> > > > > In these case we may still have side-effects to perform so we also
> > > > > go to the scalar loop.
> > > > >
> > > > > For the "normal" exit niters has already been adjusted for
> > > > > peeling, for the early exits we must find out how many iterations
> > > > > we actually did.  So we have to recalculate the new position for each exit.
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > > > unused.
> > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > 	(vect_do_peeling): Use it.
> > > > >
> > > > > --- inline copy of patch ---
> > > > >
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > >
> > > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > d2654cf1
> > > > > c842baac58f5 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -1200,7 +1200,7 @@
> > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > (class loop *loop,
> > > > >     loop handles exactly VF scalars per iteration.  */
> > > > >
> > > > >  static gcond *
> > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > > > exit_edge,
> > > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > > +edge exit_edge,
> > > > >  				class loop *loop, tree niters, tree step,
> > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > >  				gimple_stmt_iterator loop_cond_gsi) @@ -
> > > > 1412,7 +1412,7 @@
> > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > loop_vec_info
> > > > loop_vinfo
> > > > >     When this happens we need to flip the understanding of main and
> > other
> > > > >     exits by peeling and IV updates.  */
> > > > >
> > > > > -bool inline
> > > > > +bool
> > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > >    return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > > >       Input:
> > > > >       - LOOP - a loop that is going to be vectorized. The last few iterations
> > > > >                of LOOP were peeled.
> > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > >       - NITERS - the number of iterations that LOOP executes (before it is
> > > > >                  vectorized). i.e, the number of times the ivs should be bumped.
> > > > >       - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > > > (only) path
> > > >
> > > > the comment on this is now a bit misleading; can you try to update
> > > > it and/or move the comment bits to the docs on EARLY_EXIT?
> > > >
> > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > loop_vinfo)
> > > > >                    The phi args associated with the edge UPDATE_E in the bb
> > > > >                    UPDATE_E->dest are updated accordingly.
> > > > >
> > > > > +     - restart_loop - Indicates whether the scalar loop needs to
> > > > > + restart the
> > > >
> > > > params are ALL_CAPS
> > > >
> > > > > +		      iteration count where the vector loop began.
> > > > > +
> > > > >       Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > >       a single loop exit that has a single predecessor.
> > > > >
> > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > loop_vinfo)
> > > > >   */
> > > > >
> > > > >  static void
> > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > -				  tree niters, edge update_e)
> > > > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > +poly_uint64 vf,
> > > >
> > > > LOOP_VINFO_VECT_FACTOR?
> > > >
> > > > > +				  tree niters, edge update_e, bool
> > > > restart_loop)
> > > >
> > > > I think 'bool early_exit' is better here?  I wonder if, when we have an "early"
> > > > exit after the main exit, we are probably sure there are no
> > > > side-effects to re-execute and could avoid this restarting?
> > >
> > > Side effects yes, but the actual check may not have been performed yet.
> > > If you remember
> > > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > There, in the clz loop, even when leaving through the "main" exit you
> > > still have to check whether that iteration contained the entry.  This is
> > > because the loop counter is incremented before you iterate.
> > >
> > > >
> > > > >  {
> > > > >    gphi_iterator gsi, gsi1;
> > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > >    basic_block update_bb = update_e->dest;
> > > > > -
> > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > -
> > > > > -  /* Make sure there exists a single-predecessor exit bb:  */
> > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > +  bool inversed_iv
> > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +			    && flow_bb_inside_loop_p (loop, update_e->src);
> > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > > > +  basic_block exit_bb = loop_e->dest;
> > > > > +  basic_block iv_block = NULL;
> > > > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > >
> > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis
> > (update_bb);
> > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7 +2198,6
> > > > > @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > >        tree step_expr, off;
> > > > >        tree type;
> > > > >        tree var, ni, ni_name;
> > > > > -      gimple_stmt_iterator last_gsi;
> > > > >
> > > > >        gphi *phi = gsi.phi ();
> > > > >        gphi *phi1 = gsi1.phi ();
> > > > > @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer
> > > > (loop_vec_info loop_vinfo,
> > > > >        enum vect_induction_op_type induction_type
> > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > >
> > > > > -      if (induction_type == vect_step_op_add)
> > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge
> > (loop));
> > > > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > > > +	 property during create_iv to identify it.  */
> > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > +      if (restart_loop && ivtemp)
> > > > >  	{
> > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > +	  ni = build_int_cst (type, vf);
> > > > > +	  if (inversed_iv)
> > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > +			      fold_convert (type, step_expr));
> > > > > +	}
> > > > > +      else if (induction_type == vect_step_op_add)
> > > > > +	{
> > > > > +
> > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > +
> > > > > +	  /* Early exits always use the last iter value, not niters.  */
> > > > > +	  if (restart_loop)
> > > > > +	    {
> > > > > +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> > > > > +		 normally didn't have this problem with a single exit as live
> > > > > +		 values would be in the exit block.  However when dealing with
> > > > > +		 multiple exits all exits are redirected to the merge block
> > > > > +		 and we restart the iteration.  */
> > > >
> > > > Hmm, I fail to see how this works - we're either using the value to
> > > > continue the induction or not, independent of STMT_VINFO_LIVE_P.
> > >
> > > That becomes clear in the patch to update live reductions.
> > > Essentially any live Reductions inside an alternative exit will reduce
> > > to the first element rather than the last and use that as the seed for the
> > scalar loop.
> > 
> > Hum.  Reductions are vectorized as N separate reductions.  I don't think you
> > can simply change the reduction between the lanes to "skip"
> > part of the vector iteration.  But you can use the value of the vector from
> > before the vector iteration - the loop header PHI result, and fully reduce that
> > to get at the proper value.
> 
> That's what it's supposed to be doing though.  The reason live operations
> are skipped here is that if we don't we'll re-adjust the IV even though the value
> will already be correct after vectorization.
> 
> Remember that this code only gets so far for IV PHI nodes.
> 
> The loop phi header result itself can be live, i.e. see testcases
> vect-early-break_70.c to vect-early-break_75.c
> 
> you have i_15 = PHI <i_14 (6), 1(2)>
> 
> we use i_15 in the early exit. This should not be adjusted because when it's
> vectorized the value at 0[lane 0] is already correct.  This is why for any PHI
> inside the early exits it uses the value 0[0] instead of N[lane_max].
> 
> Perhaps I'm missing something here?

OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer does.

I still do not understand the (complexity of the) patch.  Basically
the function computes the new value of the IV "from scratch" based
on the number of scalar iterations of the vector loop, the 'niter'
argument.  I would have expected that for the early exits we either
pass in a different 'niter' or alternatively a 'niter_adjustment'.
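
As a sketch of that model (illustrative only, not GCC internals; the
function names and the example VF/niter values below are made up), the
function conceptually recomputes a step-op-add IV's value after the loop
as init + niters * step, so an early exit could be handled by passing an
adjusted niter instead of special-casing individual IVs:

```c
/* Illustrative sketch, not GCC code: vect_update_ivs_after_vectorizer
   conceptually recomputes a vect_step_op_add IV's exit value "from
   scratch" from the number of scalar iterations executed.  */
long
iv_value_after_loop (long init, long step, long niters)
{
  return init + niters * step;
}

/* For the main exit, niters is the full scalar iteration count
   (niters_vector_mult_vf).  For an early exit, one could instead pass
   niters_vector_mult_vf - vf, so the scalar loop restarts at the first
   scalar iteration of the last vector iteration.  */
long
iv_value_at_early_exit (long init, long step,
			long niters_vector_mult_vf, long vf)
{
  return iv_value_after_loop (init, step, niters_vector_mult_vf - vf);
}
```

E.g. with init = 0, step = 1, 96 scalar iterations and VF = 4, the main
exit yields 96 while an early exit restarts the scalar loop at 92.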

It seems your change handles different kinds of inductions differently.
Specifically

      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
      if (restart_loop && ivtemp)
        {
          type = TREE_TYPE (gimple_phi_result (phi));
          ni = build_int_cst (type, vf);
          if (inversed_iv)
            ni = fold_build2 (MINUS_EXPR, type, ni,
                              fold_convert (type, step_expr));
        }

it looks like for the exit test IV we use either 'VF' or 'VF - step'
as the new value.  That seems to be very odd special casing for
unknown reasons.  And while you adjust vec_step_op_add, you don't
adjust vect_peel_nonlinear_iv_init (maybe not supported - better
assert here).

Also the vec_step_op_add case will keep the original scalar IV
live even when it is a vectorized induction.  The code
recomputing the value from scratch avoids this.

      /* For non-main exit create an intermediate edge to get any updated iv
         calculations.  */
      if (needs_interm_block
          && !iv_block
          && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
        {
          iv_block = split_edge (update_e);
          update_e = single_succ_edge (update_e->dest);
          last_gsi = gsi_last_bb (iv_block);
        }

this is also odd; can we adjust the API instead?  I suppose this
is because your computation uses the original loop IV; if you
based the computation off the initial value only, this might not
be necessary?

That said, I wonder why we cannot simply pass in an adjusted niter
which would be niters_vector_mult_vf - vf and be done with that?

Thanks,
Richard.


> Regards,
> Tamar
> > 
> > > It has to do this since you have to perform the side effects for the
> > > non-matching elements still.
> > >
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > +		continue;
> > > > > +
> > > > > +	      /* For early break the final loop IV is:
> > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > +		 since if you exit from the main exit you've done all vector
> > > > > +		 iterations.  For an early exit we don't know when we exit so
> > > > we
> > > > > +		 must re-calculate this on the exit.  */
> > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > +				 fold_convert (stype, start_expr),
> > > > > +				 fold_convert (stype, init_expr));
> > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > +				 build_int_cst (stype, vf));
> > > > > +	    }
> > > > > +	  else
> > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > +
> > > > >  	  if (POINTER_TYPE_P (type))
> > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > >  	  else
> > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > (loop_vec_info
> > > > loop_vinfo,
> > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > >        else if (induction_type == vect_step_op_neg)
> > > > >  	ni = init_expr;
> > > > > +      else if (restart_loop)
> > > > > +	continue;
> > > >
> > > > This looks all a bit complicated - why wouldn't we simply always use
> > > > the PHI result when 'restart_loop'?  Isn't that the correct old start value in
> > all cases?
> > > >
> > > > >        else
> > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > >  					  niters, step_expr,
> > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> > > > (loop_vec_info
> > > > > loop_vinfo,
> > > > >
> > > > >        var = create_tmp_var (type, "tmp");
> > > > >
> > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > >        gimple_seq new_stmts = NULL;
> > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false,
> > > > > var);
> > > > > +
> > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > +	 calculations.  */
> > > > > +      if (needs_interm_block
> > > > > +	  && !iv_block
> > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > (new_stmts)))
> > > > > +	{
> > > > > +	  iv_block = split_edge (update_e);
> > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > +	}
> > > > > +
> > > > >        /* Exit_bb shouldn't be empty.  */
> > > > >        if (!gsi_end_p (last_gsi))
> > > > >  	{
> > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo,
> > > > > tree
> > > > niters, tree nitersm1,
> > > > >  	 niters_vector_mult_vf steps.  */
> > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > -					update_e);
> > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +	update_e = single_succ_edge (e->dest);
> > > > > +      bool inversed_iv
> > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > >
> > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > >
> > > > > +
> > > > > +      /* Update the main exit first.  */
> > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > niters_vector_mult_vf,
> > > > > +					update_e, inversed_iv);
> > > > > +
> > > > > +      /* And then update the early exits.  */
> > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > +	{
> > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > +	    continue;
> > > > > +
> > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > +					    niters_vector_mult_vf,
> > > > > +					    exit, true);
> > > >
> > > > ... why does the same not work here?  Wouldn't the proper condition
> > > > be !dominated_by_p (CDI_DOMINATORS, exit->src, LOOP_VINFO_IV_EXIT
> > > > (loop_vinfo)->src) or similar?  That is, whether the exit is at or
> > > > after the main IV exit?  (consider having two)
> > > >
> > > > > +	}
> > > > >
> > > > >        if (skip_epilog)
> > > > >  	{
> > > > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization
  2023-11-07  9:52   ` Richard Biener
@ 2023-11-16 10:53     ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-16 10:53 UTC (permalink / raw)
  To: Richard Biener, Hongtao Liu; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Tue, Nov 7, 2023 at 10:53 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 6 Nov 2023, Tamar Christina wrote:
>
> > Hi All,
> >
> > This adds new test to check for all the early break functionality.
> > It includes a number of codegen and runtime tests checking the values at
> > different needles in the array.
> >
> > They also check the values on different array sizes and peeling positions,
> > datatypes, VL, ncopies and every other variant I could think of.
> >
> > Additionally it also contains reduced cases from issues found running over
> > various codebases.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Also regtested with:
> >  -march=armv8.3-a+sve
> >  -march=armv8.3-a+nosve
> >  -march=armv9-a
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >       * doc/sourcebuild.texi: Document it.
>
> Document what?
>
> > gcc/testsuite/ChangeLog:
> >
> >       * lib/target-supports.exp:
>
> ?
>
> For all runtime testcases you need to include "tree-vect.h"
> and call check_vect () in main so appropriate cpuid checks
> can be performed.
>
> In vect/ you shouldn't use { dg-do run }, that's the default
> and is overridden by some .exp magic.  If you add dg-do run
> that magic doesn't work.
>
> x86 also can do cbranch with SSE4.1, not sure how to
> auto-magically add -msse4.1 for the tests though.
> There's a sse4 target but that only checks whether you
> can use -msse4.1.  Anyway, we can do x86 testsuite adjustments
> as followup.

But it gives extra test coverage:

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index 769df6cd766..1767fda36dc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4081,6 +4081,7 @@ proc check_effective_target_vect_early_break { } {
        || [check_effective_target_arm_neon_ok]
        || ([check_effective_target_arm_v8_1m_mve_fp_ok]
             && [check_effective_target_arm_little_endian])
+       || [check_effective_target_sse4]
        }}]
 }
 # Return 1 if the target supports hardware vectorization of complex
additions of

and running the testsuite with -msse4.1 yields

Running target unix/-msse4.1
FAIL: gcc.dg/vect/vect-early-break_2.c (internal compiler error:
verify_ssa failed)
FAIL: gcc.dg/vect/vect-early-break_2.c (test for excess errors)
FAIL: gcc.dg/vect/vect-early-break_21.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect
"Vectorizing an unaligned access" 14
FAIL: gcc.dg/vect/vect-early-break_52.c scan-tree-dump vect
"vectorized 1 loops in function"
FAIL: gcc.dg/vect/vect-early-break_67.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_69.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_7.c (internal compiler error:
verify_ssa failed)
FAIL: gcc.dg/vect/vect-early-break_7.c (test for excess errors)
FAIL: gcc.dg/vect/vect-early-break_70.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_71.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_71.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_72.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_73.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_73.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_74.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_75.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_76.c scan-tree-dump vect "LOOP VECTORIZED"
FAIL: gcc.dg/vect/vect-early-break_8.c scan-tree-dump vect "LOOP VECTORIZED"

maybe we can amend check_vect_support_and_set_flags to enable more
than just -msse2 on x86_64 by default when available.  To avoid too much churn
I'd not use -mavx2 but instead -msse4.1 which would not alter the set of vector
modes available.

The other option is to add dg-additional-options -msse4.1 to all early-break
tests for { target sse4 } (and as said, use check_vect () for run tests).
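
For illustration, a run test adjusted along those lines might look like
the sketch below (the array name, N, the needle position and the function
name are hypothetical; the dg directives and check_vect () convention are
from the vect testsuite):

```c
/* { dg-require-effective-target vect_early_break } */
/* { dg-require-effective-target vect_int } */
/* { dg-additional-options "-msse4.1" { target sse4 } } */

#define N 803
int a[N];

/* Loop shape the early-break vectorizer targets: the `if` exits
   before the counted exit is reached.  */
int
find_first_nonzero (void)
{
  for (int i = 0; i < N; i++)
    if (a[i])
      return i;
  return -1;
}

/* The real test's main () would first call check_vect () (provided by
   "tree-vect.h") so the run is skipped on CPUs without the required
   vector ISA, then exercise find_first_nonzero at various needle
   positions.  */
```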

Richard.

>
>
> >       * g++.dg/vect/vect-early-break_1.cc: New test.
> >       * g++.dg/vect/vect-early-break_2.cc: New test.
> >       * g++.dg/vect/vect-early-break_3.cc: New test.
> >       * gcc.dg/vect/vect-early-break-run_1.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_10.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_2.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_3.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_4.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_5.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_6.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_7.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_8.c: New test.
> >       * gcc.dg/vect/vect-early-break-run_9.c: New test.
> >       * gcc.dg/vect/vect-early-break-template_1.c: New test.
> >       * gcc.dg/vect/vect-early-break-template_2.c: New test.
> >       * gcc.dg/vect/vect-early-break_1.c: New test.
> >       * gcc.dg/vect/vect-early-break_10.c: New test.
> >       * gcc.dg/vect/vect-early-break_11.c: New test.
> >       * gcc.dg/vect/vect-early-break_12.c: New test.
> >       * gcc.dg/vect/vect-early-break_13.c: New test.
> >       * gcc.dg/vect/vect-early-break_14.c: New test.
> >       * gcc.dg/vect/vect-early-break_15.c: New test.
> >       * gcc.dg/vect/vect-early-break_16.c: New test.
> >       * gcc.dg/vect/vect-early-break_17.c: New test.
> >       * gcc.dg/vect/vect-early-break_18.c: New test.
> >       * gcc.dg/vect/vect-early-break_19.c: New test.
> >       * gcc.dg/vect/vect-early-break_2.c: New test.
> >       * gcc.dg/vect/vect-early-break_20.c: New test.
> >       * gcc.dg/vect/vect-early-break_21.c: New test.
> >       * gcc.dg/vect/vect-early-break_22.c: New test.
> >       * gcc.dg/vect/vect-early-break_23.c: New test.
> >       * gcc.dg/vect/vect-early-break_24.c: New test.
> >       * gcc.dg/vect/vect-early-break_25.c: New test.
> >       * gcc.dg/vect/vect-early-break_26.c: New test.
> >       * gcc.dg/vect/vect-early-break_27.c: New test.
> >       * gcc.dg/vect/vect-early-break_28.c: New test.
> >       * gcc.dg/vect/vect-early-break_29.c: New test.
> >       * gcc.dg/vect/vect-early-break_3.c: New test.
> >       * gcc.dg/vect/vect-early-break_30.c: New test.
> >       * gcc.dg/vect/vect-early-break_31.c: New test.
> >       * gcc.dg/vect/vect-early-break_32.c: New test.
> >       * gcc.dg/vect/vect-early-break_33.c: New test.
> >       * gcc.dg/vect/vect-early-break_34.c: New test.
> >       * gcc.dg/vect/vect-early-break_35.c: New test.
> >       * gcc.dg/vect/vect-early-break_36.c: New test.
> >       * gcc.dg/vect/vect-early-break_37.c: New test.
> >       * gcc.dg/vect/vect-early-break_38.c: New test.
> >       * gcc.dg/vect/vect-early-break_39.c: New test.
> >       * gcc.dg/vect/vect-early-break_4.c: New test.
> >       * gcc.dg/vect/vect-early-break_40.c: New test.
> >       * gcc.dg/vect/vect-early-break_41.c: New test.
> >       * gcc.dg/vect/vect-early-break_42.c: New test.
> >       * gcc.dg/vect/vect-early-break_43.c: New test.
> >       * gcc.dg/vect/vect-early-break_44.c: New test.
> >       * gcc.dg/vect/vect-early-break_45.c: New test.
> >       * gcc.dg/vect/vect-early-break_46.c: New test.
> >       * gcc.dg/vect/vect-early-break_47.c: New test.
> >       * gcc.dg/vect/vect-early-break_48.c: New test.
> >       * gcc.dg/vect/vect-early-break_49.c: New test.
> >       * gcc.dg/vect/vect-early-break_5.c: New test.
> >       * gcc.dg/vect/vect-early-break_50.c: New test.
> >       * gcc.dg/vect/vect-early-break_51.c: New test.
> >       * gcc.dg/vect/vect-early-break_52.c: New test.
> >       * gcc.dg/vect/vect-early-break_53.c: New test.
> >       * gcc.dg/vect/vect-early-break_54.c: New test.
> >       * gcc.dg/vect/vect-early-break_55.c: New test.
> >       * gcc.dg/vect/vect-early-break_56.c: New test.
> >       * gcc.dg/vect/vect-early-break_57.c: New test.
> >       * gcc.dg/vect/vect-early-break_58.c: New test.
> >       * gcc.dg/vect/vect-early-break_59.c: New test.
> >       * gcc.dg/vect/vect-early-break_6.c: New test.
> >       * gcc.dg/vect/vect-early-break_60.c: New test.
> >       * gcc.dg/vect/vect-early-break_61.c: New test.
> >       * gcc.dg/vect/vect-early-break_62.c: New test.
> >       * gcc.dg/vect/vect-early-break_63.c: New test.
> >       * gcc.dg/vect/vect-early-break_64.c: New test.
> >       * gcc.dg/vect/vect-early-break_65.c: New test.
> >       * gcc.dg/vect/vect-early-break_66.c: New test.
> >       * gcc.dg/vect/vect-early-break_67.c: New test.
> >       * gcc.dg/vect/vect-early-break_68.c: New test.
> >       * gcc.dg/vect/vect-early-break_69.c: New test.
> >       * gcc.dg/vect/vect-early-break_7.c: New test.
> >       * gcc.dg/vect/vect-early-break_70.c: New test.
> >       * gcc.dg/vect/vect-early-break_71.c: New test.
> >       * gcc.dg/vect/vect-early-break_72.c: New test.
> >       * gcc.dg/vect/vect-early-break_73.c: New test.
> >       * gcc.dg/vect/vect-early-break_74.c: New test.
> >       * gcc.dg/vect/vect-early-break_75.c: New test.
> >       * gcc.dg/vect/vect-early-break_76.c: New test.
> >       * gcc.dg/vect/vect-early-break_8.c: New test.
> >       * gcc.dg/vect/vect-early-break_9.c: New test.
> >       * gcc.target/aarch64/opt_mismatch_1.c: New test.
> >       * gcc.target/aarch64/opt_mismatch_2.c: New test.
> >       * gcc.target/aarch64/opt_mismatch_3.c: New test.
> >       * gcc.target/aarch64/vect-early-break-cbranch_1.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index c20af31c64237baff70f8781b1dc47f4d1a48aa9..4c351335f2bec9c6bb6856bd38d9132da7447c13 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when
> >  @option{-funsafe-math-optimizations} is not in effect.
> >  This implies @code{vect_float}.
> >
> > +@item vect_early_break
> > +Target supports hardware vectorization of loops with early breaks.
> > +This requires an implementation of the cbranch optab for vectors.
> > +
> >  @item vect_int
> >  Target supports hardware vectors of @code{int}.
> >
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_1.cc
> > @@ -0,0 +1,60 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-w -O2" } */
> > +
> > +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> > +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> > +public:
> > +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> > +};
> > +template <unsigned N, typename C>
> > +template <typename Ca>
> > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> > +  for (int i = 0; i < N; i++)
> > +    this->coeffs[i] += a.coeffs[i];
> > +  return *this;
> > +}
> > +template <unsigned N, typename Ca, typename Cb>
> > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > +  poly_int<N, long> r;
> > +  return r;
> > +}
> > +struct vec_prefix {
> > +  unsigned m_num;
> > +};
> > +struct vl_ptr;
> > +struct va_heap {
> > +  typedef vl_ptr default_layout;
> > +};
> > +template <typename, typename A, typename = typename A::default_layout>
> > +struct vec;
> > +template <typename T, typename A> struct vec<T, A, int> {
> > +  T &operator[](unsigned);
> > +  vec_prefix m_vecpfx;
> > +  T m_vecdata[];
> > +};
> > +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> > +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > +  return m_vecdata[ix];
> > +}
> > +template <typename T> struct vec<T, va_heap> {
> > +  T &operator[](unsigned ix) { return m_vec[ix]; }
> > +  vec<T, va_heap, int> m_vec;
> > +};
> > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > +template <typename> class vector_builder : public auto_vec {};
> > +class int_vector_builder : public vector_builder<int> {
> > +public:
> > +  int_vector_builder(poly_int<2, long>, int, int);
> > +};
> > +bool vect_grouped_store_supported() {
> > +  int i;
> > +  poly_int<2, long> nelt;
> > +  int_vector_builder sel(nelt, 2, 3);
> > +  for (i = 0; i < 6; i++)
> > +    sel[i] += exact_div(nelt, 2);
> > +}
> > +
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..810d990e3efab0cf0363a3b76481f2cb649ad3ba
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_2.cc
> > @@ -0,0 +1,60 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-w -O2" } */
> > +
> > +void fancy_abort(char *, int, const char *) __attribute__((__noreturn__));
> > +template <unsigned N, typename> struct poly_int_pod { int coeffs[N]; };
> > +template <unsigned N, typename> class poly_int : public poly_int_pod<N, int> {
> > +public:
> > +  template <typename Ca> poly_int &operator+=(const poly_int_pod<N, Ca> &);
> > +};
> > +template <unsigned N, typename C>
> > +template <typename Ca>
> > +poly_int<N, C> &poly_int<N, C>::operator+=(const poly_int_pod<N, Ca> &a) {
> > +  for (int i = 0; i < N; i++)
> > +    this->coeffs[i] += a.coeffs[i];
> > +  return *this;
> > +}
> > +template <unsigned N, typename Ca, typename Cb>
> > +poly_int<N, long> exact_div(poly_int_pod<N, Ca>, Cb) {
> > +  poly_int<N, long> r;
> > +  return r;
> > +}
> > +struct vec_prefix {
> > +  unsigned m_num;
> > +};
> > +struct vl_ptr;
> > +struct va_heap {
> > +  typedef vl_ptr default_layout;
> > +};
> > +template <typename, typename A, typename = typename A::default_layout>
> > +struct vec;
> > +template <typename T, typename A> struct vec<T, A, int> {
> > +  T &operator[](unsigned);
> > +  vec_prefix m_vecpfx;
> > +  T m_vecdata[];
> > +};
> > +template <typename T, typename A> T &vec<T, A, int>::operator[](unsigned ix) {
> > +  m_vecpfx.m_num ? fancy_abort("", 9, __FUNCTION__), 0 : 0;
> > +  return m_vecdata[ix];
> > +}
> > +template <typename T> struct vec<T, va_heap> {
> > +  T &operator[](unsigned ix) { return m_vec[ix]; }
> > +  vec<T, va_heap, int> m_vec;
> > +};
> > +class auto_vec : public vec<poly_int<2, long>, va_heap> {};
> > +template <typename> class vector_builder : public auto_vec {};
> > +class int_vector_builder : public vector_builder<int> {
> > +public:
> > +  int_vector_builder(poly_int<2, long>, int, int);
> > +};
> > +bool vect_grouped_store_supported() {
> > +  int i;
> > +  poly_int<2, long> nelt;
> > +  int_vector_builder sel(nelt, 2, 3);
> > +  for (i = 0; i < 6; i++)
> > +    sel[i] += exact_div(nelt, 2);
> > +}
> > +
> > diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a12e5ca434b2ac37c03dbaa12273fd8e5aa2018c
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/vect/vect-early-break_3.cc
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-w -O2" } */
> > +
> > +int aarch64_advsimd_valid_immediate_hs_val32;
> > +bool aarch64_advsimd_valid_immediate_hs() {
> > +  for (int shift = 0; shift < 32; shift += 8)
> > +    if (aarch64_advsimd_valid_immediate_hs_val32 & shift)
> > +      return aarch64_advsimd_valid_immediate_hs_val32;
> > +  for (;;)
> > +    ;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 0
> > +#include "vect-early-break-template_1.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 800
> > +#define P 799
> > +#include "vect-early-break-template_2.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 802
> > +#include "vect-early-break-template_1.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 5
> > +#include "vect-early-break-template_1.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 278
> > +#include "vect-early-break-template_1.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 800
> > +#define P 799
> > +#include "vect-early-break-template_1.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 0
> > +#include "vect-early-break-template_2.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 802
> > +#include "vect-early-break-template_2.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 5
> > +#include "vect-early-break-template_2.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast -save-temps" } */
> > +
> > +#define N 803
> > +#define P 278
> > +#include "vect-early-break-template_2.c"
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c
> > @@ -0,0 +1,47 @@
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +
> > +#ifndef P
> > +#define P 0
> > +#endif
> > +
> > +unsigned vect_a[N] = {0};
> > +unsigned vect_b[N] = {0};
> > +
> > +__attribute__((noipa, noinline))
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +extern void abort ();
> > +
> > +int main ()
> > +{
> > +
> > +  int x = 1;
> > +  int idx = P;
> > +  vect_a[idx] = x + 1;
> > +
> > +  test4(x);
> > +
> > +  if (vect_b[idx] != (x + idx))
> > +    abort ();
> > +
> > +  if (vect_a[idx] != x + 1)
> > +    abort ();
> > +
> > +  if (idx > 0 && vect_a[idx-1] != x)
> > +    abort ();
> > +
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c
> > @@ -0,0 +1,50 @@
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +
> > +#ifndef P
> > +#define P 0
> > +#endif
> > +
> > +unsigned vect_a[N] = {0};
> > +unsigned vect_b[N] = {0};
> > +
> > +__attribute__((noipa, noinline))
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return i;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +extern void abort ();
> > +
> > +int main ()
> > +{
> > +
> > +  int x = 1;
> > +  int idx = P;
> > +  vect_a[idx] = x + 1;
> > +
> > +  unsigned res = test4(x);
> > +
> > +  if (res != idx)
> > +    abort ();
> > +
> > +  if (vect_b[idx] != (x + idx))
> > +    abort ();
> > +
> > +  if (vect_a[idx] != x + 1)
> > +    abort ();
> > +
> > +  if (idx > 0 && vect_a[idx-1] != x)
> > +    abort ();
> > +
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x,int y, int z)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > + }
> > +
> > + ret = x + y * z;
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x, int y)
> > +{
> > + unsigned ret = 0;
> > +for (int o = 0; o < y; o++)
> > +{
> > + ret += o;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > +}
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x, int y)
> > +{
> > + unsigned ret = 0;
> > +for (int o = 0; o < y; o++)
> > +{
> > + ret += o;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > +
> > + }
> > +}
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i] * x;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 803
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +int test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return i;
> > +   vect_a[i] += x * vect_b[i];
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 803
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +int test4(unsigned x)
> > +{
> > + int ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return i;
> > +   vect_a[i] += x * vect_b[i];
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 1024
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > +   ret += vect_a[i] + vect_b[i];
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 1024
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > +   ret = vect_a[i] + vect_b[i];
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+=2)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x, unsigned step)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+=step)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <complex.h>
> > +
> > +#define N 1024
> > +complex double vect_a[N];
> > +complex double vect_b[N];
> > +
> > +complex double test4(complex double x)
> > +{
> > + complex double ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] += x + i;
> > +   if (vect_a[i] == x)
> > +     return i;
> > +   vect_a[i] += x * vect_b[i];
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c
> > @@ -0,0 +1,37 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <stdbool.h>
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_b[N];
> > +struct testStruct {
> > + long e;
> > + long f;
> > + bool a : 1;
> > + bool b : 1;
> > + int c : 14;
> > + int d;
> > +};
> > +struct testStruct vect_a[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i].a > x)
> > +     return true;
> > +   vect_a[i].e = x;
> > + }
> > + return ret;
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c
> > @@ -0,0 +1,37 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <stdbool.h>
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_b[N];
> > +struct testStruct {
> > + long e;
> > + long f;
> > + bool a : 1;
> > + bool b : 1;
> > + int c : 14;
> > + int d;
> > +};
> > +struct testStruct vect_a[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i].a)
> > +     return true;
> > +   vect_a[i].e = x;
> > + }
> > + return ret;
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c
> > @@ -0,0 +1,45 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_perm } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +
> > +#include "tree-vect.h"
> > +
> > +void __attribute__((noipa))
> > +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c)
> > +{
> > +  int t1 = *c;
> > +  int t2 = *c;
> > +  for (int i = 0; i < 64; i+=2)
> > +    {
> > +      b[i] = a[i] - t1;
> > +      t1 = a[i];
> > +      b[i+1] = a[i+1] - t2;
> > +      t2 = a[i+1];
> > +    }
> > +}
> > +
> > +int a[64];
> > +short b[64];
> > +
> > +int
> > +main ()
> > +{
> > +  check_vect ();
> > +  for (int i = 0; i < 64; ++i)
> > +    {
> > +      a[i] = i;
> > +      __asm__ volatile ("" ::: "memory");
> > +    }
> > +  int c = 7;
> > +  foo (a, b, &c);
> > +  for (int i = 2; i < 64; i+=2)
> > +    if (b[i] != a[i] - a[i-2]
> > +     || b[i+1] != a[i+1] - a[i-1])
> > +      abort ();
> > +  if (b[0] != -7 || b[1] != -6)
> > +    abort ();
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c
> > @@ -0,0 +1,61 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#define N 200
> > +#define M 4
> > +
> > +typedef signed char sc;
> > +typedef unsigned char uc;
> > +typedef signed short ss;
> > +typedef unsigned short us;
> > +typedef int si;
> > +typedef unsigned int ui;
> > +typedef signed long long sll;
> > +typedef unsigned long long ull;
> > +
> > +#define FOR_EACH_TYPE(M) \
> > +  M (sc) M (uc) \
> > +  M (ss) M (us) \
> > +  M (si) M (ui) \
> > +  M (sll) M (ull) \
> > +  M (float) M (double)
> > +
> > +#define TEST_VALUE(I) ((I) * 17 / 2)
> > +
> > +#define ADD_TEST(TYPE)                               \
> > +  void __attribute__((noinline, noclone))    \
> > +  test_##TYPE (TYPE *a, TYPE *b)             \
> > +  {                                          \
> > +    for (int i = 0; i < N; i += 2)           \
> > +      {                                              \
> > +     a[i + 0] = b[i + 0] + 2;                \
> > +     a[i + 1] = b[i + 1] + 3;                \
> > +      }                                              \
> > +  }
> > +
> > +#define DO_TEST(TYPE)                                        \
> > +  for (int j = 1; j < M; ++j)                                \
> > +    {                                                        \
> > +      TYPE a[N + M];                                 \
> > +      for (int i = 0; i < N + M; ++i)                        \
> > +     a[i] = TEST_VALUE (i);                          \
> > +      test_##TYPE (a + j, a);                                \
> > +      for (int i = 0; i < N; i += 2)                 \
> > +     if (a[i + j] != (TYPE) (a[i] + 2)               \
> > +         || a[i + j + 1] != (TYPE) (a[i + 1] + 3))   \
> > +       __builtin_abort ();                           \
> > +    }
> > +
> > +FOR_EACH_TYPE (ADD_TEST)
> > +
> > +int
> > +main (void)
> > +{
> > +  FOR_EACH_TYPE (DO_TEST)
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
> > +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> > +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c
> > @@ -0,0 +1,46 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_double } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +
> > +#include "tree-vect.h"
> > +
> > +extern void abort (void);
> > +void __attribute__((noinline,noclone))
> > +foo (double *b, double *d, double *f)
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      d[2*i] = 2. * d[2*i];
> > +      d[2*i+1] = 4. * d[2*i+1];
> > +      b[i] = d[2*i] - 1.;
> > +      f[i] = d[2*i+1] + 2.;
> > +    }
> > +}
> > +int main()
> > +{
> > +  double b[1024], d[2*1024], f[1024];
> > +  int i;
> > +
> > +  check_vect ();
> > +
> > +  for (i = 0; i < 2*1024; i++)
> > +    d[i] = 1.;
> > +  foo (b, d, f);
> > +  for (i = 0; i < 1024; i+= 2)
> > +    {
> > +      if (d[2*i] != 2.)
> > +     abort ();
> > +      if (d[2*i+1] != 4.)
> > +     abort ();
> > +    }
> > +  for (i = 0; i < 1024; i++)
> > +    {
> > +      if (b[i] != 1.)
> > +     abort ();
> > +      if (f[i] != 6.)
> > +     abort ();
> > +    }
> > +  return 0;
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#include "vect-peel-1-src.c"
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */
> > +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c
> > @@ -0,0 +1,44 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-require-effective-target vect_perm } */
> > +
> > +#include "tree-vect.h"
> > +
> > +void __attribute__((noipa))
> > +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c)
> > +{
> > +  int t1 = *c;
> > +  int t2 = *c;
> > +  for (int i = 0; i < 64; i+=2)
> > +    {
> > +      b[i] = a[i] - t1;
> > +      t1 = a[i];
> > +      b[i+1] = a[i+1] - t2;
> > +      t2 = a[i+1];
> > +    }
> > +}
> > +
> > +int a[64], b[64];
> > +
> > +int
> > +main ()
> > +{
> > +  check_vect ();
> > +  for (int i = 0; i < 64; ++i)
> > +    {
> > +      a[i] = i;
> > +      __asm__ volatile ("" ::: "memory");
> > +    }
> > +  int c = 7;
> > +  foo (a, b, &c);
> > +  for (int i = 2; i < 64; i+=2)
> > +    if (b[i] != a[i] - a[i-2]
> > +     || b[i+1] != a[i+1] - a[i-1])
> > +      abort ();
> > +  if (b[0] != -7 || b[1] != -6)
> > +    abort ();
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void abort ();
> > +int a[128];
> > +
> > +int main ()
> > +{
> > +  int i;
> > +  for (i = 1; i < 128; i++)
> > +    if (a[i] != i%4 + 1)
> > +      abort ();
> > +  if (a[0] != 5)
> > +    abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void abort ();
> > +int a[128];
> > +int main ()
> > +{
> > +  int i;
> > +  for (i = 1; i < 128; i++)
> > +    if (a[i] != i%4 + 1)
> > +    abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int in[100];
> > +int out[100 * 2];
> > +
> > +int main (void)
> > +{
> > +  if (out[0] != in[100 - 1])
> > +  for (int i = 1; i <= 100; ++i)
> > +    if (out[i] != 2)
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > +
> > +unsigned test4(char x, char *vect, int n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < n; i++)
> > + {
> > +   if (vect[i] > x)
> > +     return 1;
> > +
> > +   vect[i] = x;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int x[100];
> > +int choose1(int);
> > +int choose2();
> > +void consume(int);
> > +void f() {
> > +    for (int i = 0; i < 100; ++i) {
> > +        if (x[i] == 11) {
> > +            if (choose1(i))
> > +                goto A;
> > +            else
> > +                goto B;
> > +        }
> > +    }
> > +    if (choose2())
> > +        goto B;
> > +A:
> > +    for (int i = 0; i < 100; ++i)
> > +        consume(i);
> > +B:
> > +    for (int i = 0; i < 100; ++i)
> > +        consume(i * i);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 1025
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > +   ret += vect_a[i] + vect_b[i];
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 1024
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > +   ret = vect_a[i] + vect_b[i];
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a2[N];
> > +unsigned vect_a1[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x, int z)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a1[i]*2 > x)
> > +     {
> > +       for (int y = 0; y < z; y++)
> > +      vect_a2 [y] *= vect_a1[i];
> > +       break;
> > +     }
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +
> > +unsigned vect_a[N] __attribute__ ((aligned (4)));
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > +
> > + for (int i = 1; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a2[N];
> > +unsigned vect_a1[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a1[i]*2 > x)
> > +     break;
> > +   vect_a1[i] = x;
> > +   if (vect_a2[i]*4 > x)
> > +     break;
> > +   vect_a2[i] = x*x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a2[N];
> > +unsigned vect_a1[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a1[i]*2 > x)
> > +     break;
> > +   vect_a1[i] = x;
> > +   if (vect_a2[i]*4 > x)
> > +     return i;
> > +   vect_a2[i] = x*x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 4
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 != x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+=2)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x, unsigned n)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+= (N % 4))
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 1024
> > +unsigned vect[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   if (i > 16 && vect[i] > x)
> > +     break;
> > +
> > +   vect[i] = x;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i*=3)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* SCEV can't currently analyze this loop's bounds.  */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > +#pragma GCC novector
> > +#pragma GCC unroll 4
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] += vect_a[i] + x;
> > + }
> > + return ret;
> > +}
> > +
> > +/* novector should have blocked vectorization.  */
> > +/* { dg-final { scan-tree-dump-not "vectorized \d loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 800
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 802
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+=2)
> > + {
> > +   vect_b[i] = x + i;
> > +   vect_b[i+1] = x + i + 1;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   if (vect_a[i+1]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +   vect_a[i+1] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 802
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i+=2)
> > + {
> > +   vect_b[i] = x + i;
> > +   vect_b[i+1] = x + i + 1;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   if (vect_a[i+1]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +   vect_a[i+1] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i]*2 > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +#include <complex.h>
> > +
> > +#define N 1024
> > +complex double vect_a[N];
> > +complex double vect_b[N];
> > +
> > +complex double test4(complex double x)
> > +{
> > + complex double ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] += x + i;
> > +   if (vect_a[i] == x)
> > +     return i;
> > +   vect_a[i] += x * vect_b[i];
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +/* At -O2 we can't currently vectorize this because of the libcalls not being
> > +   lowered.  */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect"  { xfail *-*-* } } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +void abort ();
> > +
> > +float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00};
> > +float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00};
> > +float a[16] = {0};
> > +float e[16] = {0};
> > +float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
> > +int main1 ()
> > +{
> > +  int i;
> > +  for (i=0; i<16; i++)
> > +    {
> > +      if (a[i] != results1[i] || e[i] != results2[i])
> > +        abort();
> > +    }
> > +
> > +  if (a[i+3] != b[i-1])
> > +    abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int main (void)
> > +{
> > +  signed char a[50], b[50], c[50];
> > +  for (int i = 0; i < 50; ++i)
> > +    if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1))
> > +      __builtin_abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void abort();
> > +struct foostr {
> > +  _Complex short f1;
> > +  _Complex short f2;
> > +};
> > +struct foostr a[16] __attribute__ ((__aligned__(16))) = {};
> > +struct foostr c[16] __attribute__ ((__aligned__(16)));
> > +struct foostr res[16] = {};
> > +void
> > +foo (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < 16; i++)
> > +    {
> > +      if (c[i].f1 != res[i].f1)
> > + abort ();
> > +      if (c[i].f2 != res[i].f2)
> > + abort ();
> > +    }
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 1024
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     return vect_a[i];
> > +   vect_a[i] = x;
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +extern void abort();
> > +float a[1024], b[1024], c[1024], d[1024];
> > +_Bool k[1024];
> > +
> > +int main ()
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0))
> > +      abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int x_in[32];
> > +int x_out_a[32], x_out_b[32];
> > +int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2};
> > +int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024};
> > +int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
> > +
> > +void foo ()
> > +{
> > +  int j, i, x;
> > +  int curr_a, flag, next_a, curr_b, next_b;
> > +    {
> > +      for (i = 0; i < 16; i++)
> > +        {
> > +          next_b = b[i+1];
> > +          curr_b = flag ? next_b : curr_b;
> > +        }
> > +      x_out_b[j] = curr_b;
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void abort();
> > +int main1 (short X)
> > +{
> > +  unsigned char a[128];
> > +  unsigned short b[128];
> > +  unsigned int c[128];
> > +  short myX = X;
> > +  int i;
> > +  for (i = 0; i < 128; i++)
> > +    {
> > +      if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++)
> > +        abort ();
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +void abort ();
> > +int a[64], b[64];
> > +int main ()
> > +{
> > +  int c = 7;
> > +  for (int i = 1; i < 64; ++i)
> > +    if (b[i] != a[i] - a[i-1])
> > +      abort ();
> > +  if (b[0] != -7)
> > +    abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + unsigned tmp[N];
> > + for (int i = 0; i < N; i++)
> > + {
> > +   tmp[i] = x + i;
> > +   vect_b[i] = tmp[i];
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   volatile unsigned tmp = x + i;
> > +   vect_b[i] = tmp;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c
> > @@ -0,0 +1,101 @@
> > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-add-options bind_pic_locally } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +
> > +#include <stdarg.h>
> > +#include "tree-vect.h"
> > +
> > +#define N 32
> > +
> > +unsigned short sa[N];
> > +unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> > +             16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> > +unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> > +             16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> > +unsigned int ia[N];
> > +unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> > +            0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> > +unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> > +            0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> > +
> > +/* Current peeling-for-alignment scheme will consider the 'sa[i+7]'
> > +   access for peeling, and therefore will examine the option of
> > +   using a peeling factor = VF-7%VF. This will result in a peeling factor 1,
> > +   which will also align the access to 'ia[i+3]', and the loop could be
> > +   vectorized on all targets that support unaligned loads.
> > +   Without cost model on targets that support misaligned stores, no peeling
> > +   will be applied since we want to keep the four loads aligned.  */
> > +
> > +__attribute__ ((noinline))
> > +int main1 ()
> > +{
> > +  int i;
> > +  int n = N - 7;
> > +
> > +  /* Multiple types with different sizes, used in independent
> > +     computations. Vectorizable.  */
> > +  for (i = 0; i < n; i++)
> > +    {
> > +      sa[i+7] = sb[i] + sc[i];
> > +      ia[i+3] = ib[i] + ic[i];
> > +    }
> > +
> > +  /* check results:  */
> > +  for (i = 0; i < n; i++)
> > +    {
> > +      if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> > +     abort ();
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +/* Current peeling-for-alignment scheme will consider the 'ia[i+3]'
> > +   access for peeling, and therefore will examine the option of
> > +   using a peeling factor = VF-3%VF. This will result in a peeling factor
> > +   1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we
> > +   need to peel 5,1 iterations for VF=4,2 respectively, so the loop can not
> > +   be vectorized.  However, 'ia[i+3]' also gets aligned if we peel 5
> > +   iterations, so the loop is vectorizable on all targets that support
> > +   unaligned loads.
> > +   Without cost model on targets that support misaligned stores, no peeling
> > +   will be applied since we want to keep the four loads aligned.  */
> > +
> > +__attribute__ ((noinline))
> > +int main2 ()
> > +{
> > +  int i;
> > +  int n = N-3;
> > +
> > +  /* Multiple types with different sizes, used in independent
> > +     computations. Vectorizable.  */
> > +  for (i = 0; i < n; i++)
> > +    {
> > +      ia[i+3] = ib[i] + ic[i];
> > +      sa[i+3] = sb[i] + sc[i];
> > +    }
> > +
> > +  /* check results:  */
> > +  for (i = 0; i < n; i++)
> > +    {
> > +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> > +        abort ();
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +int main (void)
> > +{
> > +  check_vect ();
> > +
> > +  main1 ();
> > +  main2 ();
> > +
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..be4a0c7426093059ce37a9f824defb7ae270094d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +void abort ();
> > +
> > +unsigned short sa[32];
> > +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> > +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> > +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> > +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> > +unsigned int ia[32];
> > +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> > +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> > +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> > +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> > +
> > +int main2 (int n)
> > +{
> > +  int i;
> > +  for (i = 0; i < n; i++)
> > +    {
> > +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> > +        abort ();
> > +    }
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +extern void abort();
> > +float a[1024], b[1024], c[1024], d[1024];
> > +_Bool k[1024];
> > +
> > +int main ()
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    if (k[i] != ((i % 3) == 0))
> > +      abort ();
> > +}
> > +
> > +/* Pattern didn't match inside gcond.  */
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +extern void abort();
> > +float a[1024], b[1024], c[1024], d[1024];
> > +_Bool k[1024];
> > +
> > +int main ()
> > +{
> > +  int i;
> > +  for (i = 0; i < 1024; i++)
> > +    if (k[i] != (i == 0))
> > +      abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#define N 1024
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < (N/2); i+=2)
> > + {
> > +   vect_b[i] = x + i;
> > +   vect_b[i+1] = x + i+1;
> > +   if (vect_a[i] > x || vect_a[i+1] > x)
> > +     break;
> > +   vect_a[i] += x * vect_b[i];
> > +   vect_a[i+1] += x * vect_b[i+1];
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +extern void abort();
> > +float a[1024], b[1024], c[1024], d[1024];
> > +_Bool k[1024];
> > +
> > +int main ()
> > +{
> > +  char i;
> > +  for (i = 0; i < 1024; i++)
> > +    if (k[i] != (i == 0))
> > +      abort ();
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_float } */
> > +
> > +typedef float real_t;
> > +__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000];
> > +real_t s482()
> > +{
> > +    for (int nl = 0; nl < 10000; nl++) {
> > +        for (int i = 0; i < 32000; i++) {
> > +            a[i] += b[i] * c[i];
> > +            if (c[i] > b[i]) break;
> > +        }
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int a, b;
> > +int e() {
> > +  int d, c;
> > +  d = 0;
> > +  for (; d < b; d++)
> > +    a = 0;
> > +  d = 0;
> > +  for (; d < b; d++)
> > +    if (d)
> > +      c++;
> > +  for (;;)
> > +    if (c)
> > +      break;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c
> > @@ -0,0 +1,28 @@
> > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > +/* { dg-require-effective-target vect_long } */
> > +/* { dg-require-effective-target vect_shift } */
> > +/* { dg-additional-options "-fno-tree-scev-cprop" } */
> > +
> > +/* Statement used outside the loop.
> > +   NOTE: SCEV disabled to ensure the live operation is not removed before
> > +   vectorization.  */
> > +__attribute__ ((noinline)) int
> > +liveloop (int start, int n, int *x, int *y)
> > +{
> > +  int i = start;
> > +  int j;
> > +  int ret;
> > +
> > +  for (j = 0; j < n; ++j)
> > +    {
> > +      i += 1;
> > +      x[j] = i;
> > +      ret = y[j];
> > +    }
> > +  return ret;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fdump-tree-vect-all" } */
> > +
> > +int d(unsigned);
> > +
> > +void a() {
> > +  char b[8];
> > +  unsigned c = 0;
> > +  while (c < 7 && b[c])
> > +    ++c;
> > +  if (d(c))
> > +    return;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */
> > +
> > +enum a { b };
> > +
> > +struct {
> > +  enum a c;
> > +} d[10], *e;
> > +
> > +void f() {
> > +  int g;
> > +  for (g = 0, e = d; g < sizeof(1); g++, e++)
> > +    if (e->c)
> > +      return;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +int a[0];
> > +int b;
> > +
> > +void g();
> > +
> > +void f() {
> > +  int d, e;
> > +  for (; e; e++) {
> > +    int c;
> > +    switch (b)
> > +    case '9': {
> > +      for (; d < 1; d++)
> > +        if (a[d])
> > +          c = 1;
> > +      break;
> > +    case '<':
> > +      g();
> > +      c = 0;
> > +    }
> > +      while (c)
> > +        ;
> > +  }
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..80f23d1e2431133035895946a5d6b24bef3ca294
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_67.c
> > @@ -0,0 +1,41 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target int32plus } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +
> > +
> > +int main()
> > +{
> > +  int var6 = -1267827473;
> > +  do {
> > +      ++var6;
> > +      double s1_115[4], s2_108[4];
> > +      int var8 = -161498264;
> > +      do {
> > +       ++var8;
> > +       int var12 = 1260960076;
> > +       for (; var12 <= 1260960080; ++var12) {
> > +           int var13 = 1960990937;
> > +           do {
> > +               ++var13;
> > +               int var14 = 2128638723;
> > +               for (; var14 <= 2128638728; ++var14) {
> > +                   int var22 = -1141190839;
> > +                   do {
> > +                       ++var22;
> > +                       if (s2_108 > s1_115) {
> > +                           int var23 = -890798748;
> > +                           do {
> > +                               long long e_119[4];
> > +                           } while (var23 <= -890798746);
> > +                       }
> > +                   } while (var22 <= -1141190829);
> > +               }
> > +           } while (var13 <= 1960990946);
> > +       }
> > +      } while (var8 <= -161498254);
> > +  } while (var6 <= -1267827462);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..c9a8298a8b51e05079041ae7a05086a47b1be5dd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_68.c
> > @@ -0,0 +1,41 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 800
> > +#endif
> > +unsigned vect_a1[N];
> > +unsigned vect_b1[N];
> > +unsigned vect_c1[N];
> > +unsigned vect_d1[N];
> > +
> > +unsigned vect_a2[N];
> > +unsigned vect_b2[N];
> > +unsigned vect_c2[N];
> > +unsigned vect_d2[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b1[i] += x + i;
> > +   vect_c1[i] += x + i;
> > +   vect_d1[i] += x + i;
> > +   if (vect_a1[i]*2 != x)
> > +     break;
> > +   vect_a1[i] = x;
> > +
> > +   vect_b2[i] += x + i;
> > +   vect_c2[i] += x + i;
> > +   vect_d2[i] += x + i;
> > +   if (vect_a2[i]*2 != x)
> > +     break;
> > +   vect_a2[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..f99de8e1f0650a3b590ed8bd9052e18173fc97d0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_69.c
> > @@ -0,0 +1,76 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#  define BITSIZEOF_INT 32
> > +#  define BITSIZEOF_LONG 64
> > +#  define BITSIZEOF_LONG_LONG 64
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +int my_ffs##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    if (x == 0)                                                              \
> > +      return 0;                                                      \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1  << i))                                       \
> > +         break;                                                      \
> > +    return i + 1;                                                    \
> > +}                                                                    \
> > +                                                                     \
> > +int my_clz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))     \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}
> > +
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS32                                       \
> > +  {                                             \
> > +    0x00000000UL,                               \
> > +    0x00000001UL,                               \
> > +    0x80000000UL,                               \
> > +    0x00000002UL,                               \
> > +    0x40000000UL,                               \
> > +    0x00010000UL,                               \
> > +    0x00008000UL,                               \
> > +    0xa5a5a5a5UL,                               \
> > +    0x5a5a5a5aUL,                               \
> > +    0xcafe0000UL,                               \
> > +    0x00cafe00UL,                               \
> > +    0x0000cafeUL,                               \
> > +    0xffffffffUL                                \
> > +  }
> > +
> > +
> > +unsigned int ints[] = NUMS32;
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
> > +     abort ();
> > +      if (ints[i] != 0
> > +       && __builtin_clz (ints[i]) != my_clz (ints[i]))
> > +     abort ();
> > +    }
> > +
> > +  exit (0);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <complex.h>
> > +
> > +#define N 1024
> > +complex double vect_a[N];
> > +complex double vect_b[N];
> > +
> > +complex double test4(complex double x)
> > +{
> > + complex double ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] += x + i;
> > +   if (vect_a[i] == x)
> > +     break;
> > +   vect_a[i] += x * vect_b[i];
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9073130197e124527f8e38c238d8f13452a7780e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_70.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#  define BITSIZEOF_INT 32
> > +#  define BITSIZEOF_LONG 64
> > +#  define BITSIZEOF_LONG_LONG 64
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +__attribute__((noinline)) \
> > +int my_clz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))     \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}
> > +
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS32                                       \
> > +  {                                             \
> > +    0x00000000UL,                               \
> > +    0x00000001UL,                               \
> > +    0x80000000UL,                               \
> > +    0x00000002UL,                               \
> > +    0x40000000UL,                               \
> > +    0x00010000UL,                               \
> > +    0x00008000UL,                               \
> > +    0xa5a5a5a5UL,                               \
> > +    0x5a5a5a5aUL,                               \
> > +    0xcafe0000UL,                               \
> > +    0x00cafe00UL,                               \
> > +    0x0000cafeUL,                               \
> > +    0xffffffffUL                                \
> > +  }
> > +
> > +
> > +unsigned int ints[] = NUMS32;
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (ints[i] != 0
> > +       && __builtin_clz (ints[i]) != my_clz (ints[i]))
> > +       abort ();
> > +    }
> > +
> > +  exit (0);
> > +  return 0;
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..c6d6eb526e618ee93547e04eaba3c6a159a18075
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_71.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#  define BITSIZEOF_INT 32
> > +#  define BITSIZEOF_LONG 64
> > +#  define BITSIZEOF_LONG_LONG 64
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +__attribute__((noinline)) \
> > +int my_ffs##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    if (x == 0)                                                              \
> > +      return 0;                                                      \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1  << i))                                       \
> > +         break;                                                      \
> > +    return i + 1;                                                    \
> > +}
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS32                                       \
> > +  {                                             \
> > +    0x00000000UL,                               \
> > +    0x00000001UL,                               \
> > +    0x80000000UL,                               \
> > +    0x00000002UL,                               \
> > +    0x40000000UL,                               \
> > +    0x00010000UL,                               \
> > +    0x00008000UL,                               \
> > +    0xa5a5a5a5UL,                               \
> > +    0x5a5a5a5aUL,                               \
> > +    0xcafe0000UL,                               \
> > +    0x00cafe00UL,                               \
> > +    0x0000cafeUL,                               \
> > +    0xffffffffUL                                \
> > +  }
> > +
> > +
> > +unsigned int ints[] = NUMS32;
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (__builtin_ffs (ints[i]) != my_ffs (ints[i]))
> > +     abort ();
> > +    }
> > +
> > +  exit (0);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..0f0a1f30ab95bf540027efa8c03aff8fe03a960b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_72.c
> > @@ -0,0 +1,147 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#if __INT_MAX__ > 2147483647L
> > +# if __INT_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_INT 64
> > +# else
> > +#  define BITSIZEOF_INT 32
> > +# endif
> > +#else
> > +# if __INT_MAX__ >= 2147483647L
> > +#  define BITSIZEOF_INT 32
> > +# else
> > +#  define BITSIZEOF_INT 16
> > +# endif
> > +#endif
> > +
> > +#if __LONG_MAX__ > 2147483647L
> > +# if __LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG 32
> > +#endif
> > +
> > +#if __LONG_LONG_MAX__ > 2147483647L
> > +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG_LONG 32
> > +#endif
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +__attribute__((noinline)) \
> > +int my_ctz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1  << i))                                       \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS16                                       \
> > +  {                                          \
> > +    0x0000U,                                 \
> > +    0x0001U,                                 \
> > +    0x8000U,                                 \
> > +    0x0002U,                                 \
> > +    0x4000U,                                 \
> > +    0x0100U,                                 \
> > +    0x0080U,                                 \
> > +    0xa5a5U,                                 \
> > +    0x5a5aU,                                 \
> > +    0xcafeU,                                 \
> > +    0xffffU                                  \
> > +  }
> > +
> > +#define NUMS32                                       \
> > +  {                                          \
> > +    0x00000000UL,                            \
> > +    0x00000001UL,                            \
> > +    0x80000000UL,                            \
> > +    0x00000002UL,                            \
> > +    0x40000000UL,                            \
> > +    0x00010000UL,                            \
> > +    0x00008000UL,                            \
> > +    0xa5a5a5a5UL,                            \
> > +    0x5a5a5a5aUL,                            \
> > +    0xcafe0000UL,                            \
> > +    0x00cafe00UL,                            \
> > +    0x0000cafeUL,                            \
> > +    0xffffffffUL                             \
> > +  }
> > +
> > +#define NUMS64                                       \
> > +  {                                          \
> > +    0x0000000000000000ULL,                   \
> > +    0x0000000000000001ULL,                   \
> > +    0x8000000000000000ULL,                   \
> > +    0x0000000000000002ULL,                   \
> > +    0x4000000000000000ULL,                   \
> > +    0x0000000100000000ULL,                   \
> > +    0x0000000080000000ULL,                   \
> > +    0xa5a5a5a5a5a5a5a5ULL,                   \
> > +    0x5a5a5a5a5a5a5a5aULL,                   \
> > +    0xcafecafe00000000ULL,                   \
> > +    0x0000cafecafe0000ULL,                   \
> > +    0x00000000cafecafeULL,                   \
> > +    0xffffffffffffffffULL                    \
> > +  }
> > +
> > +unsigned int ints[] =
> > +#if BITSIZEOF_INT == 64
> > +NUMS64;
> > +#elif BITSIZEOF_INT == 32
> > +NUMS32;
> > +#else
> > +NUMS16;
> > +#endif
> > +
> > +unsigned long longs[] =
> > +#if BITSIZEOF_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +unsigned long long longlongs[] =
> > +#if BITSIZEOF_LONG_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (ints[i] != 0
> > +       && __builtin_ctz (ints[i]) != my_ctz (ints[i]))
> > +       abort ();
> > +    }
> > +
> > +  exit (0);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..5cce21cd16aa89d96cdac2b302d29ee918b67249
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_73.c
> > @@ -0,0 +1,68 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#  define BITSIZEOF_INT 32
> > +#  define BITSIZEOF_LONG 64
> > +#  define BITSIZEOF_LONG_LONG 64
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +__attribute__((noinline)) \
> > +int my_clz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))     \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}
> > +
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS32                                       \
> > +  {                                             \
> > +    0x00000000UL,                               \
> > +    0x00000001UL,                               \
> > +    0x80000000UL,                               \
> > +    0x00000002UL,                               \
> > +    0x40000000UL,                               \
> > +    0x00010000UL,                               \
> > +    0x00008000UL,                               \
> > +    0xa5a5a5a5UL,                               \
> > +    0x5a5a5a5aUL,                               \
> > +    0xcafe0000UL,                               \
> > +    0x00cafe00UL,                               \
> > +    0x0000cafeUL,                               \
> > +    0xffffffffUL                                \
> > +  }
> > +
> > +
> > +unsigned int ints[] = NUMS32;
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (ints[i] != 0
> > +       && __builtin_clz (ints[i]) != my_clz (ints[i]))
> > +       abort ();
> > +    }
> > +
> > +  exit (0);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..83676da28884e79874fb0b5cc6a434a0fe6b87cf
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_74.c
> > @@ -0,0 +1,161 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#if __INT_MAX__ > 2147483647L
> > +# if __INT_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_INT 64
> > +# else
> > +#  define BITSIZEOF_INT 32
> > +# endif
> > +#else
> > +# if __INT_MAX__ >= 2147483647L
> > +#  define BITSIZEOF_INT 32
> > +# else
> > +#  define BITSIZEOF_INT 16
> > +# endif
> > +#endif
> > +
> > +#if __LONG_MAX__ > 2147483647L
> > +# if __LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG 32
> > +#endif
> > +
> > +#if __LONG_LONG_MAX__ > 2147483647L
> > +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG_LONG 32
> > +#endif
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +int my_clrsb##suffix(type x) {                                               \
> > +    int i;                                                           \
> > +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;           \
> > +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)           \
> > +         != leading)                                                 \
> > +         break;                                                      \
> > +    return i - 1;                                                    \
> > +}
> > +
> > +MAKE_FUNS (, unsigned);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS16                                       \
> > +  {                                          \
> > +    0x0000U,                                 \
> > +    0x0001U,                                 \
> > +    0x8000U,                                 \
> > +    0x0002U,                                 \
> > +    0x4000U,                                 \
> > +    0x0100U,                                 \
> > +    0x0080U,                                 \
> > +    0xa5a5U,                                 \
> > +    0x5a5aU,                                 \
> > +    0xcafeU,                                 \
> > +    0xffffU                                  \
> > +  }
> > +
> > +#define NUMS32                                       \
> > +  {                                          \
> > +    0x00000000UL,                            \
> > +    0x00000001UL,                            \
> > +    0x80000000UL,                            \
> > +    0x00000002UL,                            \
> > +    0x40000000UL,                            \
> > +    0x00010000UL,                            \
> > +    0x00008000UL,                            \
> > +    0xa5a5a5a5UL,                            \
> > +    0x5a5a5a5aUL,                            \
> > +    0xcafe0000UL,                            \
> > +    0x00cafe00UL,                            \
> > +    0x0000cafeUL,                            \
> > +    0xffffffffUL                             \
> > +  }
> > +
> > +#define NUMS64                                       \
> > +  {                                          \
> > +    0x0000000000000000ULL,                   \
> > +    0x0000000000000001ULL,                   \
> > +    0x8000000000000000ULL,                   \
> > +    0x0000000000000002ULL,                   \
> > +    0x4000000000000000ULL,                   \
> > +    0x0000000100000000ULL,                   \
> > +    0x0000000080000000ULL,                   \
> > +    0xa5a5a5a5a5a5a5a5ULL,                   \
> > +    0x5a5a5a5a5a5a5a5aULL,                   \
> > +    0xcafecafe00000000ULL,                   \
> > +    0x0000cafecafe0000ULL,                   \
> > +    0x00000000cafecafeULL,                   \
> > +    0xffffffffffffffffULL                    \
> > +  }
> > +
> > +unsigned int ints[] =
> > +#if BITSIZEOF_INT == 64
> > +NUMS64;
> > +#elif BITSIZEOF_INT == 32
> > +NUMS32;
> > +#else
> > +NUMS16;
> > +#endif
> > +
> > +unsigned long longs[] =
> > +#if BITSIZEOF_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +unsigned long long longlongs[] =
> > +#if BITSIZEOF_LONG_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +  /* Test constant folding.  */
> > +
> > +#define TEST(x, suffix)                                                      \
> > +  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))           \
> > +    abort ();
> > +
> > +#if BITSIZEOF_INT == 32
> > +  TEST(0x00000000UL,);
> > +  TEST(0x00000001UL,);
> > +  TEST(0x80000000UL,);
> > +  TEST(0x40000000UL,);
> > +  TEST(0x00010000UL,);
> > +  TEST(0x00008000UL,);
> > +  TEST(0xa5a5a5a5UL,);
> > +  TEST(0x5a5a5a5aUL,);
> > +  TEST(0xcafe0000UL,);
> > +  TEST(0x00cafe00UL,);
> > +  TEST(0x0000cafeUL,);
> > +  TEST(0xffffffffUL,);
> > +#endif
> > +
> > +  exit (0);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..cc1ce4cf298ee0747f41ea4941af5a65f8a688ef
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_75.c
> > @@ -0,0 +1,230 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-O3" } */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#if __INT_MAX__ > 2147483647L
> > +# if __INT_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_INT 64
> > +# else
> > +#  define BITSIZEOF_INT 32
> > +# endif
> > +#else
> > +# if __INT_MAX__ >= 2147483647L
> > +#  define BITSIZEOF_INT 32
> > +# else
> > +#  define BITSIZEOF_INT 16
> > +# endif
> > +#endif
> > +
> > +#if __LONG_MAX__ > 2147483647L
> > +# if __LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG 32
> > +#endif
> > +
> > +#if __LONG_LONG_MAX__ > 2147483647L
> > +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG_LONG 32
> > +#endif
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +__attribute__((noinline)) \
> > +int my_ffs##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    if (x == 0)                                                              \
> > +      return 0;                                                      \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1  << i))                                       \
> > +         break;                                                      \
> > +    return i + 1;                                                    \
> > +}                                                                    \
> > +                                                                     \
> > +int my_ctz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1  << i))                                       \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}                                                                    \
> > +                                                                     \
> > +__attribute__((noinline)) \
> > +int my_clz##suffix(type x) {                                         \
> > +    int i;                                                           \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << ((CHAR_BIT * sizeof (type)) - i - 1)))     \
> > +         break;                                                      \
> > +    return i;                                                                \
> > +}                                                                    \
> > +                                                                     \
> > +int my_clrsb##suffix(type x) {                                               \
> > +    int i;                                                           \
> > +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;           \
> > +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)           \
> > +         != leading)                                                 \
> > +         break;                                                      \
> > +    return i - 1;                                                    \
> > +}                                                                    \
> > +                                                                     \
> > +__attribute__((noinline)) \
> > +int my_popcount##suffix(type x) {                                    \
> > +    int i;                                                           \
> > +    int count = 0;                                                   \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << i))                                        \
> > +         count++;                                                    \
> > +    return count;                                                    \
> > +}                                                                    \
> > +                                                                     \
> > +__attribute__((noinline)) \
> > +int my_parity##suffix(type x) {                                              \
> > +    int i;                                                           \
> > +    int count = 0;                                                   \
> > +    for (i = 0; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (x & ((type) 1 << i))                                        \
> > +         count++;                                                    \
> > +    return count & 1;                                                        \
> > +}
> > +
> > +MAKE_FUNS (ll, unsigned long long);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS16                                       \
> > +  {                                          \
> > +    0x0000U,                                 \
> > +    0x0001U,                                 \
> > +    0x8000U,                                 \
> > +    0x0002U,                                 \
> > +    0x4000U,                                 \
> > +    0x0100U,                                 \
> > +    0x0080U,                                 \
> > +    0xa5a5U,                                 \
> > +    0x5a5aU,                                 \
> > +    0xcafeU,                                 \
> > +    0xffffU                                  \
> > +  }
> > +
> > +#define NUMS32                                       \
> > +  {                                          \
> > +    0x00000000UL,                            \
> > +    0x00000001UL,                            \
> > +    0x80000000UL,                            \
> > +    0x00000002UL,                            \
> > +    0x40000000UL,                            \
> > +    0x00010000UL,                            \
> > +    0x00008000UL,                            \
> > +    0xa5a5a5a5UL,                            \
> > +    0x5a5a5a5aUL,                            \
> > +    0xcafe0000UL,                            \
> > +    0x00cafe00UL,                            \
> > +    0x0000cafeUL,                            \
> > +    0xffffffffUL                             \
> > +  }
> > +
> > +#define NUMS64                                       \
> > +  {                                          \
> > +    0x0000000000000000ULL,                   \
> > +    0x0000000000000001ULL,                   \
> > +    0x8000000000000000ULL,                   \
> > +    0x0000000000000002ULL,                   \
> > +    0x4000000000000000ULL,                   \
> > +    0x0000000100000000ULL,                   \
> > +    0x0000000080000000ULL,                   \
> > +    0xa5a5a5a5a5a5a5a5ULL,                   \
> > +    0x5a5a5a5a5a5a5a5aULL,                   \
> > +    0xcafecafe00000000ULL,                   \
> > +    0x0000cafecafe0000ULL,                   \
> > +    0x00000000cafecafeULL,                   \
> > +    0xffffffffffffffffULL                    \
> > +  }
> > +
> > +unsigned int ints[] =
> > +#if BITSIZEOF_INT == 64
> > +NUMS64;
> > +#elif BITSIZEOF_INT == 32
> > +NUMS32;
> > +#else
> > +NUMS16;
> > +#endif
> > +
> > +unsigned long longs[] =
> > +#if BITSIZEOF_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +unsigned long long longlongs[] =
> > +#if BITSIZEOF_LONG_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(longlongs); i++)
> > +    {
> > +      if (__builtin_ffsll (longlongs[i]) != my_ffsll (longlongs[i]))
> > +     abort ();
> > +      if (longlongs[i] != 0
> > +       && __builtin_clzll (longlongs[i]) != my_clzll (longlongs[i]))
> > +     abort ();
> > +      if (longlongs[i] != 0
> > +       && __builtin_ctzll (longlongs[i]) != my_ctzll (longlongs[i]))
> > +     abort ();
> > +      if (__builtin_clrsbll (longlongs[i]) != my_clrsbll (longlongs[i]))
> > +     abort ();
> > +      if (__builtin_popcountll (longlongs[i]) != my_popcountll (longlongs[i]))
> > +     abort ();
> > +      if (__builtin_parityll (longlongs[i]) != my_parityll (longlongs[i]))
> > +     abort ();
> > +    }
> > +
> > +  /* Test constant folding.  */
> > +
> > +#define TEST(x, suffix)                                                      \
> > +  if (__builtin_ffs##suffix (x) != my_ffs##suffix (x))                       \
> > +    abort ();                                                                \
> > +
> > +#if BITSIZEOF_LONG_LONG == 64
> > +  TEST(0x0000000000000000ULL, ll);
> > +  TEST(0x0000000000000001ULL, ll);
> > +  TEST(0x8000000000000000ULL, ll);
> > +  TEST(0x0000000000000002ULL, ll);
> > +  TEST(0x4000000000000000ULL, ll);
> > +  TEST(0x0000000100000000ULL, ll);
> > +  TEST(0x0000000080000000ULL, ll);
> > +  TEST(0xa5a5a5a5a5a5a5a5ULL, ll);
> > +  TEST(0x5a5a5a5a5a5a5a5aULL, ll);
> > +  TEST(0xcafecafe00000000ULL, ll);
> > +  TEST(0x0000cafecafe0000ULL, ll);
> > +  TEST(0x00000000cafecafeULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +#endif
> > +
> > +  exit (0);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..adba337b101f4d7cafaa50329a933594b0d501ad
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_76.c
> > @@ -0,0 +1,165 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-O3" } */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <limits.h>
> > +#include <assert.h>
> > +
> > +#if __INT_MAX__ > 2147483647L
> > +# if __INT_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_INT 64
> > +# else
> > +#  define BITSIZEOF_INT 32
> > +# endif
> > +#else
> > +# if __INT_MAX__ >= 2147483647L
> > +#  define BITSIZEOF_INT 32
> > +# else
> > +#  define BITSIZEOF_INT 16
> > +# endif
> > +#endif
> > +
> > +#if __LONG_MAX__ > 2147483647L
> > +# if __LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG 32
> > +#endif
> > +
> > +#if __LONG_LONG_MAX__ > 2147483647L
> > +# if __LONG_LONG_MAX__ >= 9223372036854775807L
> > +#  define BITSIZEOF_LONG_LONG 64
> > +# else
> > +#  define BITSIZEOF_LONG_LONG 32
> > +# endif
> > +#else
> > +# define BITSIZEOF_LONG_LONG 32
> > +#endif
> > +
> > +#define MAKE_FUNS(suffix, type)                                              \
> > +int my_clrsb##suffix(type x) {                                               \
> > +    int i;                                                           \
> > +    int leading = (x >> CHAR_BIT * sizeof (type) - 1) & 1;           \
> > +    for (i = 1; i < CHAR_BIT * sizeof (type); i++)                   \
> > +     if (((x >> ((CHAR_BIT * sizeof (type)) - i - 1)) & 1)           \
> > +         != leading)                                                 \
> > +         break;                                                      \
> > +    return i - 1;                                                    \
> > +}                                                                    \
> > +                                                                     \
> > +
> > +MAKE_FUNS (, unsigned);
> > +MAKE_FUNS (ll, unsigned long long);
> > +
> > +extern void abort (void);
> > +extern void exit (int);
> > +
> > +#define NUMS16                                       \
> > +  {                                          \
> > +    0x0000U,                                 \
> > +    0x0001U,                                 \
> > +    0x8000U,                                 \
> > +    0x0002U,                                 \
> > +    0x4000U,                                 \
> > +    0x0100U,                                 \
> > +    0x0080U,                                 \
> > +    0xa5a5U,                                 \
> > +    0x5a5aU,                                 \
> > +    0xcafeU,                                 \
> > +    0xffffU                                  \
> > +  }
> > +
> > +#define NUMS32                                       \
> > +  {                                          \
> > +    0x00000000UL,                            \
> > +    0x00000001UL,                            \
> > +    0x80000000UL,                            \
> > +    0x00000002UL,                            \
> > +    0x40000000UL,                            \
> > +    0x00010000UL,                            \
> > +    0x00008000UL,                            \
> > +    0xa5a5a5a5UL,                            \
> > +    0x5a5a5a5aUL,                            \
> > +    0xcafe0000UL,                            \
> > +    0x00cafe00UL,                            \
> > +    0x0000cafeUL,                            \
> > +    0xffffffffUL                             \
> > +  }
> > +
> > +#define NUMS64                                       \
> > +  {                                          \
> > +    0x0000000000000000ULL,                   \
> > +    0x0000000000000001ULL,                   \
> > +    0x8000000000000000ULL,                   \
> > +    0x0000000000000002ULL,                   \
> > +    0x4000000000000000ULL,                   \
> > +    0x0000000100000000ULL,                   \
> > +    0x0000000080000000ULL,                   \
> > +    0xa5a5a5a5a5a5a5a5ULL,                   \
> > +    0x5a5a5a5a5a5a5a5aULL,                   \
> > +    0xcafecafe00000000ULL,                   \
> > +    0x0000cafecafe0000ULL,                   \
> > +    0x00000000cafecafeULL,                   \
> > +    0xffffffffffffffffULL                    \
> > +  }
> > +
> > +unsigned int ints[] =
> > +#if BITSIZEOF_INT == 64
> > +NUMS64;
> > +#elif BITSIZEOF_INT == 32
> > +NUMS32;
> > +#else
> > +NUMS16;
> > +#endif
> > +
> > +unsigned long longs[] =
> > +#if BITSIZEOF_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +unsigned long long longlongs[] =
> > +#if BITSIZEOF_LONG_LONG == 64
> > +NUMS64;
> > +#else
> > +NUMS32;
> > +#endif
> > +
> > +#define N(table) (sizeof (table) / sizeof (table[0]))
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +
> > +#pragma GCC novector
> > +  for (i = 0; i < N(ints); i++)
> > +    {
> > +      if (__builtin_clrsb (ints[i]) != my_clrsb (ints[i]))
> > +     abort ();
> > +    }
> > +
> > +  /* Test constant folding.  */
> > +
> > +#define TEST(x, suffix)                                                      \
> > +  if (__builtin_clrsb##suffix (x) != my_clrsb##suffix (x))           \
> > +    abort ();
> > +
> > +#if BITSIZEOF_LONG_LONG == 64
> > +  TEST(0xffffffffffffffffULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +  TEST(0xffffffffffffffffULL, ll);
> > +#endif
> > +
> > +  exit (0);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#include <complex.h>
> > +
> > +#define N 1024
> > +char vect_a[N];
> > +char vect_b[N];
> > +
> > +char test4(char x, char * restrict res)
> > +{
> > + char ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_b[i] += x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] += x * vect_b[i];
> > +   res[i] *= vect_b[i];
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..4e8b5bdea5ff9aa0cadbea0af10d51707da011c5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast" } */
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 803
> > +#endif
> > +unsigned vect_a[N];
> > +unsigned vect_b[N];
> > +
> > +unsigned test4(unsigned x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   vect_a[i] = x + i;
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..571aec0ccfdbcdc318ba1f17de31958c16b3e9bc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_1.c
> > @@ -0,0 +1,6 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-march=armv8.3-a -mcpu=neoverse-n1" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +/* { dg-warning "switch ?-mcpu=neoverse-n1? conflicts with ?-march=armv8.3-a? switch and would result in options \\+fp16\\+dotprod\\+profile\\+nopauth" "" { target *-*-* } 0 } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..cee42c84c4f762a4d4773ea4380163742b5137b0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_2.c
> > @@ -0,0 +1,6 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-march=armv8-a+sve -mcpu=neoverse-n1" } */
> > +
> > +#include <arm_neon.h>
> > +
> > +/* { dg-warning "switch ?-mcpu=neoverse-n1? conflicts with ?-march=armv8-a+sve? switch and would result in options \\+lse\\+rcpc\\+rdma\\+dotprod\\+profile\\+nosve" } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..0a05b98eedb8bd743bb5af8e4dd3c95aab001c4b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/opt_mismatch_3.c
> > @@ -0,0 +1,5 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-march=armv8-a -mcpu=neovese-n1 -Wpedentic -Werror" } */
> > +
> > +#include <arm_neon.h>
> > +
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_1.c
> > @@ -0,0 +1,124 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3" } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> > +
> > +#pragma GCC target "+nosve"
> > +
> > +#define N 640
> > +int a[N] = {0};
> > +int b[N] = {0};
> > +
> > +
> > +/*
> > +** f1:
> > +**   ...
> > +**   cmgt    v[0-9]+.4s, v[0-9]+.4s, #0
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f1 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] > 0)
> > +     break;
> > +    }
> > +}
> > +
> > +/*
> > +** f2:
> > +**   ...
> > +**   cmge    v[0-9]+.4s, v[0-9]+.4s, #0
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f2 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] >= 0)
> > +     break;
> > +    }
> > +}
> > +
> > +/*
> > +** f3:
> > +**   ...
> > +**   cmeq    v[0-9]+.4s, v[0-9]+.4s, #0
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f3 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] == 0)
> > +     break;
> > +    }
> > +}
> > +
> > +/*
> > +** f4:
> > +**   ...
> > +**   cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f4 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] != 0)
> > +     break;
> > +    }
> > +}
> > +
> > +/*
> > +** f5:
> > +**   ...
> > +**   cmlt    v[0-9]+.4s, v[0-9]+.4s, #0
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f5 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] < 0)
> > +     break;
> > +    }
> > +}
> > +
> > +/*
> > +** f6:
> > +**   ...
> > +**   cmle    v[0-9]+.4s, v[0-9]+.4s, #0
> > +**   umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > +**   fmov    x[0-9]+, d[0-9]+
> > +**   cbnz    x[0-9]+, \.L[0-9]+
> > +**   ...
> > +*/
> > +void f6 ()
> > +{
> > +  for (int i = 0; i < N; i++)
> > +    {
> > +      b[i] += a[i];
> > +      if (a[i] <= 0)
> > +     break;
> > +    }
> > +}
> > diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> > index f0b692a2e19bae3cf3ffee8f27bd39b05aba3b9c..1e47ae84080f9908736d1c3be9c14d589e8772a7 100644
> > --- a/gcc/testsuite/lib/target-supports.exp
> > +++ b/gcc/testsuite/lib/target-supports.exp
> > @@ -3975,6 +3975,17 @@ proc check_effective_target_vect_int { } {
> >       }}]
> >  }
> >
> > +# Return 1 if the target supports hardware vectorization of early breaks,
> > +# 0 otherwise.
> > +#
> > +# This won't change for different subtargets so cache the result.
> > +
> > +proc check_effective_target_vect_early_break { } {
> > +    return [check_cached_effective_target_indexed vect_early_break {
> > +      expr {
> > +     [istarget aarch64*-*-*]
> > +     }}]
> > +}
> >  # Return 1 if the target supports hardware vectorization of complex additions of
> >  # byte, 0 otherwise.
> >  #
> >
> >
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 10:40             ` Richard Biener
@ 2023-11-16 11:08               ` Tamar Christina
  2023-11-16 11:27                 ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 11:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, November 16, 2023 10:40 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, November 15, 2023 1:23 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > jlaw@ventanamicro.com
> > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > > support early breaks and arbitrary exits
> > > > >
> > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > >
> > > > > > Patch updated to latest trunk:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > This changes the PHI node updates to support early breaks.
> > > > > > It has to support both the case where the loop's exit matches
> > > > > > the normal loop exit and one where the early exit is "inverted", i.e.
> > > > > > it's an early
> > > > > exit edge.
> > > > > >
> > > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > > For an early exit the reason is obvious, but there are cases
> > > > > > where the "normal" exit is located before the early one.  This
> > > > > > exit then does a check on ivtmp resulting in us leaving the
> > > > > > loop since it thinks we're
> > > done.
> > > > > >
> > > > > > In these case we may still have side-effects to perform so we
> > > > > > also go to the scalar loop.
> > > > > >
> > > > > > For the "normal" exit niters has already been adjusted for
> > > > > > peeling, for the early exits we must find out how many
> > > > > > iterations we actually did.  So we have to recalculate the new position
> for each exit.
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > > > > Hide
> > > > > unused.
> > > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > > 	(vect_do_peeling): Use it.
> > > > > >
> > > > > > --- inline copy of patch ---
> > > > > >
> > > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > > >
> > > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > > d2654cf1
> > > > > > c842baac58f5 100644
> > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > @@ -1200,7 +1200,7 @@
> > > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > > (class loop *loop,
> > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > >
> > > > > >  static gcond *
> > > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > edge exit_edge,
> > > > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo
> > > > > > +*/, edge exit_edge,
> > > > > >  				class loop *loop, tree niters, tree step,
> > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> @@ -
> > > > > 1412,7 +1412,7 @@
> > > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > > loop_vec_info
> > > > > loop_vinfo
> > > > > >     When this happens we need to flip the understanding of
> > > > > > main and
> > > other
> > > > > >     exits by peeling and IV updates.  */
> > > > > >
> > > > > > -bool inline
> > > > > > +bool
> > > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > > >    return single_pred (loop->latch) == loop_exit->src; @@
> > > > > > -2142,6
> > > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > > > >       Input:
> > > > > >       - LOOP - a loop that is going to be vectorized. The last few
> iterations
> > > > > >                of LOOP were peeled.
> > > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > > >       - NITERS - the number of iterations that LOOP executes (before it is
> > > > > >                  vectorized). i.e, the number of times the ivs should be
> bumped.
> > > > > >       - UPDATE_E - a successor edge of LOOP->exit that is on
> > > > > > the
> > > > > > (only) path
> > > > >
> > > > > the comment on this is now a bit misleading, can you try to
> > > > > update it and/or move the comment bits to the docs on EARLY_EXIT?
> > > > >
> > > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > loop_vinfo)
> > > > > >                    The phi args associated with the edge UPDATE_E in the bb
> > > > > >                    UPDATE_E->dest are updated accordingly.
> > > > > >
> > > > > > +     - restart_loop - Indicates whether the scalar loop needs
> > > > > > + to restart the
> > > > >
> > > > > params are ALL_CAPS
> > > > >
> > > > > > +		      iteration count where the vector loop began.
> > > > > > +
> > > > > >       Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > >       a single loop exit that has a single predecessor.
> > > > > >
> > > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > loop_vinfo)
> > > > > >   */
> > > > > >
> > > > > >  static void
> > > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > -				  tree niters, edge update_e)
> > > > > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > +poly_uint64 vf,
> > > > >
> > > > > LOOP_VINFO_VECT_FACTOR?
> > > > >
> > > > > > +				  tree niters, edge update_e, bool
> > > > > restart_loop)
> > > > >
> > > > > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > > > > exit after the main exit we are probably sure there are no
> > > > > side-effects to re- execute and could avoid this restarting?
> > > >
> > > > Side effects yes, but the actual check may not have been performed yet.
> > > > If you remember
> > > >
> https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > > There in the clz loop through the "main" exit you still have to
> > > > see if that iteration did not contain the entry.  This is because
> > > > the loop counter is incremented before you iterate.
> > > >
> > > > >
> > > > > >  {
> > > > > >    gphi_iterator gsi, gsi1;
> > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > >    basic_block update_bb = update_e->dest;
> > > > > > -
> > > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT
> > > > > > (loop_vinfo)->dest;
> > > > > > -
> > > > > > -  /* Make sure there exists a single-predecessor exit bb:  */
> > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > > +  bool inversed_iv
> > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> (loop_vinfo),
> > > > > > +					 LOOP_VINFO_LOOP
> (loop_vinfo));
> > > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo)
> > > > > > +			    && flow_bb_inside_loop_p (loop,
> update_e->src);
> > > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > > > > +  basic_block exit_bb = loop_e->dest;
> > > > > > +  basic_block iv_block = NULL;
> > > > > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > > >
> > > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 =
> > > > > > gsi_start_phis
> > > (update_bb);
> > > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7
> > > > > > +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> loop_vinfo,
> > > > > >        tree step_expr, off;
> > > > > >        tree type;
> > > > > >        tree var, ni, ni_name;
> > > > > > -      gimple_stmt_iterator last_gsi;
> > > > > >
> > > > > >        gphi *phi = gsi.phi ();
> > > > > >        gphi *phi1 = gsi1.phi (); @@ -2222,11 +2229,52 @@
> > > > > > vect_update_ivs_after_vectorizer
> > > > > (loop_vec_info loop_vinfo,
> > > > > >        enum vect_induction_op_type induction_type
> > > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > > >
> > > > > > -      if (induction_type == vect_step_op_add)
> > > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi,
> > > > > > + loop_latch_edge
> > > (loop));
> > > > > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > > > > +	 property during create_iv to identify it.  */
> > > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > +      if (restart_loop && ivtemp)
> > > > > >  	{
> > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > +	  ni = build_int_cst (type, vf);
> > > > > > +	  if (inversed_iv)
> > > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > +			      fold_convert (type, step_expr));
> > > > > > +	}
> > > > > > +      else if (induction_type == vect_step_op_add)
> > > > > > +	{
> > > > > > +
> > > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > > +
> > > > > > +	  /* Early exits always use last iter value not niters. */
> > > > > > +	  if (restart_loop)
> > > > > > +	    {
> > > > > > +	      /* Live statements in the non-main exit shouldn't be
> adjusted.  We
> > > > > > +		 normally didn't have this problem with a single exit as
> live
> > > > > > +		 values would be in the exit block.  However when
> dealing with
> > > > > > +		 multiple exits all exits are redirected to the merge
> block
> > > > > > +		 and we restart the iteration.  */
> > > > >
> > > > > Hmm, I fail to see how this works - we're either using the value
> > > > > to continue the induction or not, independent of STMT_VINFO_LIVE_P.
> > > >
> > > > That becomes clear in the patch to update live reductions.
> > > > Essentially any live Reductions inside an alternative exit will
> > > > reduce to the first element rather than the last and use that as
> > > > the seed for the
> > > scalar loop.
> > >
> > > Hum.  Reductions are vectorized as N separate reductions.  I don't
> > > think you can simply change the reduction between the lanes to "skip"
> > > part of the vector iteration.  But you can use the value of the
> > > vector from before the vector iteration - the loop header PHI
> > > result, and fully reduce that to get at the proper value.
> >
> > That's what It's supposed to be doing though.  The reason live
> > operations are skipped here is that if we don't we'll re-adjust the IV
> > even though the value will already be correct after vectorization.
> >
> > Remember that this code only gets so far for IV PHI nodes.
> >
> > The loop phi header result itself can be live, i.e. see testcases
> > vect-early-break_70.c to vect-early-break_75.c
> >
> > you have i_15 = PHI <i_14 (6), 1(2)>
> >
> > we use i_15 in the early exit. This should not be adjusted because
> > when it's vectorized the value at 0[lane 0] is already correct.  This
> > is why for any PHI inside the early exits it uses the value 0[0] instead of
> N[lane_max].
> >
> > Perhaps I'm missing something here?
> 
> OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer does.
> 
> I still do not understand the (complexity of the) patch.  Basically the function
> computes the new value of the IV "from scratch" based on the number of
> scalar iterations of the vector loop, the 'niter'
> argument.  I would have expected that for the early exits we either pass in a
> different 'niter' or alternatively a 'niter_adjustment'.

But for an early exit there's no static value for the adjusted niter, since you don't know
which iteration you exited from.  This is unlike the normal exit, where you know that if you
get there you've done all possible iterations.

So you must compute the scalar iteration count on the exit itself.

> 
> It seems your change handles different kinds of inductions differently.
> Specifically
> 
>       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
>       if (restart_loop && ivtemp)
>         {
>           type = TREE_TYPE (gimple_phi_result (phi));
>           ni = build_int_cst (type, vf);
>           if (inversed_iv)
>             ni = fold_build2 (MINUS_EXPR, type, ni,
>                               fold_convert (type, step_expr));
>         }
> 
> it looks like for the exit test IV we use either 'VF' or 'VF - step'
> as the new value.  That seems to be very odd special casing for unknown
> reasons.  And while you adjust vec_step_op_add, you don't adjust
> vect_peel_nonlinear_iv_init (maybe not supported - better assert here).

The VF case is for a normal "non-inverted" loop, where if you take an early exit
you know that you have to do at most VF iterations.  The VF - step is to account
for the inverted loop control flow, where you exit after the IV has already been
adjusted by + step.

Peeling doesn't matter here, since you know you were able to do a vector iteration
so it's safe to do VF iterations.  So having peeled doesn't affect the remaining
iters count.

> 
> Also the vec_step_op_add case will keep the original scalar IV live even when it
> is a vectorized induction.  The code recomputing the value from scratch avoids
> this.
> 
>       /* For non-main exit create an intermediat edge to get any updated iv
>          calculations.  */
>       if (needs_interm_block
>           && !iv_block
>           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> (new_stmts)))
>         {
>           iv_block = split_edge (update_e);
>           update_e = single_succ_edge (update_e->dest);
>           last_gsi = gsi_last_bb (iv_block);
>         }
> 
> this is also odd, can we adjust the API instead?  I suppose this is because your
> computation uses the original loop IV, if you based the computation off the
> initial value only this might not be necessary?

No, on the main exit the code updates the value in the loop header and puts the
calculation in the merge block.  This works because it only needs to consume PHI
nodes in the merge block and things like niters are adjusted in the guard block.

For an early exit, we don't have a guard block, only the merge block. We have to
update the PHI nodes in that block,  but can't do so since you can't produce a value
and consume it in a PHI node in the same BB.  So we need to create the block to put
the values in for use in the merge block.  Because there's no "guard" block for early
exits.
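
A hypothetical GIMPLE/CFG sketch of that constraint (block and SSA names are made
up, not taken from the patch):

```
;; Without the split, the recomputed IV value would have to be
;; produced in the merge block and consumed by its own PHI:
;;
;;   early_exit_bb:                    main_exit_bb:
;;     ...                               ...
;;     goto merge_bb;                    goto merge_bb;
;;
;;   merge_bb:
;;     # i_upd_7 = PHI <i_done_5 (early_exit_bb), niters_3 (main_exit_bb)>
;;
;; The argument i_done_5 must be defined on the early_exit_bb side of
;; the incoming edge; it cannot be computed inside merge_bb and consumed
;; by merge_bb's own PHI.  Splitting the early exit edge creates an
;; intermediate block to hold the computation:
;;
;;   early_exit_bb -> iv_block:
;;     i_done_5 = <recomputed IV>;
;;     goto merge_bb;
;;
;;   merge_bb:
;;     # i_upd_7 = PHI <i_done_5 (iv_block), niters_3 (main_exit_bb)>
```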

The API can be adjusted by always creating the empty block during peeling.
That would prevent us from having to do anything special here.  Would that work
better?  Or I can do it in the loop that iterates over the exits, before the call
to vect_update_ivs_after_vectorizer, which I think might be more consistent.

> 
> That said, I wonder why we cannot simply pass in an adjusted niter which
> would be niters_vector_mult_vf - vf and be done with that?
> 

We can of course not have this and recompute it from niters itself, however this does
affect the epilog code layout.  In particular, knowing the static number of iterations left
causes it to usually unroll the loop and share some of the computations, i.e. the scalar
code is often more efficient.

The computation would be niters_vector_mult_vf - iters_done * vf, since the value put
here is the remaining iteration count.  It's static for early exits.

But can do whatever you prefer here.  Let me know what you prefer for the above.

Thanks,
Tamar

> Thanks,
> Richard.
> 
> 
> > Regards,
> > Tamar
> > >
> > > > It has to do this since you have to perform the side effects for
> > > > the non-matching elements still.
> > > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > +		continue;
> > > > > > +
> > > > > > +	      /* For early break the final loop IV is:
> > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > +		 values and non-single steps.  The main exit can use
> niters
> > > > > > +		 since if you exit from the main exit you've done all
> vector
> > > > > > +		 iterations.  For an early exit we don't know when we
> exit
> > > > > > +so
> > > > > we
> > > > > > +		 must re-calculate this on the exit.  */
> > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > +				 fold_convert (stype, start_expr),
> > > > > > +				 fold_convert (stype, init_expr));
> > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > +				 build_int_cst (stype, vf));
> > > > > > +	    }
> > > > > > +	  else
> > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > +
> > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > >  	  else
> > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > (loop_vec_info
> > > > > loop_vinfo,
> > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > >  	ni = init_expr;
> > > > > > +      else if (restart_loop)
> > > > > > +	continue;
> > > > >
> > > > > This looks all a bit complicated - why wouldn't we simply always
> > > > > use the PHI result when 'restart_loop'?  Isn't that the correct
> > > > > old start value in
> > > all cases?
> > > > >
> > > > > >        else
> > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > >  					  niters, step_expr,
> > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> > > > > (loop_vec_info
> > > > > > loop_vinfo,
> > > > > >
> > > > > >        var = create_tmp_var (type, "tmp");
> > > > > >
> > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > >        gimple_seq new_stmts = NULL;
> > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false,
> > > > > > var);
> > > > > > +
> > > > > > +      /* For non-main exit create an intermediat edge to get any
> updated iv
> > > > > > +	 calculations.  */
> > > > > > +      if (needs_interm_block
> > > > > > +	  && !iv_block
> > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > (new_stmts)))
> > > > > > +	{
> > > > > > +	  iv_block = split_edge (update_e);
> > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > +	}
> > > > > > +
> > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > >        if (!gsi_end_p (last_gsi))
> > > > > >  	{
> > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > loop_vinfo, tree
> > > > > niters, tree nitersm1,
> > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> niters_vector_mult_vf,
> > > > > > -					update_e);
> > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > +      bool inversed_iv
> > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> (loop_vinfo),
> > > > > > +					 LOOP_VINFO_LOOP
> (loop_vinfo));
> > > > >
> > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > >
> > > > > > +
> > > > > > +      /* Update the main exit first.  */
> > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > niters_vector_mult_vf,
> > > > > > +					update_e, inversed_iv);
> > > > > > +
> > > > > > +      /* And then update the early exits.  */
> > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > +	{
> > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > +	    continue;
> > > > > > +
> > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > +					    niters_vector_mult_vf,
> > > > > > +					    exit, true);
> > > > >
> > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > LOOP_VINFO_IV_EXIT
> > > > > (loop_vinfo)->src) or similar?  That is, whether the exit is at
> > > > > or after the main IV exit?  (consider having two)
> > > > >
> > > > > > +	}
> > > > > >
> > > > > >        if (skip_epilog)
> > > > > >  	{
> > > > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-15 14:26       ` Tamar Christina
@ 2023-11-16 11:16         ` Richard Biener
  2023-11-20 21:57           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 11:16 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 15 Nov 2023, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, November 15, 2023 1:42 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> > with support for multiple exits and different exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > Patch updated to trunk.
> > >
> > > This adds support to vectorizable_live_reduction to handle multiple
> > > exits by
> > 
> > vectorizable_live_operation, but I do wonder how you handle reductions?
> 
> In the testcases I have, reductions all seem to work fine, since reductions are
> placed in the merge block between the two loops and always have the
> "value so far from full loop iterations".

Is that so?  A simple testcase shows

  <bb 3> [local count: 1063004408]:
  # sum_9 = PHI <sum_6(5), 0.0(2)>
  # i_11 = PHI <i_7(5), 0(2)>
  # ivtmp_8 = PHI <ivtmp_4(5), 1024(2)>
  # vect_sum_9.4_3 = PHI <vect_sum_6.8_14(5), { 0.0, 0.0 }(2)>
  # vectp_x.5_2 = PHI <vectp_x.5_12(5), &x(2)>
  # ivtmp_17 = PHI <ivtmp_18(5), 0(2)>
  vect__1.7_13 = MEM <vector(2) double> [(double *)vectp_x.5_2];
  _1 = x[i_11];
  vect_sum_6.8_14 = vect__1.7_13 + vect_sum_9.4_3;
  sum_6 = _1 + sum_9;
  i_7 = i_11 + 1;
  ivtmp_4 = ivtmp_8 - 1;
  vectp_x.5_12 = vectp_x.5_2 + 16;
  ivtmp_18 = ivtmp_17 + 1;
  if (ivtmp_18 < 511)
    goto <bb 5>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 5> [local count: 1052266995]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 10737416]:
  # sum_10 = PHI <sum_6(3)>
  # vect_sum_6.8_15 = PHI <vect_sum_6.8_14(3)>
  _16 = .REDUC_PLUS (vect_sum_6.8_15);

so we're using the updated value vect_sum_6.8_14, not the
start value vect_sum_9.4_3 to compute the result.  For an
early exit I would have expected we need to do .REDUC_PLUS
on vect_sum_9.4_3 instead?

What I see when doing an experiment on your branch is that
you keep the scalar reduction variable live in the vectorized
loop, resulting in wrong code as that will no longer compute
the sum of all elements but just the first N (the IV increment
will also not be adjusted).

double x[1024];
int a[1024];
double __attribute__((noipa)) foo  ()
{
  double sum = 0.0;
  for (int i = 0 ; i < 1023; ++i)
    {
      sum += x[i];
      if (a[i])
        break;
    }
  return sum;
}

int main()
{
  for (int i = 0; i < 1024; ++i)
    x[i] = i;
  a[19] = 1;
  if (foo () != 190.)
    __builtin_abort ();
  return 0;
}

is miscompiled on x86_64 with -msse4.1 -Ofast because of that. 

>  These will just be used as seed for the
> scalar loop for any partial iterations.


> > 
> > > doing a search for which exit the live value should be materialized in.
> > >
> > > Additinally which value in the index we're after depends on whether
> > > the exit it's materialized in is an early exit or whether the loop's
> > > main exit is different from the loop's natural one (i.e. the one with
> > > the same src block as the latch).
> > >
> > > In those two cases we want the first rather than the last value as
> > > we're going to restart the iteration in the scalar loop.  For VLA this
> > > means we need to reverse both the mask and vector since there's only a
> > > way to get the last active element and not the first.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296c
> > ed7b0d
> > > 4d3e76a3634f 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > >  	   lhs' = new_tree;  */
> > >
> > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +      /* A value can only be live in one exit.  So figure out which
> > > + one.  */
> > 
> > Well, a value can be live across multiple exits!
> 
> The same value can only be live across multiple early exits, no?  In which
> case they'll all still be in the same block, as all the early exits end in the same
> merge block.

A value can be live on the normal exit as well, though we wouldn't
need its value there.  I think besides reductions we advance
all inductions in vect_update_ivs_after_vectorizer; we don't
have code to "continue" an induction in the epilogue.

> So this code is essentially just figuring out if you're an early or normal exit.
> Perhaps the comment is unclear.
> 
> > 
> > > +      edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +      /* Check if we have a loop where the chosen exit is not the main exit,
> > > +	 in these cases for an early break we restart the iteration the vector
> > code
> > > +	 did.  For the live values we want the value at the start of the iteration
> > > +	 rather than at the end.  */
> > > +      bool restart_loop = false;
> > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	{
> > > +	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > > +	    if (!is_gimple_debug (use_stmt)
> > > +		&& !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > > +	      {
> > 
> > In fact when you get here you know the use is in a LC PHI.  Use
> > FOR_EACH_IMM_USE_FAST and you can get at the edge via
> > phi_arg_index_from_use and gimple_phi_arg_edge.
> > 
> > As said you have to process all exits the value is live on, not only the first.
> > 
> > > +		basic_block use_bb = gimple_bb (use_stmt);
> > > +		for (auto edge : get_loop_exit_edges (loop))
> > > +		  {
> > > +		    /* Alternative exits can have an intermediate BB in
> > > +		       between to update the IV.  In those cases we need to
> > > +		       look one block further.  */
> > > +		    if (use_bb == edge->dest
> > > +			|| (single_succ_p (edge->dest)
> > > +			    && use_bb == single_succ (edge->dest)))
> > > +		      {
> > > +			exit_e = edge;
> > > +			goto found;
> > > +		      }
> > > +		  }
> > > +	      }
> > > +found:
> > > +	  /* If the edge isn't a single pred then split the edge so we have a
> > > +	     location to place the live operations.  Perhaps we should always
> > > +	     split during IV updating.  But this way the CFG is cleaner to
> > > +	     follow.  */
> > > +	  restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> > > +	  if (!single_pred_p (exit_e->dest))
> > > +	    exit_e = single_pred_edge (split_edge (exit_e));
> > > +
> > > +	  /* For early exit where the exit is not in the BB that leads to the
> > > +	     latch then we're restarting the iteration in the scalar loop. So
> > > +	     get the first live value.  */
> > > +	  if (restart_loop)
> > > +	    {
> > > +	      vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > +	      vec_lhs = gimple_get_lhs (vec_stmt);
> > > +	      bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > 
> > No, this doesn't work for SLP.  Note this also gets you the "first" live value
> > _after_ the vector iteration.
> 
> Yes, we're after the first value for a full vector iteration.  In the initial iteration
> the seed vector is always started from the initial value of the PHI node, no?
> 
> > Btw, I fail to see why you need to handle
> > STMT_VINFO_LIVE at all for the early exits - this is scalar values live _after_ all
> > iterations of the loop, thus it's provided by the scalar epilog that always runs
> > when we exit the vector loop early.
> 
> In last year's patch I basically exited here with return true, and did not bother
> vectorizing them at all, instead adjusting them during vect_update_ivs_after_vectorizer
> just like we normally would.  But you didn't seem to like that approach.
> 
> If we take that approach again then the only thing needing to be changed here is
> to ignore the live operations inside an early exit block.

I think that's the appropriate approach for any exit that will always
lead to an epilogue loop (we're only creating dead code).  I wonder
what will happen if you just leave it alone, just handling the
main IV edge as before?

> The reason they appear is that if you have something like
> 
> if (foo)
>   return i;
> 
> when we redirect the edge, `i` ends up in the block between the two loops, and i is also
> the loop counter.
> 
> Would you prefer I use last year's approach instead? i.e. just ignore them and recalculate
> any loop IVs as normal?

Yes.

Richard.

> 
> Thanks,
> Tamar
> 
> > 
> > The story is different for reductions though (unless we fail to support early
> > breaks for those at the moment).
> > 
> > Richard.
> > 
> > 
> > > +	    }
> > > +	}
> > > +
> > > +      basic_block exit_bb = exit_e->dest;
> > >        gcc_assert (single_pred_p (exit_bb));
> > >
> > >        tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > >        gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx,
> > vec_lhs);
> > > +      SET_PHI_ARG_DEF (phi, exit_e->dest_idx, vec_lhs);
> > >
> > >        gimple_seq stmts = NULL;
> > >        tree new_tree;
> > > @@ -10663,6 +10711,12 @@ vectorizable_live_operation (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > >  	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> > >  					  len, bias_minus_one);
> > >
> > > +	  /* This needs to implement extraction of the first index, but not sure
> > > +	     how the LEN stuff works.  At the moment we shouldn't get here
> > since
> > > +	     there's no LEN support for early breaks.  But guard this so there's
> > > +	     no incorrect codegen.  */
> > > +	  gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> > > +
> > >  	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> > >  	  tree scalar_res
> > >  	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> > @@
> > > -10687,8 +10741,37 @@ vectorizable_live_operation (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > >  					  &LOOP_VINFO_MASKS (loop_vinfo),
> > >  					  1, vectype, 0);
> > >  	  gimple_seq_add_seq (&stmts, tem);
> > > -	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST,
> > scalar_type,
> > > -					  mask, vec_lhs_phi);
> > > +	  tree scalar_res;
> > > +
> > > +	  /* For an inverted control flow with early breaks we want
> > EXTRACT_FIRST
> > > +	     instead of EXTRACT_LAST.  Emulate by reversing the vector and
> > mask. */
> > > +	  if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	    {
> > > +	      auto gsi_stmt = gsi_last (stmts);
> > > +
> > > +	       /* First create the permuted mask.  */
> > > +	      tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> > > +	      tree perm_dest = copy_ssa_name (mask);
> > > +	      gimple *perm_stmt
> > > +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> > > +					   mask, perm_mask);
> > > +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> > > +					   &gsi_stmt);
> > > +	      mask = perm_dest;
> > > +
> > > +	       /* Then permute the vector contents.  */
> > > +	      tree perm_elem = perm_mask_for_reverse (vectype);
> > > +	      perm_dest = copy_ssa_name (vec_lhs_phi);
> > > +	      perm_stmt
> > > +		    = gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> > vec_lhs_phi,
> > > +					   vec_lhs_phi, perm_elem);
> > > +	      vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
> > > +					   &gsi_stmt);
> > > +	      vec_lhs_phi = perm_dest;
> > > +	    }
> > > +
> > > +	  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > > +				     mask, vec_lhs_phi);
> > >
> > >  	  /* Convert the extracted vector element to the scalar type.  */
> > >  	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res); @@
> > > -10708,26 +10791,36 @@ vectorizable_live_operation (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > >        if (stmts)
> > >  	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> > >
> > > -      /* Remove existing phis that copy from lhs and create copies
> > > -	 from new_tree.  */
> > > -      gimple_stmt_iterator gsi;
> > > -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> > > +      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> > > +      bool single_use = true;
> > > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > >  	{
> > > -	  gimple *phi = gsi_stmt (gsi);
> > > -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> > > +	  if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > > +	    continue;
> > > +
> > > +	  gcc_assert (single_use);
> > > +	  if (is_a <gphi *> (use_stmt)
> > > +	      && gimple_phi_arg_def (as_a <gphi *> (use_stmt), 0) == lhs)
> > >  	    {
> > > +	      /* Remove existing phis that copy from lhs and create copies
> > > +		 from new_tree.  */
> > > +	      gphi *phi = as_a <gphi *> (use_stmt);
> > > +	      auto gsi = gsi_for_phi (phi);
> > >  	      remove_phi_node (&gsi, false);
> > >  	      tree lhs_phi = gimple_phi_result (phi);
> > >  	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > >  	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > >  	    }
> > >  	  else
> > > -	    gsi_next (&gsi);
> > > +	    {
> > > +	      /* Or just update the use in place if not a phi.  */
> > > +	      use_operand_p use_p;
> > > +	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> > > +		SET_USE (use_p, new_tree);
> > > +	      update_stmt (use_stmt);
> > > +	    }
> > > +	  single_use = false;
> > >  	}
> > > -
> > > -      /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> > > -      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > > -	gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
> > >      }
> > >    else
> > >      {
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > >
> > 3a22bf02f5ab16ded0af61cd1d719a98b8982144..7c3d6d196e122d67f750
> > dfef6d61
> > > 5aabc6c28281 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -1774,7 +1774,7 @@ compare_step_with_zero (vec_info *vinfo,
> > > stmt_vec_info stmt_info)
> > >  /* If the target supports a permute mask that reverses the elements in
> > >     a vector of type VECTYPE, return that mask, otherwise return null.
> > > */
> > >
> > > -static tree
> > > +tree
> > >  perm_mask_for_reverse (tree vectype)
> > >  {
> > >    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); diff --git
> > > a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> > >
> > b9a71a0b5f5407417e8366b0df132df20c7f60aa..f261fc74b8795b4516b17
> > 155441d
> > > 25baaf8c22ae 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -2246,6 +2246,7 @@ extern bool vect_is_simple_use (vec_info *,
> > stmt_vec_info, slp_tree,
> > >  				enum vect_def_type *,
> > >  				tree *, stmt_vec_info * = NULL);
> > >  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> > > +extern tree perm_mask_for_reverse (tree);
> > >  extern bool supportable_widening_operation (vec_info*, code_helper,
> > >  					    stmt_vec_info, tree, tree,
> > >  					    code_helper*, code_helper*,
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 11:08               ` Tamar Christina
@ 2023-11-16 11:27                 ` Richard Biener
  2023-11-16 12:01                   ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 11:27 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 16 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, November 16, 2023 10:40 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Wednesday, November 15, 2023 1:23 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > jlaw@ventanamicro.com
> > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > support early breaks and arbitrary exits
> > > >
> > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > > jlaw@ventanamicro.com
> > > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > > > support early breaks and arbitrary exits
> > > > > >
> > > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > > >
> > > > > > > Patch updated to latest trunk:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > This changes the PHI node updates to support early breaks.
> > > > > > > It has to support both the case where the loop's exit matches
> > > > > > > the normal loop exit and one where the early exit is "inverted", i.e.
> > > > > > > it's an early
> > > > > > exit edge.
> > > > > > >
> > > > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > > > For an early exit the reason is obvious, but there are cases
> > > > > > > where the "normal" exit is located before the early one.  This
> > > > > > > exit then does a check on ivtmp resulting in us leaving the
> > > > > > > loop since it thinks we're
> > > > done.
> > > > > > >
> > > > > > > In these case we may still have side-effects to perform so we
> > > > > > > also go to the scalar loop.
> > > > > > >
> > > > > > > For the "normal" exit niters has already been adjusted for
> > > > > > > peeling, for the early exits we must find out how many
> > > > > > > iterations we actually did.  So we have to recalculate the new position
> > for each exit.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > > > > > Hide
> > > > > > unused.
> > > > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > > > 	(vect_do_peeling): Use it.
> > > > > > >
> > > > > > > --- inline copy of patch ---
> > > > > > >
> > > > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > > > >
> > > > > >
> > > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > > > d2654cf1
> > > > > > > c842baac58f5 100644
> > > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > > @@ -1200,7 +1200,7 @@
> > > > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > > > (class loop *loop,
> > > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > > >
> > > > > > >  static gcond *
> > > > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > > edge exit_edge,
> > > > > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo
> > > > > > > +*/, edge exit_edge,
> > > > > > >  				class loop *loop, tree niters, tree step,
> > > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> > @@ -
> > > > > > 1412,7 +1412,7 @@
> > > > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > > > loop_vec_info
> > > > > > loop_vinfo
> > > > > > >     When this happens we need to flip the understanding of
> > > > > > > main and
> > > > other
> > > > > > >     exits by peeling and IV updates.  */
> > > > > > >
> > > > > > > -bool inline
> > > > > > > +bool
> > > > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > > > >    return single_pred (loop->latch) == loop_exit->src; @@
> > > > > > > -2142,6
> > > > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > > > > >       Input:
> > > > > > >       - LOOP - a loop that is going to be vectorized. The last few
> > iterations
> > > > > > >                of LOOP were peeled.
> > > > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > > > >       - NITERS - the number of iterations that LOOP executes (before it is
> > > > > > >                  vectorized). i.e, the number of times the ivs should be
> > bumped.
> > > > > > >       - UPDATE_E - a successor edge of LOOP->exit that is on
> > > > > > > the
> > > > > > > (only) path
> > > > > >
> > > > > > the comment on this is now a bit misleading, can you try to
> > > > > > update it and/or move the comment bits to the docs on EARLY_EXIT?
> > > > > >
> > > > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > > loop_vinfo)
> > > > > > >                    The phi args associated with the edge UPDATE_E in the bb
> > > > > > >                    UPDATE_E->dest are updated accordingly.
> > > > > > >
> > > > > > > +     - restart_loop - Indicates whether the scalar loop needs
> > > > > > > + to restart the
> > > > > >
> > > > > > params are ALL_CAPS
> > > > > >
> > > > > > > +		      iteration count where the vector loop began.
> > > > > > > +
> > > > > > >       Assumption 1: Like the rest of the vectorizer, this function assumes
> > > > > > >       a single loop exit that has a single predecessor.
> > > > > > >
> > > > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > > loop_vinfo)
> > > > > > >   */
> > > > > > >
> > > > > > >  static void
> > > > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > -				  tree niters, edge update_e)
> > > > > > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > +poly_uint64 vf,
> > > > > >
> > > > > > LOOP_VINFO_VECT_FACTOR?
> > > > > >
> > > > > > > +				  tree niters, edge update_e, bool
> > > > > > restart_loop)
> > > > > >
> > > > > > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > > > > > exit after the main exit we are probably sure there are no
> > > > > > side-effects to re- execute and could avoid this restarting?
> > > > >
> > > > > Side effects yes, but the actual check may not have been performed yet.
> > > > > If you remember
> > > > >
> > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > > > There in the clz loop through the "main" exit you still have to
> > > > > see if that iteration did not contain the entry.  This is because
> > > > > the loop counter is incremented before you iterate.
> > > > >
> > > > > >
> > > > > > >  {
> > > > > > >    gphi_iterator gsi, gsi1;
> > > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > >    basic_block update_bb = update_e->dest;
> > > > > > > -
> > > > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT
> > > > > > > (loop_vinfo)->dest;
> > > > > > > -
> > > > > > > -  /* Make sure there exists a single-predecessor exit bb:  */
> > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > > > +  bool inversed_iv
> > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > (loop_vinfo),
> > > > > > > +					 LOOP_VINFO_LOOP
> > (loop_vinfo));
> > > > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS
> > (loop_vinfo)
> > > > > > > +			    && flow_bb_inside_loop_p (loop,
> > update_e->src);
> > > > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > > > > > +  basic_block exit_bb = loop_e->dest;
> > > > > > > +  basic_block iv_block = NULL;
> > > > > > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > > > >
> > > > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 =
> > > > > > > gsi_start_phis
> > > > (update_bb);
> > > > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7
> > > > > > > +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> > loop_vinfo,
> > > > > > >        tree step_expr, off;
> > > > > > >        tree type;
> > > > > > >        tree var, ni, ni_name;
> > > > > > > -      gimple_stmt_iterator last_gsi;
> > > > > > >
> > > > > > >        gphi *phi = gsi.phi ();
> > > > > > >        gphi *phi1 = gsi1.phi (); @@ -2222,11 +2229,52 @@
> > > > > > > vect_update_ivs_after_vectorizer
> > > > > > (loop_vec_info loop_vinfo,
> > > > > > >        enum vect_induction_op_type induction_type
> > > > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > > > >
> > > > > > > -      if (induction_type == vect_step_op_add)
> > > > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi,
> > > > > > > + loop_latch_edge
> > > > (loop));
> > > > > > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > > > > > +	 property during create_iv to identify it.  */
> > > > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > +      if (restart_loop && ivtemp)
> > > > > > >  	{
> > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > +	  ni = build_int_cst (type, vf);
> > > > > > > +	  if (inversed_iv)
> > > > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > +			      fold_convert (type, step_expr));
> > > > > > > +	}
> > > > > > > +      else if (induction_type == vect_step_op_add)
> > > > > > > +	{
> > > > > > > +
> > > > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > > > +
> > > > > > > +	  /* Early exits always use last iter value not niters. */
> > > > > > > +	  if (restart_loop)
> > > > > > > +	    {
> > > > > > > +	      /* Live statements in the non-main exit shouldn't be
> > adjusted.  We
> > > > > > > +		 normally didn't have this problem with a single exit as
> > live
> > > > > > > +		 values would be in the exit block.  However when
> > dealing with
> > > > > > > +		 multiple exits all exits are redirected to the merge
> > block
> > > > > > > +		 and we restart the iteration.  */
> > > > > >
> > > > > > Hmm, I fail to see how this works - we're either using the value
> > > > > > to continue the induction or not, independent of STMT_VINFO_LIVE_P.
> > > > >
> > > > > That becomes clear in the patch to update live reductions.
> > > > > Essentially any live Reductions inside an alternative exit will
> > > > > reduce to the first element rather than the last and use that as
> > > > > the seed for the
> > > > scalar loop.
> > > >
> > > > Hum.  Reductions are vectorized as N separate reductions.  I don't
> > > > think you can simply change the reduction between the lanes to "skip"
> > > > part of the vector iteration.  But you can use the value of the
> > > > vector from before the vector iteration - the loop header PHI
> > > > result, and fully reduce that to get at the proper value.
> > >
> > > That's what it's supposed to be doing though.  The reason live
> > > operations are skipped here is that if we don't we'll re-adjust the IV
> > > even though the value will already be correct after vectorization.
> > >
> > > Remember that this code only gets so far for IV PHI nodes.
> > >
> > > The loop phi header result itself can be live, i.e. see testcases
> > > vect-early-break_70.c to vect-early-break_75.c
> > >
> > > you have i_15 = PHI <i_14 (6), 1(2)>
> > >
> > > we use i_15 in the early exit. This should not be adjusted because
> > > when it's vectorized the value at 0[lane 0] is already correct.  This
> > > is why for any PHI inside the early exits it uses the value 0[0] instead of
> > N[lane_max].
> > >
> > > Perhaps I'm missing something here?
> > 
> > OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer does.
> > 
> > I still do not understand the (complexity of the) patch.  Basically the function
> > computes the new value of the IV "from scratch" based on the number of
> > scalar iterations of the vector loop, the 'niter'
> > argument.  I would have expected that for the early exits we either pass in a
> > different 'niter' or alternatively a 'niter_adjustment'.
> 
> But for an early exit there's no static value for the adjusted niter, since you don't know
> which iteration you exited from.  Unlike the normal exit, where if you get there you know
> you've done all possible iterations.
> 
> So you must compute the scalar iteration count on the exit itself.

?  You do not need the actual scalar iteration you exited at (you don't
compute that either); you need the scalar iteration the vector iteration
started with when it exited prematurely, and that's readily available?

> > 
> > It seems your change handles different kinds of inductions differently.
> > Specifically
> > 
> >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> >       if (restart_loop && ivtemp)
> >         {
> >           type = TREE_TYPE (gimple_phi_result (phi));
> >           ni = build_int_cst (type, vf);
> >           if (inversed_iv)
> >             ni = fold_build2 (MINUS_EXPR, type, ni,
> >                               fold_convert (type, step_expr));
> >         }
> > 
> > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > as the new value.  That seems to be very odd special casing for unknown
> > reasons.  And while you adjust vec_step_op_add, you don't adjust
> > vect_peel_nonlinear_iv_init (maybe not supported - better assert here).
> 
> The VF case is for a normal "non-inverted" loop, where if you take an early exit
> you know that you have to do at most VF iterations.  The VF - step is to account
> for the inverted loop control flow, where you exit after the IV has already been
> adjusted by + step.

But doesn't that assume the IV counts from niter down to zero?  I don't
see that this special case is actually necessary, no?

> 
> Peeling doesn't matter here, since you know you were able to do a vector iteration
> so it's safe to do VF iterations.  So having peeled doesn't affect the remaining
> iters count.
> 
> > 
> > Also the vec_step_op_add case will keep the original scalar IV live even when it
> > is a vectorized induction.  The code recomputing the value from scratch avoids
> > this.
> > 
> >       /* For non-main exit create an intermediat edge to get any updated iv
> >          calculations.  */
> >       if (needs_interm_block
> >           && !iv_block
> >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > (new_stmts)))
> >         {
> >           iv_block = split_edge (update_e);
> >           update_e = single_succ_edge (update_e->dest);
> >           last_gsi = gsi_last_bb (iv_block);
> >         }
> > 
> > this is also odd, can we adjust the API instead?  I suppose this is because your
> > computation uses the original loop IV, if you based the computation off the
> > initial value only this might not be necessary?
> 
> No, on the main exit the code updates the value in the loop header and puts the
> calculation in the merge block.  This works because it only needs to consume PHI
> nodes in the merge block, and things like niters are adjusted in the guard block.
> 
> For an early exit, we don't have a guard block, only the merge block.  We have to
> update the PHI nodes in that block, but can't do so since you can't produce a value
> and consume it in a PHI node in the same BB.  So we need to create a block to put
> the values in for use in the merge block, because there's no "guard" block for
> early exits.
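A hypothetical sketch of the SSA constraint being described (GIMPLE-like pseudocode; block and value names such as ni_name and iv_block follow the patch, the shapes are illustrative): a PHI argument on an incoming edge must be defined before the merge block is reached, so the recomputed IV value cannot both be produced in the merge block and consumed by its PHIs, which is what forces the split edge.

```
  ;; invalid: ni_name is defined in the same block as the PHI that uses it
  <bb merge>:
    iv_epilog = PHI <ni_name (early_exit), niters_adj (main_exit)>
    ni_name   = ...recomputed IV value...

  ;; valid: split the early-exit edge and compute ni_name there
  <bb iv_block>:                ;; result of split_edge (update_e)
    ni_name = ...recomputed IV value...

  <bb merge>:
    iv_epilog = PHI <ni_name (iv_block), niters_adj (main_exit)>
```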

?  Then compute niters in that block as well.

> The API can be adjusted by always creating the empty block during peeling.
> That would prevent us from having to do anything special here.  Would that work
> better?  Or I can do it in the loop that iterates over the exits, before the call
> to vect_update_ivs_after_vectorizer, which I think might be more consistent.
> 
> > 
> > That said, I wonder why we cannot simply pass in an adjusted niter which
> > would be niters_vector_mult_vf - vf and be done with that?
> > 
> 
> We can of course not have this and recompute it from niters itself; however, this does
> affect the epilog code layout.  In particular, knowing the static number of iterations left
> usually causes it to unroll the loop and share some of the computations, i.e. the scalar
> code is often more efficient.
> 
> The computation would be niters_vector_mult_vf - iters_done * vf, since the value put
> here is the remaining iteration count.  It's static for early exits.

Well, it might be "static" in that it doesn't really matter what you
use for the epilog main IV initial value, as long as you are sure
you're not going to take that exit, since we're sure to take one of
the early exits.  So yeah, the special code is probably OK, but it
needs a better comment, and as said the structure of
vect_update_ivs_after_vectorizer is a bit hard to follow now.

As said an important part for optimization is to not keep the scalar
IVs live in the vector loop.

> But can do whatever you prefer here.  Let me know what you prefer for the above.
> 
> Thanks,
> Tamar
> 
> > Thanks,
> > Richard.
> > 
> > 
> > > Regards,
> > > Tamar
> > > >
> > > > > It has to do this since you have to perform the side effects for
> > > > > the non-matching elements still.
> > > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > +		continue;
> > > > > > > +
> > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > +		 values and non-single steps.  The main exit can use
> > niters
> > > > > > > +		 since if you exit from the main exit you've done all
> > vector
> > > > > > > +		 iterations.  For an early exit we don't know when we
> > exit
> > > > > > > +so
> > > > > > we
> > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > +	    }
> > > > > > > +	  else
> > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > +
> > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > >  	  else
> > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > (loop_vec_info
> > > > > > loop_vinfo,
> > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > >  	ni = init_expr;
> > > > > > > +      else if (restart_loop)
> > > > > > > +	continue;
> > > > > >
> > > > > > This looks all a bit complicated - why wouldn't we simply always
> > > > > > use the PHI result when 'restart_loop'?  Isn't that the correct
> > > > > > old start value in
> > > > all cases?
> > > > > >
> > > > > > >        else
> > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > >  					  niters, step_expr,
> > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer
> > > > > > (loop_vec_info
> > > > > > > loop_vinfo,
> > > > > > >
> > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > >
> > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false,
> > > > > > > var);
> > > > > > > +
> > > > > > > +      /* For non-main exit create an intermediat edge to get any
> > updated iv
> > > > > > > +	 calculations.  */
> > > > > > > +      if (needs_interm_block
> > > > > > > +	  && !iv_block
> > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > > (new_stmts)))
> > > > > > > +	{
> > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > +	}
> > > > > > > +
> > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > >  	{
> > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > > loop_vinfo, tree
> > > > > > niters, tree nitersm1,
> > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > niters_vector_mult_vf,
> > > > > > > -					update_e);
> > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > +      bool inversed_iv
> > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > (loop_vinfo),
> > > > > > > +					 LOOP_VINFO_LOOP
> > (loop_vinfo));
> > > > > >
> > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > >
> > > > > > > +
> > > > > > > +      /* Update the main exit first.  */
> > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > niters_vector_mult_vf,
> > > > > > > +					update_e, inversed_iv);
> > > > > > > +
> > > > > > > +      /* And then update the early exits.  */
> > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > +	{
> > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > +	    continue;
> > > > > > > +
> > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > +					    niters_vector_mult_vf,
> > > > > > > +					    exit, true);
> > > > > >
> > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > LOOP_VINFO_IV_EXIT
> > > > > > (loop_vinfo)->src) or similar?  That is, whether the exit is at
> > > > > > or after the main IV exit?  (consider having two)
> > > > > >
> > > > > > > +	}
> > > > > > >
> > > > > > >        if (skip_epilog)
> > > > > > >  	{
> > > > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 11:27                 ` Richard Biener
@ 2023-11-16 12:01                   ` Tamar Christina
  2023-11-16 12:30                     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 12:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, November 16, 2023 11:28 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Thu, 16 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, November 16, 2023 10:40 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Wednesday, November 15, 2023 1:23 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > jlaw@ventanamicro.com
> > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > > support early breaks and arbitrary exits
> > > > >
> > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > > > jlaw@ventanamicro.com
> > > > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code
> > > > > > > to support early breaks and arbitrary exits
> > > > > > >
> > > > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > > > >
> > > > > > > > Patch updated to latest trunk:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > This changes the PHI node updates to support early breaks.
> > > > > > > > It has to support both the case where the loop's exit
> > > > > > > > matches the normal loop exit and one where the early exit is
> "inverted", i.e.
> > > > > > > > it's an early
> > > > > > > exit edge.
> > > > > > > >
> > > > > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > > > > For an early exit the reason is obvious, but there are
> > > > > > > > cases where the "normal" exit is located before the early
> > > > > > > > one.  This exit then does a check on ivtmp resulting in us
> > > > > > > > leaving the loop since it thinks we're
> > > > > done.
> > > > > > > >
> > > > > > > > In these case we may still have side-effects to perform so
> > > > > > > > we also go to the scalar loop.
> > > > > > > >
> > > > > > > > For the "normal" exit niters has already been adjusted for
> > > > > > > > peeling, for the early exits we must find out how many
> > > > > > > > iterations we actually did.  So we have to recalculate the
> > > > > > > > new position
> > > for each exit.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > > > > > > Hide
> > > > > > > unused.
> > > > > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > > > > 	(vect_do_peeling): Use it.
> > > > > > > >
> > > > > > > > --- inline copy of patch ---
> > > > > > > >
> > > > > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > > > > d2654cf1
> > > > > > > > c842baac58f5 100644
> > > > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > > > @@ -1200,7 +1200,7 @@
> > > > > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > > > > (class loop *loop,
> > > > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > > > >
> > > > > > > >  static gcond *
> > > > > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > > > edge exit_edge,
> > > > > > > > +vect_set_loop_condition_normal (loop_vec_info /*
> > > > > > > > +loop_vinfo */, edge exit_edge,
> > > > > > > >  				class loop *loop, tree niters, tree step,
> > > > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> > > @@ -
> > > > > > > 1412,7 +1412,7 @@
> > > > > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > > > > loop_vec_info
> > > > > > > loop_vinfo
> > > > > > > >     When this happens we need to flip the understanding of
> > > > > > > > main and
> > > > > other
> > > > > > > >     exits by peeling and IV updates.  */
> > > > > > > >
> > > > > > > > -bool inline
> > > > > > > > +bool
> > > > > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > > > > >    return single_pred (loop->latch) == loop_exit->src; @@
> > > > > > > > -2142,6
> > > > > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > > > > +loop_vinfo)
> > > > > > > >       Input:
> > > > > > > >       - LOOP - a loop that is going to be vectorized. The
> > > > > > > > last few
> > > iterations
> > > > > > > >                of LOOP were peeled.
> > > > > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > > > > >       - NITERS - the number of iterations that LOOP executes (before
> it is
> > > > > > > >                  vectorized). i.e, the number of times the
> > > > > > > > ivs should be
> > > bumped.
> > > > > > > >       - UPDATE_E - a successor edge of LOOP->exit that is
> > > > > > > > on the
> > > > > > > > (only) path
> > > > > > >
> > > > > > > the comment on this is now a bit misleading, can you try to
> > > > > > > update it and/or move the comment bits to the docs on
> EARLY_EXIT?
> > > > > > >
> > > > > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo)
> > > > > > > >                    The phi args associated with the edge UPDATE_E in the
> bb
> > > > > > > >                    UPDATE_E->dest are updated accordingly.
> > > > > > > >
> > > > > > > > +     - restart_loop - Indicates whether the scalar loop
> > > > > > > > + needs to restart the
> > > > > > >
> > > > > > > params are ALL_CAPS
> > > > > > >
> > > > > > > > +		      iteration count where the vector loop began.
> > > > > > > > +
> > > > > > > >       Assumption 1: Like the rest of the vectorizer, this function
> assumes
> > > > > > > >       a single loop exit that has a single predecessor.
> > > > > > > >
> > > > > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo)
> > > > > > > >   */
> > > > > > > >
> > > > > > > >  static void
> > > > > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > -				  tree niters, edge update_e)
> > > > > > > > +vect_update_ivs_after_vectorizer (loop_vec_info
> > > > > > > > +loop_vinfo,
> > > > > > > > +poly_uint64 vf,
> > > > > > >
> > > > > > > LOOP_VINFO_VECT_FACTOR?
> > > > > > >
> > > > > > > > +				  tree niters, edge update_e, bool
> > > > > > > restart_loop)
> > > > > > >
> > > > > > > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > > > > > > exit after the main exit we are probably sure there are no
> > > > > > > side-effects to re- execute and could avoid this restarting?
> > > > > >
> > > > > > Side effects yes, but the actual check may not have been performed
> yet.
> > > > > > If you remember
> > > > > >
> > > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > > > > There in the clz loop through the "main" exit you still have
> > > > > > to see if that iteration did not contain the entry.  This is
> > > > > > because the loop counter is incremented before you iterate.
> > > > > >
> > > > > > >
> > > > > > > >  {
> > > > > > > >    gphi_iterator gsi, gsi1;
> > > > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > > >    basic_block update_bb = update_e->dest;
> > > > > > > > -
> > > > > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT
> > > > > > > > (loop_vinfo)->dest;
> > > > > > > > -
> > > > > > > > -  /* Make sure there exists a single-predecessor exit bb:
> > > > > > > > */
> > > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > > > > +  bool inversed_iv
> > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > > (loop_vinfo),
> > > > > > > > +					 LOOP_VINFO_LOOP
> > > (loop_vinfo));
> > > > > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS
> > > (loop_vinfo)
> > > > > > > > +			    && flow_bb_inside_loop_p (loop,
> > > update_e->src);
> > > > > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);  gcond
> > > > > > > > + *cond = get_loop_exit_condition (loop_e);  basic_block
> > > > > > > > + exit_bb = loop_e->dest;  basic_block iv_block = NULL;
> > > > > > > > + gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > > > > >
> > > > > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 =
> > > > > > > > gsi_start_phis
> > > > > (update_bb);
> > > > > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1); @@ -2190,7
> > > > > > > > +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info
> > > loop_vinfo,
> > > > > > > >        tree step_expr, off;
> > > > > > > >        tree type;
> > > > > > > >        tree var, ni, ni_name;
> > > > > > > > -      gimple_stmt_iterator last_gsi;
> > > > > > > >
> > > > > > > >        gphi *phi = gsi.phi ();
> > > > > > > >        gphi *phi1 = gsi1.phi (); @@ -2222,11 +2229,52 @@
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > (loop_vec_info loop_vinfo,
> > > > > > > >        enum vect_induction_op_type induction_type
> > > > > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > > > > >
> > > > > > > > -      if (induction_type == vect_step_op_add)
> > > > > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi,
> > > > > > > > + loop_latch_edge
> > > > > (loop));
> > > > > > > > +      /* create_iv always places it on the LHS.  Alternatively we can
> set a
> > > > > > > > +	 property during create_iv to identify it.  */
> > > > > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > +      if (restart_loop && ivtemp)
> > > > > > > >  	{
> > > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > +	  ni = build_int_cst (type, vf);
> > > > > > > > +	  if (inversed_iv)
> > > > > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > +			      fold_convert (type, step_expr));
> > > > > > > > +	}
> > > > > > > > +      else if (induction_type == vect_step_op_add)
> > > > > > > > +	{
> > > > > > > > +
> > > > > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > > > > +
> > > > > > > > +	  /* Early exits always use last iter value not niters. */
> > > > > > > > +	  if (restart_loop)
> > > > > > > > +	    {
> > > > > > > > +	      /* Live statements in the non-main exit shouldn't
> > > > > > > > +be
> > > adjusted.  We
> > > > > > > > +		 normally didn't have this problem with a single exit
> > > > > > > > +as
> > > live
> > > > > > > > +		 values would be in the exit block.  However when
> > > dealing with
> > > > > > > > +		 multiple exits all exits are redirected to the merge
> > > block
> > > > > > > > +		 and we restart the iteration.  */
> > > > > > >
> > > > > > > Hmm, I fail to see how this works - we're either using the
> > > > > > > value to continue the induction or not, independent of
> STMT_VINFO_LIVE_P.
> > > > > >
> > > > > > That becomes clear in the patch to update live reductions.
> > > > > > Essentially any live Reductions inside an alternative exit
> > > > > > will reduce to the first element rather than the last and use
> > > > > > that as the seed for the
> > > > > scalar loop.
> > > > >
> > > > > Hum.  Reductions are vectorized as N separate reductions.  I
> > > > > don't think you can simply change the reduction between the lanes to
> "skip"
> > > > > part of the vector iteration.  But you can use the value of the
> > > > > vector from before the vector iteration - the loop header PHI
> > > > > result, and fully reduce that to get at the proper value.
> > > >
> > > > That's what It's supposed to be doing though.  The reason live
> > > > operations are skipped here is that if we don't we'll re-adjust
> > > > the IV even though the value will already be correct after vectorization.
> > > >
> > > > Remember that this code only gets so far for IV PHI nodes.
> > > >
> > > > The loop phi header result itself can be live, i.e. see testcases
> > > > vect-early-break_70.c to vect-early-break_75.c
> > > >
> > > > you have i_15 = PHI <i_14 (6), 1(2)>
> > > >
> > > > we use i_15 in the early exit. This should not be adjusted because
> > > > when it's vectorized the value at 0[lane 0] is already correct.
> > > > This is why for any PHI inside the early exits it uses the value
> > > > 0[0] instead of
> > > N[lane_max].
> > > >
> > > > Perhaps I'm missing something here?
> > >
> > > OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer
> does.
> > >
> > > I still do not understand the (complexity of the) patch.  Basically
> > > the function computes the new value of the IV "from scratch" based
> > > on the number of scalar iterations of the vector loop, the 'niter'
> > > argument.  I would have expected that for the early exits we either
> > > pass in a different 'niter' or alternatively a 'niter_adjustment'.
> >
> > But for an early exit there's no static value for adjusted niter,
> > since you don't know which iteration you exited from. Unlike the
> > normal exit when you know if you get there you've done all possible
> iterations.
> >
> > So you must compute the scalar iteration count on the exit itself.
> 
> ?  You do not need the actual scalar iteration you exited (you don't compute
> that either), you need the scalar iteration the vector iteration started with
> when it exited prematurely and that's readily available?

For a normal exit, yes; but for an early exit, no. niters_vector_mult_vf is only
valid for the main exit.

There's only the unadjusted scalar count, which is what it's using to adjust to
the final count.  Unless I'm missing something?

> 
> > >
> > > It seems your change handles different kinds of inductions differently.
> > > Specifically
> > >
> > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > >       if (restart_loop && ivtemp)
> > >         {
> > >           type = TREE_TYPE (gimple_phi_result (phi));
> > >           ni = build_int_cst (type, vf);
> > >           if (inversed_iv)
> > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > >                               fold_convert (type, step_expr));
> > >         }
> > >
> > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > as the new value.  That seems to be very odd special casing for
> > > unknown reasons.  And while you adjust vec_step_op_add, you don't
> > > adjust vect_peel_nonlinear_iv_init (maybe not supported - better assert
> here).
> >
> > The VF case is for a normal "non-inverted" loop, where if you take an
> > early exit you know that you have to do at most VF iterations.  The VF
> > - step is to account for the inverted loop control flow where you exit
> > after adjusting the IV already by + step.
> 
> But doesn't that assume the IV counts from niter to zero?  I don't see this
> special case is actually necessary, no?
> 

I needed it because otherwise the scalar loop iterates one iteration too few,
so I got a miscompile with the inverted loop stuff.  I'll look at it again;
perhaps it can be solved differently.

> >
> > Peeling doesn't matter here, since you know you were able to do a
> > vector iteration so it's safe to do VF iterations.  So having peeled
> > doesn't affect the remaining iters count.
> >
> > >
> > > Also the vec_step_op_add case will keep the original scalar IV live
> > > even when it is a vectorized induction.  The code recomputing the
> > > value from scratch avoids this.
> > >
> > >       /* For non-main exit create an intermediat edge to get any updated iv
> > >          calculations.  */
> > >       if (needs_interm_block
> > >           && !iv_block
> > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > (new_stmts)))
> > >         {
> > >           iv_block = split_edge (update_e);
> > >           update_e = single_succ_edge (update_e->dest);
> > >           last_gsi = gsi_last_bb (iv_block);
> > >         }
> > >
> > > this is also odd, can we adjust the API instead?  I suppose this is
> > > because your computation uses the original loop IV, if you based the
> > > computation off the initial value only this might not be necessary?
> >
> > No, on the main exit the code updates the value in the loop header and
> > puts the Calculation in the merge block.  This works because it only
> > needs to consume PHI nodes in the merge block and things like niters are
> adjusted in the guard block.
> >
> > For an early exit, we don't have a guard block, only the merge block.
> > We have to update the PHI nodes in that block,  but can't do so since
> > you can't produce a value and consume it in a PHI node in the same BB.
> > So we need to create the block to put the values in for use in the
> > merge block.  Because there's no "guard" block for early exits.
> 
> ?  then compute niters in that block as well.

We can't, since it won't be reachable through the right edge.  What we can
do, if you want, is slightly change the peeling; we currently peel as:

  \        \             /
  E1     E2        Normal exit
    \       |          |
       \    |          Guard
          \ |          |
         Merge block
                  |
             Pre Header

If we instead peel as:


  \        \             /
  E1     E2        Normal exit
    \       |          |
       Exit join   Guard
          \ |          |
         Merge block
                  |
             Pre Header

We can use the exit join block.  This would also mean vect_update_ivs_after_vectorizer
doesn't need to iterate over all exits and only really needs to adjust the phi nodes
coming out of the exit join and guard block.

Does this work for you?

Thanks,
Tamar
> 
> > The API can be adjusted by always creating the empty block either during
> > peeling.
> > That would prevent us from having to do anything special here.  Would
> > that work better?  Or I can do it in the loop that iterates over the
> > exits to before the call to vect_update_ivs_after_vectorizer, which I think
> might be more consistent.
> >
> > >
> > > That said, I wonder why we cannot simply pass in an adjusted niter
> > > which would be niters_vector_mult_vf - vf and be done with that?
> > >
> >
> > We can of course not have this and recompute it from niters itself,
> > however this does affect the epilog code layout.  Particularly, knowing
> > the static number of iterations left usually causes it to unroll the
> > loop and share some of the computations, i.e. the scalar code is often more
> > efficient.
> >
> > The computation would be niters_vector_mult_vf - iters_done * vf,
> > since the value put here is the remaining iteration count.  It's static for early
> > exits.
> 
> Well, it might be "static" in that it doesn't really matter what you use for the
> epilog main IV initial value as long as you are sure you're not going to take that
> exit as you are sure we're going to take one of the early exits.  So yeah, the
> special code is probably OK, but it needs a better comment and as said the
> structure of vect_update_ivs_after_vectorizer is a bit hard to follow now.
> 
> As said an important part for optimization is to not keep the scalar IVs live in
> the vector loop.
> 
> > But can do whatever you prefer here.  Let me know what you prefer for the
> above.
> >
> > Thanks,
> > Tamar
> >
> > > Thanks,
> > > Richard.
> > >
> > >
> > > > Regards,
> > > > Tamar
> > > > >
> > > > > > It has to do this since you have to perform the side effects
> > > > > > for the non-matching elements still.
> > > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > >
> > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > +		continue;
> > > > > > > > +
> > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > +		 values and non-single steps.  The main exit can use
> > > niters
> > > > > > > > +		 since if you exit from the main exit you've done all
> > > vector
> > > > > > > > +		 iterations.  For an early exit we don't know when we
> > > exit
> > > > > > > > +so
> > > > > > > we
> > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > +	    }
> > > > > > > > +	  else
> > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > +
> > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > >  	  else
> > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > > (loop_vec_info
> > > > > > > loop_vinfo,
> > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > >  	ni = init_expr;
> > > > > > > > +      else if (restart_loop)
> > > > > > > > +	continue;
> > > > > > >
> > > > > > > This looks all a bit complicated - why wouldn't we simply
> > > > > > > always use the PHI result when 'restart_loop'?  Isn't that
> > > > > > > the correct old start value in
> > > > > all cases?
> > > > > > >
> > > > > > > >        else
> > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > >  					  niters, step_expr, @@ -
> 2245,9 +2295,20 @@
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > (loop_vec_info
> > > > > > > > loop_vinfo,
> > > > > > > >
> > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > >
> > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > false, var);
> > > > > > > > +
> > > > > > > > +      /* For non-main exit create an intermediat edge to
> > > > > > > > + get any
> > > updated iv
> > > > > > > > +	 calculations.  */
> > > > > > > > +      if (needs_interm_block
> > > > > > > > +	  && !iv_block
> > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > > > (new_stmts)))
> > > > > > > > +	{
> > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > >  	{
> > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info
> > > > > > > > loop_vinfo, tree
> > > > > > > niters, tree nitersm1,
> > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > niters_vector_mult_vf,
> > > > > > > > -					update_e);
> > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > +      bool inversed_iv
> > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT
> > > (loop_vinfo),
> > > > > > > > +					 LOOP_VINFO_LOOP
> > > (loop_vinfo));
> > > > > > >
> > > > > > > You are computing this here and in
> vect_update_ivs_after_vectorizer?
> > > > > > >
> > > > > > > > +
> > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > niters_vector_mult_vf,
> > > > > > > > +					update_e, inversed_iv);
> > > > > > > > +
> > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > +	{
> > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > +	    continue;
> > > > > > > > +
> > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > +					    exit, true);
> > > > > > >
> > > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > > LOOP_VINFO_IV_EXIT
> > > > > > > (loop_vinfo)->src) or similar?  That is, whether the exit is
> > > > > > > at or after the main IV exit?  (consider having two)
> > > > > > >
> > > > > > > > +	}
> > > > > > > >
> > > > > > > >        if (skip_epilog)
> > > > > > > >  	{
> > > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 12:01                   ` Tamar Christina
@ 2023-11-16 12:30                     ` Richard Biener
  2023-11-16 13:22                       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 12:30 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 16 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, November 16, 2023 11:28 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Thursday, November 16, 2023 10:40 AM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > jlaw@ventanamicro.com
> > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > support early breaks and arbitrary exits
> > > >
> > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Wednesday, November 15, 2023 1:23 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > > jlaw@ventanamicro.com
> > > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > > > support early breaks and arbitrary exits
> > > > > >
> > > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > > > > > jlaw@ventanamicro.com
> > > > > > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code
> > > > > > > > to support early breaks and arbitrary exits
> > > > > > > >
> > > > > > > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > > > > > > >
> > > > > > > > > Patch updated to latest trunk:
> > > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > This changes the PHI node updates to support early breaks.
> > > > > > > > > It has to support both the case where the loop's exit
> > > > > > > > > matches the normal loop exit and one where the early exit is
> > "inverted", i.e.
> > > > > > > > > it's an early
> > > > > > > > exit edge.
> > > > > > > > >
> > > > > > > > > In the latter case we must always restart the loop for VF iterations.
> > > > > > > > > For an early exit the reason is obvious, but there are
> > > > > > > > > cases where the "normal" exit is located before the early
> > > > > > > > > one.  This exit then does a check on ivtmp resulting in us
> > > > > > > > > leaving the loop since it thinks we're
> > > > > > done.
> > > > > > > > >
> > > > > > > > > In these case we may still have side-effects to perform so
> > > > > > > > > we also go to the scalar loop.
> > > > > > > > >
> > > > > > > > > For the "normal" exit niters has already been adjusted for
> > > > > > > > > peeling, for the early exits we must find out how many
> > > > > > > > > iterations we actually did.  So we have to recalculate the
> > > > > > > > > new position
> > > > for each exit.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > > > > > > > Hide
> > > > > > > > unused.
> > > > > > > > > 	(vect_update_ivs_after_vectorizer): Support early break.
> > > > > > > > > 	(vect_do_peeling): Use it.
> > > > > > > > >
> > > > > > > > > --- inline copy of patch ---
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > > > > > > d2654cf1
> > > > > > > > > c842baac58f5 100644
> > > > > > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > > > > > @@ -1200,7 +1200,7 @@
> > > > > > > > > vect_set_loop_condition_partial_vectors_avx512
> > > > > > > > (class loop *loop,
> > > > > > > > >     loop handles exactly VF scalars per iteration.  */
> > > > > > > > >
> > > > > > > > >  static gcond *
> > > > > > > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo,
> > > > > > > > > edge exit_edge,
> > > > > > > > > +vect_set_loop_condition_normal (loop_vec_info /*
> > > > > > > > > +loop_vinfo */, edge exit_edge,
> > > > > > > > >  				class loop *loop, tree niters, tree step,
> > > > > > > > >  				tree final_iv, bool niters_maybe_zero,
> > > > > > > > >  				gimple_stmt_iterator loop_cond_gsi)
> > > > @@ -
> > > > > > > > 1412,7 +1412,7 @@
> > > > > > > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > > > > > > loop_vec_info
> > > > > > > > loop_vinfo
> > > > > > > > >     When this happens we need to flip the understanding of
> > > > > > > > > main and
> > > > > > other
> > > > > > > > >     exits by peeling and IV updates.  */
> > > > > > > > >
> > > > > > > > > -bool inline
> > > > > > > > > +bool
> > > > > > > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > > > > > > >    return single_pred (loop->latch) == loop_exit->src; @@
> > > > > > > > > -2142,6
> > > > > > > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info
> > > > > > > > > +loop_vinfo)
> > > > > > > > >       Input:
> > > > > > > > >       - LOOP - a loop that is going to be vectorized. The
> > > > > > > > > last few
> > > > iterations
> > > > > > > > >                of LOOP were peeled.
> > > > > > > > > +     - VF   - The chosen vectorization factor for LOOP.
> > > > > > > > >       - NITERS - the number of iterations that LOOP executes (before
> > it is
> > > > > > > > >                  vectorized). i.e, the number of times the
> > > > > > > > > ivs should be
> > > > bumped.
> > > > > > > > >       - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path
> > > > > > > >
> > > > > > > > the comment on this is now a bit misleading, can you try to
> > > > > > > > update it and/or move the comment bits to the docs on EARLY_EXIT?
> > > > > > > >
> > > > > > > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > > > > > > >                    The phi args associated with the edge UPDATE_E in the bb
> > > > > > > > >                    UPDATE_E->dest are updated accordingly.
> > > > > > > > >
> > > > > > > > > +     - restart_loop - Indicates whether the scalar loop needs to restart the
> > > > > > > >
> > > > > > > > params are ALL_CAPS
> > > > > > > >
> > > > > > > > > +		      iteration count where the vector loop began.
> > > > > > > > > +
> > > > > > > > >       Assumption 1: Like the rest of the vectorizer, this function
> > assumes
> > > > > > > > >       a single loop exit that has a single predecessor.
> > > > > > > > >
> > > > > > > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > > > > > > >   */
> > > > > > > > >
> > > > > > > > >  static void
> > > > > > > > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > -				  tree niters, edge update_e)
> > > > > > > > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,
> > > > > > > >
> > > > > > > > LOOP_VINFO_VECT_FACTOR?
> > > > > > > >
> > > > > > > > > +				  tree niters, edge update_e, bool restart_loop)
> > > > > > > >
> > > > > > > > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > > > > > > > exit after the main exit we are probably sure there are no
> > > > > > > > side-effects to re- execute and could avoid this restarting?
> > > > > > >
> > > > > > > Side effects yes, but the actual check may not have been performed yet.
> > > > > > > If you remember
> > > > > > >
> > > > https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
> > > > > > > There in the clz loop through the "main" exit you still have
> > > > > > > to see if that iteration did not contain the entry.  This is
> > > > > > > because the loop counter is incremented before you iterate.
> > > > > > >
> > > > > > > >
> > > > > > > > >  {
> > > > > > > > >    gphi_iterator gsi, gsi1;
> > > > > > > > >    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > > > > > > >    basic_block update_bb = update_e->dest;
> > > > > > > > > -
> > > > > > > > > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > > > > > -
> > > > > > > > > -  /* Make sure there exists a single-predecessor exit bb: */
> > > > > > > > > -  gcc_assert (single_pred_p (exit_bb));
> > > > > > > > > -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> > > > > > > > > +  bool inversed_iv
> > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > > > > > +			    && flow_bb_inside_loop_p (loop, update_e->src);
> > > > > > > > > +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > > > > > > +  gcond *cond = get_loop_exit_condition (loop_e);
> > > > > > > > > +  basic_block exit_bb = loop_e->dest;
> > > > > > > > > +  basic_block iv_block = NULL;
> > > > > > > > > +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > >
> > > > > > > > >    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
> > > > > > > > >         !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> > > > > > > > > @@ -2190,7 +2198,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > >        tree step_expr, off;
> > > > > > > > >        tree type;
> > > > > > > > >        tree var, ni, ni_name;
> > > > > > > > > -      gimple_stmt_iterator last_gsi;
> > > > > > > > >
> > > > > > > > >        gphi *phi = gsi.phi ();
> > > > > > > > >        gphi *phi1 = gsi1.phi ();
> > > > > > > > > @@ -2222,11 +2229,52 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > >        enum vect_induction_op_type induction_type
> > > > > > > > >  	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
> > > > > > > > >
> > > > > > > > > -      if (induction_type == vect_step_op_add)
> > > > > > > > > +      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> > > > > > > > > +      /* create_iv always places it on the LHS.  Alternatively we can set a
> > > > > > > > > +	 property during create_iv to identify it.  */
> > > > > > > > > +      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > +      if (restart_loop && ivtemp)
> > > > > > > > >  	{
> > > > > > > > > +	  type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > +	  ni = build_int_cst (type, vf);
> > > > > > > > > +	  if (inversed_iv)
> > > > > > > > > +	    ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > +			      fold_convert (type, step_expr));
> > > > > > > > > +	}
> > > > > > > > > +      else if (induction_type == vect_step_op_add)
> > > > > > > > > +	{
> > > > > > > > > +
> > > > > > > > >  	  tree stype = TREE_TYPE (step_expr);
> > > > > > > > > -	  off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > -			     fold_convert (stype, niters), step_expr);
> > > > > > > > > +
> > > > > > > > > +	  /* Early exits always use last iter value not niters. */
> > > > > > > > > +	  if (restart_loop)
> > > > > > > > > +	    {
> > > > > > > > > +	      /* Live statements in the non-main exit shouldn't be adjusted.  We
> > > > > > > > > +		 normally didn't have this problem with a single exit as live
> > > > > > > > > +		 values would be in the exit block.  However when dealing with
> > > > > > > > > +		 multiple exits all exits are redirected to the merge block
> > > > > > > > > +		 and we restart the iteration.  */
> > > > > > > >
> > > > > > > > Hmm, I fail to see how this works - we're either using the
> > > > > > > > value to continue the induction or not, independent of STMT_VINFO_LIVE_P.
> > > > > > >
> > > > > > > That becomes clear in the patch to update live reductions.
> > > > > > > Essentially any live reductions inside an alternative exit
> > > > > > > will reduce to the first element rather than the last and use
> > > > > > > that as the seed for the scalar loop.
> > > > > >
> > > > > > Hum.  Reductions are vectorized as N separate reductions.  I
> > > > > > don't think you can simply change the reduction between the lanes to "skip"
> > > > > > part of the vector iteration.  But you can use the value of the
> > > > > > vector from before the vector iteration - the loop header PHI
> > > > > > result, and fully reduce that to get at the proper value.
> > > > >
> > > > > That's what it's supposed to be doing though.  The reason live
> > > > > operations are skipped here is that if we don't, we'll re-adjust
> > > > > the IV even though the value will already be correct after vectorization.
> > > > >
> > > > > Remember that this code only gets so far for IV PHI nodes.
> > > > >
> > > > > The loop phi header result itself can be live, i.e. see testcases
> > > > > vect-early-break_70.c to vect-early-break_75.c
> > > > >
> > > > > you have i_15 = PHI <i_14 (6), 1(2)>
> > > > >
> > > > > we use i_15 in the early exit. This should not be adjusted because
> > > > > when it's vectorized the value at 0[lane 0] is already correct.
> > > > > This is why for any PHI inside the early exits it uses the value
> > > > > 0[0] instead of N[lane_max].
> > > > >
> > > > > Perhaps I'm missing something here?
> > > >
> > > > OK, so I refreshed my mind of what vect_update_ivs_after_vectorizer
> > does.
> > > >
> > > > I still do not understand the (complexity of the) patch.  Basically
> > > > the function computes the new value of the IV "from scratch" based
> > > > on the number of scalar iterations of the vector loop, the 'niter'
> > > > argument.  I would have expected that for the early exits we either
> > > > pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > >
> > > But for an early exit there's no static value for adjusted niter,
> > > since you don't know which iteration you exited from.  Unlike the
> > > normal exit, where if you get there you know you've done all possible
> > > iterations.
> > >
> > > So you must compute the scalar iteration count on the exit itself.
> > 
> > ?  You do not need the actual scalar iteration you exited (you don't compute
> > that either), you need the scalar iteration the vector iteration started with
> > when it exited prematurely and that's readily available?
> 
> For a normal exit yes, not for an early exit no? niters_vector_mult_vf is only
> valid for the main exit.
> 
> There's the unadjusted scalar count, which is what it's using to adjust it to
> the final count.  Unless I'm missing something?

Ah, of course - niters_vector_mult_vf is for the countable exit.  For
the early exits we can't precompute the scalar iteration value.  But that
then means we should compute the appropriate "continuation" as live
value of the vectorized IVs even when they were not originally used
outside of the loop.  I don't see how we can express this in terms
of the scalar IVs in the (not yet) vectorized loop - similar to the
reduction case you are going to end up with the wrong values here.

That said, I've for a long time wanted to preserve the original
control IV also for the vector code (leaving any "optimization"
to IVOPTs there), that would enable us to compute the correct
"niters_vector_mult_vf" based on that IV.

So given we cannot use the scalar IVs you have to handle all
inductions (besides the main exit control IV) in
vectorizable_live_operation I think.

Or for now disable early-break for inductions that are not
the main exit control IV (in vect_can_advance_ivs_p)?

> > > >
> > > > It seems your change handles different kinds of inductions differently.
> > > > Specifically
> > > >
> > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > >       if (restart_loop && ivtemp)
> > > >         {
> > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > >           ni = build_int_cst (type, vf);
> > > >           if (inversed_iv)
> > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > >                               fold_convert (type, step_expr));
> > > >         }
> > > >
> > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > as the new value.  That seems to be very odd special casing for
> > > > unknown reasons.  And while you adjust vec_step_op_add, you don't
> > > > adjust vect_peel_nonlinear_iv_init (maybe not supported - better assert
> > here).
> > >
> > > The VF case is for a normal "non-inverted" loop, where if you take an
> > > early exit you know that you have to do at most VF iterations.  The VF
> > > - step is to account for the inverted loop control flow where you exit
> > > after adjusting the IV already by + step.
> > 
> > But doesn't that assume the IV counts from niter to zero?  I don't see this
> > special case is actually necessary, no?
> > 
> 
> I needed it because otherwise the scalar loop iterates one iteration too few,
> so I got a miscompile with the inverted loop stuff.  I'll look at it again; perhaps
> it can be solved differently.
> 
> > >
> > > Peeling doesn't matter here, since you know you were able to do a
> > > vector iteration so it's safe to do VF iterations.  So having peeled
> > > doesn't affect the remaining iters count.
> > >
> > > >
> > > > Also the vec_step_op_add case will keep the original scalar IV live
> > > > even when it is a vectorized induction.  The code recomputing the
> > > > value from scratch avoids this.
> > > >
> > > > >       /* For non-main exit create an intermediate edge to get any updated iv
> > > >          calculations.  */
> > > >       if (needs_interm_block
> > > >           && !iv_block
> > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > (new_stmts)))
> > > >         {
> > > >           iv_block = split_edge (update_e);
> > > >           update_e = single_succ_edge (update_e->dest);
> > > >           last_gsi = gsi_last_bb (iv_block);
> > > >         }
> > > >
> > > > this is also odd, can we adjust the API instead?  I suppose this is
> > > > because your computation uses the original loop IV, if you based the
> > > > computation off the initial value only this might not be necessary?
> > >
> > > No, on the main exit the code updates the value in the loop header and
> > > puts the calculation in the merge block.  This works because it only
> > > needs to consume PHI nodes in the merge block and things like niters
> > > are adjusted in the guard block.
> > >
> > > For an early exit, we don't have a guard block, only the merge block.
> > > We have to update the PHI nodes in that block,  but can't do so since
> > > you can't produce a value and consume it in a PHI node in the same BB.
> > > So we need to create the block to put the values in for use in the
> > > merge block.  Because there's no "guard" block for early exits.
> > 
> > ?  then compute niters in that block as well.
> 
> We can't, since it'll not be reachable through the right edge.  What we can
> do if you want is slightly change peeling; we currently peel as:
> 
>   \        \             /
>   E1     E2        Normal exit
>     \       |          |
>        \    |          Guard
>           \ |          |
>          Merge block
>                   |
>              Pre Header
> 
> If we instead peel as:
> 
> 
>   \        \             /
>   E1     E2        Normal exit
>     \       |          |
>        Exit join   Guard
>           \ |          |
>          Merge block
>                   |
>              Pre Header
> 
> We can use the exit join block.  This would also mean vect_update_ivs_after_vectorizer
> doesn't need to iterate over all exits and only really needs to adjust the phi nodes
> coming out of the exit join and guard block.
> 
> Does this work for you?
> 
> Thanks,
> Tamar
> > 
> > > The API can be adjusted by always creating the empty block during
> > > peeling.
> > > That would prevent us from having to do anything special here.  Would
> > > that work better?  Or I can do it in the loop that iterates over the
> > > exits, before the call to vect_update_ivs_after_vectorizer, which I think
> > > might be more consistent.
> > >
> > > >
> > > > That said, I wonder why we cannot simply pass in an adjusted niter
> > > > which would be niters_vector_mult_vf - vf and be done with that?
> > > >
> > >
> > > We can of course drop this and recompute it from niters itself,
> > > however this does affect the epilog code layout.  Particularly, knowing
> > > the static number of iterations left usually causes it to unroll the
> > > loop and share some of the computations, i.e. the scalar code is often more
> > > efficient.
> > >
> > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > since the value put here is the remaining iteration count.  It's static for early
> > > exits.
> > 
> > Well, it might be "static" in that it doesn't really matter what you use for the
> > epilog main IV initial value as long as you are sure you're not going to take that
> > exit as you are sure we're going to take one of the early exits.  So yeah, the
> > special code is probably OK, but it needs a better comment and as said the
> > structure of vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > 
> > As said an important part for optimization is to not keep the scalar IVs live in
> > the vector loop.
> > 
> > > But can do whatever you prefer here.  Let me know what you prefer for the
> > above.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > >
> > > > > Regards,
> > > > > Tamar
> > > > > >
> > > > > > > It has to do this since you have to perform the side effects
> > > > > > > for the non-matching elements still.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Tamar
> > > > > > >
> > > > > > > >
> > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > +		continue;
> > > > > > > > > +
> > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > +		 iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > +	    }
> > > > > > > > > +	  else
> > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > +
> > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > >  	  else
> > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer
> > > > > > > > > (loop_vec_info
> > > > > > > > loop_vinfo,
> > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > >  	ni = init_expr;
> > > > > > > > > +      else if (restart_loop)
> > > > > > > > > +	continue;
> > > > > > > >
> > > > > > > > This looks all a bit complicated - why wouldn't we simply
> > > > > > > > always use the PHI result when 'restart_loop'?  Isn't that
> > > > > > > > the correct old start value in all cases?
> > > > > > > >
> > > > > > > > >        else
> > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > >  					  niters, step_expr, @@ -
> > 2245,9 +2295,20 @@
> > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > (loop_vec_info
> > > > > > > > > loop_vinfo,
> > > > > > > > >
> > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > >
> > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > > false, var);
> > > > > > > > > +
> > > > > > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > > > > > +	 calculations.  */
> > > > > > > > > +      if (needs_interm_block
> > > > > > > > > +	  && !iv_block
> > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > +	{
> > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > >  	{
> > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > -					update_e);
> > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > +      bool inversed_iv
> > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > >
> > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > +
> > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > +	{
> > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > +	    continue;
> > > > > > > > > +
> > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > +					    exit, true);
> > > > > > > >
> > > > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > > > LOOP_VINFO_IV_EXIT (loop_vinfo)->src) or similar?  That is, whether the exit is
> > > > > > > > at or after the main IV exit?  (consider having two)
> > > > > > > >
> > > > > > > > > +	}
> > > > > > > > >
> > > > > > > > >        if (skip_epilog)
> > > > > > > > >  	{
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 12:30                     ` Richard Biener
@ 2023-11-16 13:22                       ` Tamar Christina
  2023-11-16 13:35                         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 13:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > > > > >
> > > > > > Perhaps I'm missing something here?
> > > > >
> > > > > OK, so I refreshed my mind of what
> > > > > vect_update_ivs_after_vectorizer
> > > does.
> > > > >
> > > > > I still do not understand the (complexity of the) patch.
> > > > > Basically the function computes the new value of the IV "from
> > > > > scratch" based on the number of scalar iterations of the vector loop,
> the 'niter'
> > > > > argument.  I would have expected that for the early exits we
> > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > >
> > > > But for an early exit there's no static value for adjusted niter,
> > > > since you don't know which iteration you exited from.  Unlike the
> > > > normal exit, where if you get there you know you've done all possible
> > > > iterations.
> > > >
> > > > So you must compute the scalar iteration count on the exit itself.
> > >
> > > ?  You do not need the actual scalar iteration you exited (you don't
> > > compute that either), you need the scalar iteration the vector
> > > iteration started with when it exited prematurely and that's readily available?
> >
> > For a normal exit yes, not for an early exit no? niters_vector_mult_vf
> > is only valid for the main exit.
> >
> > There's the unadjusted scalar count, which is what it's using to
> > adjust it to the final count.  Unless I'm missing something?
> 
> Ah, of course - niters_vector_mult_vf is for the countable exit.  For the early
> exits we can't precompute the scalar iteration value.  But that then means we
> should compute the appropriate "continuation" as live value of the vectorized
> IVs even when they were not originally used outside of the loop.  I don't see
> how we can express this in terms of the scalar IVs in the (not yet) vectorized
> loop - similar to the reduction case you are going to end up with the wrong
> values here.
> 
> That said, I've for a long time wanted to preserve the original control IV also for
> the vector code (leaving any "optimization"
> to IVOPTs there), that would enable us to compute the correct
> "niters_vector_mult_vf" based on that IV.
> 
> So given we cannot use the scalar IVs you have to handle all inductions
> (besides the main exit control IV) in vectorizable_live_operation I think.
> 

That's what I currently do; that's why there was the

	      if (STMT_VINFO_LIVE_P (phi_info))
		continue;

although I don't understand why we use the scalar count.  I suppose the reasoning
is that we don't really want to keep it around, and referencing it forces it to be kept?

At the moment it just does `init + (final - init) * vf`, which is correct, no?

Also, you missed the question below about how to avoid the creation of the block.
Are you ok with changing that?

Thanks,
Tamar

> Or for now disable early-break for inductions that are not the main exit control
> IV (in vect_can_advance_ivs_p)?
> 
> > > > >
> > > > > It seems your change handles different kinds of inductions differently.
> > > > > Specifically
> > > > >
> > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > >       if (restart_loop && ivtemp)
> > > > >         {
> > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > >           ni = build_int_cst (type, vf);
> > > > >           if (inversed_iv)
> > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > >                               fold_convert (type, step_expr));
> > > > >         }
> > > > >
> > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > as the new value.  That seems to be very odd special casing for
> > > > > unknown reasons.  And while you adjust vec_step_op_add, you
> > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > better assert
> > > here).
> > > >
> > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > an early exit you know that you have to do at most VF iterations.
> > > > The VF - step is to account for the inverted loop control flow where you
> > > > exit after adjusting the IV already by + step.
> > >
> > > But doesn't that assume the IV counts from niter to zero?  I don't
> > > see this special case is actually necessary, no?
> > >
> >
> > I needed it because otherwise the scalar loop iterates one iteration too few,
> > so I got a miscompile with the inverted loop stuff.  I'll look at it again;
> > perhaps it can be solved differently.
> >
> > > >
> > > > Peeling doesn't matter here, since you know you were able to do a
> > > > vector iteration so it's safe to do VF iterations.  So having
> > > > peeled doesn't affect the remaining iters count.
> > > >
> > > > >
> > > > > Also the vec_step_op_add case will keep the original scalar IV
> > > > > live even when it is a vectorized induction.  The code
> > > > > recomputing the value from scratch avoids this.
> > > > >
> > > > >       /* For non-main exit create an intermediate edge to get any updated iv
> > > > >          calculations.  */
> > > > >       if (needs_interm_block
> > > > >           && !iv_block
> > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p
> > > > > (new_stmts)))
> > > > >         {
> > > > >           iv_block = split_edge (update_e);
> > > > >           update_e = single_succ_edge (update_e->dest);
> > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > >         }
> > > > >
> > > > > this is also odd, can we adjust the API instead?  I suppose this
> > > > > is because your computation uses the original loop IV, if you
> > > > > based the computation off the initial value only this might not be necessary?
> > > >
> > > > No, on the main exit the code updates the value in the loop header
> > > > and puts the calculation in the merge block.  This works because
> > > > it only needs to consume PHI nodes in the merge block and things
> > > > like niters are adjusted in the guard block.
> > > >
> > > > For an early exit, we don't have a guard block, only the merge block.
> > > > We have to update the PHI nodes in that block,  but can't do so
> > > > since you can't produce a value and consume it in a PHI node in the same BB.
> > > > So we need to create the block to put the values in for use in the
> > > > merge block.  Because there's no "guard" block for early exits.
> > >
> > > ?  then compute niters in that block as well.
> >
> > We can't, since it'll not be reachable through the right edge.  What we
> > can do if you want is slightly change peeling; we currently peel as:
> >
> >   \        \             /
> >   E1     E2        Normal exit
> >     \       |          |
> >        \    |          Guard
> >           \ |          |
> >          Merge block
> >                   |
> >              Pre Header
> >
> > If we instead peel as:
> >
> >
> >   \        \             /
> >   E1     E2        Normal exit
> >     \       |          |
> >        Exit join   Guard
> >           \ |          |
> >          Merge block
> >                   |
> >              Pre Header
> >
> > We can use the exit join block.  This would also mean
> > vect_update_ivs_after_vectorizer doesn't need to iterate over all
> > exits and only really needs to adjust the phi nodes coming out of the
> > exit join and guard block.
> >
> > Does this work for you?
> >
> > Thanks,
> > Tamar
> > >
> > > > The API can be adjusted by always creating the empty block during
> > > > peeling.
> > > > That would prevent us from having to do anything special here.
> > > > Would that work better?  Or I can do it in the loop that iterates
> > > > over the exits, before the call to
> > > > vect_update_ivs_after_vectorizer, which I think
> > > > might be more consistent.
> > > >
> > > > >
> > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > >
> > > >
> > > > We can of course drop this and recompute it from niters itself,
> > > > however this does affect the epilog code layout.  Particularly,
> > > > knowing the static number of iterations left usually causes it to
> > > > unroll the loop and share some of the computations, i.e. the
> > > > scalar code is often more efficient.
> > > >
> > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > since the value put here is the remaining iteration count.  It's
> > > > static for early exits.
> > >
> > > Well, it might be "static" in that it doesn't really matter what you
> > > use for the epilog main IV initial value as long as you are sure
> > > you're not going to take that exit because you are sure we're going to
> > > take one of the early exits.  So yeah, the special code is probably
> > > OK, but it needs a better comment and as said the structure of
> vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > >
> > > As said an important part for optimization is to not keep the scalar
> > > IVs live in the vector loop.
> > >
> > > > But can do whatever you prefer here.  Let me know what you prefer
> > > > for the
> > > above.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > > >
> > > > > > > > It has to do this since you have to perform the side
> > > > > > > > effects for the non-matching elements still.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > +		continue;
> > > > > > > > > > +
> > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > > +		 iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > +	    }
> > > > > > > > > > +	  else
> > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > > +
> > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > >  	  else
> > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > +	continue;
> > > > > > > > >
> > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > Isn't that the correct old start value in
> > > > > > > all cases?
> > > > > > > > >
> > > > > > > > > >        else
> > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > >
> > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > >
> > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > > > false, var);
> > > > > > > > > > +
> > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > > > > > > +	 calculations.  */
> > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > +	  && !iv_block
> > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > +	{
> > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > +	}
> > > > > > > > > > +
> > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > >  	{
> > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > -					update_e);
> > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > >
> > > > > > > > > You are computing this here and in
> > > vect_update_ivs_after_vectorizer?
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > +
> > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > +	{
> > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > +	    continue;
> > > > > > > > > > +
> > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > +					    exit, true);
> > > > > > > > >
> > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > (loop_vinfo)->src) or similar?  That is, whether the
> > > > > > > > > exit is at or after the main IV exit?  (consider having
> > > > > > > > > two)
> > > > > > > > >
> > > > > > > > > > +	}
> > > > > > > > > >
> > > > > > > > > >        if (skip_epilog)
> > > > > > > > > >  	{
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > AG
> > > > > > > Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 13:22                       ` Tamar Christina
@ 2023-11-16 13:35                         ` Richard Biener
  2023-11-16 14:14                           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 13:35 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 16 Nov 2023, Tamar Christina wrote:

> > > > > > >
> > > > > > > Perhaps I'm missing something here?
> > > > > >
> > > > > > OK, so I refreshed my mind of what
> > > > > > vect_update_ivs_after_vectorizer
> > > > does.
> > > > > >
> > > > > > I still do not understand the (complexity of the) patch.
> > > > > > Basically the function computes the new value of the IV "from
> > > > > > scratch" based on the number of scalar iterations of the vector loop,
> > the 'niter'
> > > > > > argument.  I would have expected that for the early exits we
> > > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > > >
> > > > > But for an early exit there's no static value for adjusted niter,
> > > > > since you don't know which iteration you exited from. Unlike the
> > > > > normal exit when you know if you get there you've done all
> > > > > possible
> > > > iterations.
> > > > >
> > > > > So you must compute the scalar iteration count on the exit itself.
> > > >
> > > > ?  You do not need the actual scalar iteration you exited (you don't
> > > > compute that either), you need the scalar iteration the vector
> > > > iteration started with when it exited prematurely and that's readily
> > available?
> > >
> > > For a normal exit yes, not for an early exit no? niters_vector_mult_vf
> > > is only valid for the main exit.
> > >
> > > There's the unadjusted scalar count, which is what it's using to
> > > adjust it to the final count.  Unless I'm missing something?
> > 
> > Ah, of course - niters_vector_mult_vf is for the countable exit.  For the early
> > exits we can't precompute the scalar iteration value.  But that then means we
> > should compute the appropriate "continuation" as live value of the vectorized
> > IVs even when they were not originally used outside of the loop.  I don't see
> > how we can express this in terms of the scalar IVs in the (not yet) vectorized
> > loop - similar to the reduction case you are going to end up with the wrong
> > values here.
> > 
> > That said, I've for a long time wanted to preserve the original control IV also for
> > the vector code (leaving any "optimization"
> > to IVOPTs there), that would enable us to compute the correct
> > "niters_vector_mult_vf" based on that IV.
> > 
> > So given we cannot use the scalar IVs you have to handle all inductions
> > (besides the main exit control IV) in vectorizable_live_operation I think.
> > 
> 
> That's what I currently do; that's why there was the
> 	      if (STMT_VINFO_LIVE_P (phi_info))
> 		continue;

Yes, but that only works for the inductions so marked.  We'd need to
mark the others as well, but only for the early exits.
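
As a rough sketch of what that marking could look like (illustrative pseudocode only, not a tested change; the accessor names follow existing GCC conventions, but the "is the main exit-control IV" test is an assumed helper):

```
/* Pseudocode sketch: when the loop has early breaks, treat every
   induction PHI except the main exit-control IV as live so that
   vectorizable_live_operation computes its continuation value.  */
for each PHI 'phi' in loop->header:
  phi_info = loop_vinfo->lookup_stmt (phi)
  if STMT_VINFO_DEF_TYPE (phi_info) == vect_induction_def
     and phi is not the main exit-control IV
     and LOOP_VINFO_EARLY_BREAKS (loop_vinfo):
    STMT_VINFO_LIVE_P (phi_info) = true
```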

> although I don't understand why we use the scalar count.  I suppose the reasoning
> is that we don't really want to keep it around, and referencing it forces it to be kept?

Referencing it will cause the scalar compute to be retained, but since
we do not adjust the scalar compute during vectorization (but expect
it to be dead) the scalar compute will compute the wrong thing (as
shown by the reduction example - I suspect inductions will suffer from
the same problem).

> At the moment it just does `init + (final - init) * vf` which is correct no?

The issue is that 'final' is not computed correctly in the vectorized
loop.  This formula might work for affine evolutions of course.
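
To illustrate the affine caveat with hypothetical numbers (a standalone sketch, not GCC code): for an affine IV the value after k iterations is a single multiply-add in k, so recomputing it on the exit edge is exact; for a multiplicative ("nonlinear") evolution, no linear formula in k reproduces it.

```c
#include <assert.h>

/* Affine IV: value after k scalar iterations is init + k * step, so one
   multiply-add recovers it for any iteration count k.  */
static long affine_after (long init, long step, long k)
{
  return init + k * step;
}

/* Multiplicative IV: value after k iterations is init * 2^k; linear
   extrapolation from one step (init + (v1 - init) * k) gives the wrong
   answer for k > 1.  */
static long mult_after (long init, long k)
{
  long v = init;
  for (long i = 0; i < k; i++)
    v *= 2;
  return v;
}
```

For example, with init = 3, step = 5, k = 4 the affine formula yields 23 exactly, while for the doubling IV the linear extrapolation 3 + (6 - 3) * 4 = 15 disagrees with the true value 48.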

Extracting the correct value from the vectorized induction would be
the preferred solution.
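
To make the early-exit restart semantics concrete, here is a small self-contained sketch in plain C (not GCC internals; the function name, VF value, and the block-wise scan are all illustrative assumptions): when a "vector" iteration triggers the break, the scalar epilogue must restart at the iteration the block started with, which is why at most VF scalar iterations remain on an early exit while the main exit can use the precomputed count.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* illustrative vectorization factor */

/* Process the loop in blocks of VF scalar iterations, mimicking a
   vectorized early-break search loop.  On an early exit we only know
   that *some* lane in the block matched, so the restart point handed to
   the scalar epilogue is the first iteration of the current block: at
   most VF scalar iterations remain to be redone.  On the main
   (countable) exit the finished count is the number of full blocks
   times VF (niters_vector_mult_vf in vectorizer terms).  */
static size_t
vector_loop_restart_point (const int *a, size_t n, int key,
			   int *took_early_exit)
{
  size_t i = 0;
  *took_early_exit = 0;
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (size_t lane = 0; lane < VF; lane++)  /* stands in for a vector compare */
	any |= (a[i + lane] == key);
      if (any)  /* early break taken by this vector iteration */
	{
	  *took_early_exit = 1;
	  return i;  /* the IV value the vector iteration started with */
	}
    }
  return i;  /* main exit: all full vector iterations done */
}
```

With key 6 in {1,...,8} the match sits in the second block, so the epilogue restarts at index 4 rather than at the matching index 5.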

> Also you missed the question below about how to avoid the creation of the block,
> You ok with changing that?
> 
> Thanks,
> Tamar
> 
> > Or for now disable early-break for inductions that are not the main exit control
> > IV (in vect_can_advance_ivs_p)?
> > 
> > > > > >
> > > > > > It seems your change handles different kinds of inductions differently.
> > > > > > Specifically
> > > > > >
> > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > >       if (restart_loop && ivtemp)
> > > > > >         {
> > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > >           ni = build_int_cst (type, vf);
> > > > > >           if (inversed_iv)
> > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > >                               fold_convert (type, step_expr));
> > > > > >         }
> > > > > >
> > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > as the new value.  That seems to be very odd special casing for
> > > > > > unknown reasons.  And while you adjust vec_step_op_add, you
> > > > > > don't adjust vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > better assert
> > > > here).
> > > > >
> > > > > The VF case is for a normal "non-inverted" loop, where if you take
> > > > > an early exit you know that you have to do at most VF iterations.
> > > > > The VF
> > > > > - step is to account for the inverted loop control flow where you
> > > > > exit after adjusting the IV already by + step.
> > > >
> > > > But doesn't that assume the IV counts from niter to zero?  I don't
> > > > see this special case is actually necessary, no?
> > > >
> > >
> > > I needed it because otherwise the scalar loop iterates one iteration
> > > too little, so I got a miscompile with the inverted loop stuff.  I'll
> > > look at it again; perhaps it can be solved differently.
> > >
> > > > >
> > > > > Peeling doesn't matter here, since you know you were able to do a
> > > > > vector iteration so it's safe to do VF iterations.  So having
> > > > > peeled doesn't affect the remaining iters count.
> > > > >
> > > > > >
> > > > > > Also the vec_step_op_add case will keep the original scalar IV
> > > > > > live even when it is a vectorized induction.  The code
> > > > > > recomputing the value from scratch avoids this.
> > > > > >
> > > > > >       /* For non-main exit create an intermediate edge to get any
> > > > > >          updated iv calculations.  */
> > > > > >       if (needs_interm_block
> > > > > >           && !iv_block
> > > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > >         {
> > > > > >           iv_block = split_edge (update_e);
> > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > >         }
> > > > > >
> > > > > > this is also odd, can we adjust the API instead?  I suppose this
> > > > > > is because your computation uses the original loop IV, if you
> > > > > > based the computation off the initial value only this might not be
> > necessary?
> > > > >
> > > > > No, on the main exit the code updates the value in the loop header
> > > > > and puts the Calculation in the merge block.  This works because
> > > > > it only needs to consume PHI nodes in the merge block and things
> > > > > like niters are
> > > > adjusted in the guard block.
> > > > >
> > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > We have to update the PHI nodes in that block,  but can't do so
> > > > > since you can't produce a value and consume it in a PHI node in the same
> > BB.
> > > > > So we need to create the block to put the values in for use in the
> > > > > merge block.  Because there's no "guard" block for early exits.
> > > >
> > > > ?  then compute niters in that block as well.
> > >
> > > We can't since it'll not be reachable through the right edge.  What we
> > > can do if you want is slightly change peeling, we currently peel as:
> > >
> > >   \        \             /
> > >   E1     E2        Normal exit
> > >     \       |          |
> > >        \    |          Guard
> > >           \ |          |
> > >          Merge block
> > >                   |
> > >              Pre Header
> > >
> > > If we instead peel as:
> > >
> > >
> > >   \        \             /
> > >   E1     E2        Normal exit
> > >     \       |          |
> > >        Exit join   Guard
> > >           \ |          |
> > >          Merge block
> > >                   |
> > >              Pre Header
> > >
> > > We can use the exit join block.  This would also mean
> > > vect_update_ivs_after_vectorizer doesn't need to iterate over all
> > > exits and only really needs to adjust the PHI nodes coming out of the
> > > exit join and guard block.
> > >
> > > Does this work for you?

Yeah, I think that would work.  But I'd like to sort out the
correctness details of the IV update itself before sorting out
this code placement detail.

Richard.

> > > Thanks,
> > > Tamar
> > > >
> > > > > The API can be adjusted by always creating the empty block either
> > > > > during
> > > > peeling.
> > > > > That would prevent us from having to do anything special here.
> > > > > Would that work better?  Or I can do it in the loop that iterates
> > > > > over the exits to before the call to
> > > > > vect_update_ivs_after_vectorizer, which I think
> > > > might be more consistent.
> > > > >
> > > > > >
> > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > niter which would be niters_vector_mult_vf - vf and be done with that?
> > > > > >
> > > > >
> > > > > We can of course not have this and recompute it from niters itself;
> > > > > however, this does affect the epilog code layout.  In particular,
> > > > > knowing the static number of iterations left causes it to usually
> > > > > unroll the loop and share some of the computations, i.e. the scalar
> > > > > code is often more efficient.
> > > > >
> > > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > > since the value put here is the remaining iteration count.  It's
> > > > > static for early exits.
> > > >
> > > > Well, it might be "static" in that it doesn't really matter what you
> > > > use for the epilog main IV initial value as long as you are sure
> > > > you're not going to take that exit because you are sure we're going to
> > > > take one of the early exits.  So yeah, the special code is probably
> > > > OK, but it needs a better comment and as said the structure of
> > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > >
> > > > As said an important part for optimization is to not keep the scalar
> > > > IVs live in the vector loop.
> > > >
> > > > > But can do whatever you prefer here.  Let me know what you prefer
> > > > > for the
> > > > above.
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > >
> > > > > > > Regards,
> > > > > > > Tamar
> > > > > > > >
> > > > > > > > > It has to do this since you have to perform the side
> > > > > > > > > effects for the non-matching elements still.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > +		continue;
> > > > > > > > > > > +
> > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > > > +		 iterations.  For an early exit we don't know when we exit so we
> > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > +	    }
> > > > > > > > > > > +	  else
> > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > > > +
> > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > >  	  else
> > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > +	continue;
> > > > > > > > > >
> > > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > all cases?
> > > > > > > > > >
> > > > > > > > > > >        else
> > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > >
> > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > >
> > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts,
> > > > > > > > > > > false, var);
> > > > > > > > > > > +
> > > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any updated iv
> > > > > > > > > > > +	 calculations.  */
> > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > +	{
> > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > +	}
> > > > > > > > > > > +
> > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > >  	{
> > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > -					update_e);
> > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > >
> > > > > > > > > > You are computing this here and in
> > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf, niters_vector_mult_vf,
> > > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > > +
> > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > +	{
> > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > +	    continue;
> > > > > > > > > > > +
> > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > > +					    exit, true);
> > > > > > > > > >
> > > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > (loop_vinfo)->src) or similar?  That is, whether the
> > > > > > > > > > exit is at or after the main IV exit?  (consider having
> > > > > > > > > > two)
> > > > > > > > > >
> > > > > > > > > > > +	}
> > > > > > > > > > >
> > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > >  	{
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > > AG
> > > > > > > > Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 13:35                         ` Richard Biener
@ 2023-11-16 14:14                           ` Tamar Christina
  2023-11-16 14:17                             ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 14:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, November 16, 2023 1:36 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Thu, 16 Nov 2023, Tamar Christina wrote:
> 
> > > > > > > >
> > > > > > > > Perhaps I'm missing something here?
> > > > > > >
> > > > > > > OK, so I refreshed my mind of what
> > > > > > > vect_update_ivs_after_vectorizer
> > > > > does.
> > > > > > >
> > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > Basically the function computes the new value of the IV
> > > > > > > "from scratch" based on the number of scalar iterations of
> > > > > > > the vector loop,
> > > the 'niter'
> > > > > > > argument.  I would have expected that for the early exits we
> > > > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > > > >
> > > > > > But for an early exit there's no static value for adjusted
> > > > > > niter, since you don't know which iteration you exited from.
> > > > > > Unlike the normal exit when you know if you get there you've
> > > > > > done all possible
> > > > > iterations.
> > > > > >
> > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > >
> > > > > ?  You do not need the actual scalar iteration you exited (you
> > > > > don't compute that either), you need the scalar iteration the
> > > > > vector iteration started with when it exited prematurely and
> > > > > that's readily
> > > available?
> > > >
> > > > For a normal exit yes, not for an early exit no?
> > > > niters_vector_mult_vf is only valid for the main exit.
> > > >
> > > > There's the unadjusted scalar count, which is what it's using to
> > > > adjust it to the final count.  Unless I'm missing something?
> > >
> > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > For the early exits we can't precompute the scalar iteration value.
> > > But that then means we should compute the appropriate "continuation"
> > > as live value of the vectorized IVs even when they were not
> > > originally used outside of the loop.  I don't see how we can express
> > > this in terms of the scalar IVs in the (not yet) vectorized loop -
> > > similar to the reduction case you are going to end up with the wrong values
> here.
> > >
> > > That said, I've for a long time wanted to preserve the original
> > > control IV also for the vector code (leaving any "optimization"
> > > to IVOPTs there), that would enable us to compute the correct
> > > "niters_vector_mult_vf" based on that IV.
> > >
> > > So given we cannot use the scalar IVs you have to handle all
> > > inductions (besides the main exit control IV) in vectorizable_live_operation
> I think.
> > >
> >
> > That's what I currently do; that's why there was the
> > 	      if (STMT_VINFO_LIVE_P (phi_info))
> > 		continue;
> 
> Yes, but that only works for the inductions so marked.  We'd need to mark the
> others as well, but only for the early exits.
> 
> > although I don't understand why we use the scalar count.  I suppose
> > the reasoning is that we don't really want to keep it around, and
> > referencing it forces it to be kept?
> 
> Referencing it will cause the scalar compute to be retained, but since we do not
> adjust the scalar compute during vectorization (but expect it to be dead) the
> scalar compute will compute the wrong thing (as shown by the reduction
> example - I suspect inductions will suffer from the same problem).
> 
> > At the moment it just does `init + (final - init) * vf` which is correct no?
> 
> The issue is that 'final' is not computed correctly in the vectorized loop.  This
> formula might work for affine evolutions of course.
> 
> Extracting the correct value from the vectorized induction would be the
> preferred solution.

Ok, so I should be able to just mark the IVs as live during process_use if there
are multiple exits, right?  Since the value will just be unused on the main exit,
as we use niters there?

Because it's the PHI inside the loop that needs to be marked live, I can't
just do it for a specific exit, no?

And if I create a copy of the PHI node during peeling for use in the early
exits and mark it live, that won't work either, no?

Tamar
> 
> > Also you missed the question below about how to avoid the creation of
> > the block, You ok with changing that?
> >
> > Thanks,
> > Tamar
> >
> > > Or for now disable early-break for inductions that are not the main
> > > exit control IV (in vect_can_advance_ivs_p)?
> > >
> > > > > > >
> > > > > > > It seems your change handles different kinds of inductions
> differently.
> > > > > > > Specifically
> > > > > > >
> > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > >       if (restart_loop && ivtemp)
> > > > > > >         {
> > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > >           ni = build_int_cst (type, vf);
> > > > > > >           if (inversed_iv)
> > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > >                               fold_convert (type, step_expr));
> > > > > > >         }
> > > > > > >
> > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > as the new value.  That seems to be very odd special casing
> > > > > > > for unknown reasons.  And while you adjust vec_step_op_add,
> > > > > > > you don't adjust vect_peel_nonlinear_iv_init (maybe not
> > > > > > > supported - better assert
> > > > > here).
> > > > > >
> > > > > > The VF case is for a normal "non-inverted" loop, where if you
> > > > > > take an early exit you know that you have to do at most VF iterations.
> > > > > > The VF
> > > > > > - step is to account for the inverted loop control flow where
> > > > > > you exit after adjusting the IV already by + step.
> > > > >
> > > > > But doesn't that assume the IV counts from niter to zero?  I
> > > > > don't see this special case is actually necessary, no?
> > > > >
> > > >
> > > > I needed it because otherwise the scalar loop iterates one
> > > > iteration too few, so I got a miscompile with the inverted loop
> > > > stuff.  I'll look at it again; perhaps it can be solved differently.
> > > >
> > > > > >
> > > > > > Peeling doesn't matter here, since you know you were able to
> > > > > > do a vector iteration so it's safe to do VF iterations.  So
> > > > > > having peeled doesn't affect the remaining iters count.
> > > > > >
> > > > > > >
> > > > > > > Also the vec_step_op_add case will keep the original scalar
> > > > > > > IV live even when it is a vectorized induction.  The code
> > > > > > > recomputing the value from scratch avoids this.
> > > > > > >
> > > > > > >       /* For non-main exit create an intermediat edge to get
> > > > > > > any updated
> > > iv
> > > > > > >          calculations.  */
> > > > > > >       if (needs_interm_block
> > > > > > >           && !iv_block
> > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > !gimple_seq_empty_p
> > > > > > > (new_stmts)))
> > > > > > >         {
> > > > > > >           iv_block = split_edge (update_e);
> > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > >         }
> > > > > > >
> > > > > > > this is also odd, can we adjust the API instead?  I suppose
> > > > > > > this is because your computation uses the original loop IV,
> > > > > > > if you based the computation off the initial value only this
> > > > > > > might not be
> > > necessary?
> > > > > >
> > > > > > No, on the main exit the code updates the value in the loop
> > > > > > header and puts the calculation in the merge block.  This
> > > > > > works because it only needs to consume PHI nodes in the merge
> > > > > > block and things like niters are
> > > > > adjusted in the guard block.
> > > > > >
> > > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > > We have to update the PHI nodes in that block,  but can't do
> > > > > > so since you can't produce a value and consume it in a PHI
> > > > > > node in the same
> > > BB.
> > > > > > So we need to create the block to put the values in for use in
> > > > > > the merge block.  Because there's no "guard" block for early exits.
> > > > >
> > > > > ?  then compute niters in that block as well.
> > > >
> > > > We can't since it'll not be reachable through the right edge.
> > > > What we can do if you want is slightly change peeling; we currently peel
> as:
> > > >
> > > >   \        \             /
> > > >   E1     E2        Normal exit
> > > >     \       |          |
> > > >        \    |          Guard
> > > >           \ |          |
> > > >          Merge block
> > > >                   |
> > > >              Pre Header
> > > >
> > > > If we instead peel as:
> > > >
> > > >
> > > >   \        \             /
> > > >   E1     E2        Normal exit
> > > >     \       |          |
> > > >        Exit join   Guard
> > > >           \ |          |
> > > >          Merge block
> > > >                   |
> > > >              Pre Header
> > > >
> > > > We can use the exit join block.  This would also mean
> > > > vect_update_ivs_after_vectorizer Doesn't need to iterate over all
> > > > exits and only really needs to adjust the phi nodes Coming out of
> > > > the exit join
> > > and guard block.
> > > >
> > > > Does this work for you?
> 
> Yeah, I think that would work.  But I'd like to sort out the correctness details of
> the IV update itself before sorting out this code placement detail.
> 
> Richard.
> 
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > > The API can be adjusted by always creating the empty block
> > > > > > either during
> > > > > peeling.
> > > > > > That would prevent us from having to do anything special here.
> > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > iterates over the exits to before the call to
> > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > might be more consistent.
> > > > > >
> > > > > > >
> > > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > > niter which would be niters_vector_mult_vf - vf and be done with
> that?
> > > > > > >
> > > > > >
> > > > > > We can of course not have this and recompute it from niters
> > > > > > itself, however this does affect the epilog code layout.
> > > > > > Particularly, knowing the static number of iterations left
> > > > > > causes it to usually unroll the loop and share some of the
> > > > > > computations.  i.e. the scalar code is often more
> > > > > efficient.
> > > > > >
> > > > > > The computation would be niters_vector_mult_vf - iters_done *
> > > > > > vf, since the value put here is the remaining iteration count.
> > > > > > It's static for early
> > > > > exits.
> > > > >
> > > > > Well, it might be "static" in that it doesn't really matter what
> > > > > you use for the epilog main IV initial value as long as you are
> > > > > sure you're not going to take that exit as you are sure we're
> > > > > going to take one of the early exits.  So yeah, the special code
> > > > > is probably OK, but it needs a better comment and as said the
> > > > > structure of
> > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > >
> > > > > As said an important part for optimization is to not keep the
> > > > > scalar IVs live in the vector loop.
> > > > >
> > > > > > But can do whatever you prefer here.  Let me know what you
> > > > > > prefer for the
> > > > > above.
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > >
> > > > > > > > Regards,
> > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > > It has to do this since you have to perform the side
> > > > > > > > > > effects for the non-matching elements still.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > +		continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > +		 init + (final - init) * vf which takes into
> > > > > > > > > > > > +account
> > > peeling
> > > > > > > > > > > > +		 values and non-single steps.  The main exit
> > > can
> > > > > > > > > > > > +use
> > > > > > > niters
> > > > > > > > > > > > +		 since if you exit from the main exit you've
> > > done
> > > > > > > > > > > > +all
> > > > > > > vector
> > > > > > > > > > > > +		 iterations.  For an early exit we don't know
> > > when
> > > > > > > > > > > > +we
> > > > > > > exit
> > > > > > > > > > > > +so
> > > > > > > > > > > we
> > > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > +				 fold_convert (stype,
> > > start_expr),
> > > > > > > > > > > > +				 fold_convert (stype,
> > > init_expr));
> > > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.
> > > */
> > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > > +	    }
> > > > > > > > > > > > +	  else
> > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > +			       fold_convert (stype, niters),
> > > step_expr);
> > > > > > > > > > > > +
> > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > >  	  else
> > > > > > > > > > > > @@ -2238,6 +2286,8 @@
> > > > > > > > > > > > vect_update_ivs_after_vectorizer (loop_vec_info
> > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > +	continue;
> > > > > > > > > > >
> > > > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > all cases?
> > > > > > > > > > >
> > > > > > > > > > > >        else
> > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > >  					  niters, step_expr,
> @@ -
> > > > > 2245,9 +2295,20 @@
> > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > >
> > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > >
> > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > >        ni_name = force_gimple_operand (ni,
> > > > > > > > > > > > &new_stmts, false, var);
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* For non-main exit create an intermediat
> > > > > > > > > > > > + edge to get any
> > > > > > > updated iv
> > > > > > > > > > > > +	 calculations.  */
> > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > > > (new_stmts)))
> > > > > > > > > > > > +	{
> > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > +	}
> > > > > > > > > > > > +
> > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > >  	{
> > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling
> > > > > > > > > > > > (loop_vec_info loop_vinfo, tree
> > > > > > > > > > > niters, tree nitersm1,
> > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p
> > > (loop_vinfo));
> > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge
> (epilog);
> > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred
> > > (LOOP_VINFO_IV_EXIT
> > > > > > > (loop_vinfo),
> > > > > > > > > > > > +					 LOOP_VINFO_LOOP
> > > > > > > (loop_vinfo));
> > > > > > > > > > >
> > > > > > > > > > > You are computing this here and in
> > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > +      vect_update_ivs_after_vectorizer
> > > > > > > > > > > > + (loop_vinfo, vf,
> > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > +					update_e,
> > > inversed_iv);
> > > > > > > > > > > > +
> > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > +	{
> > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > > > +vf,
> > > > > > > > > > > > +
> > > niters_vector_mult_vf,
> > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > >
> > > > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > (loop_vinfo)->src) or similar?  That is, whether the
> > > > > > > > > > > exit is at or after the main IV exit?  (consider
> > > > > > > > > > > having
> > > > > > > > > > > two)
> > > > > > > > > > >
> > > > > > > > > > > > +	}
> > > > > > > > > > > >
> > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > >  	{
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > Nuernberg, Germany;
> > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> > > > > > > > > 36809, AG
> > > > > > > > > Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 14:14                           ` Tamar Christina
@ 2023-11-16 14:17                             ` Richard Biener
  2023-11-16 15:19                               ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-16 14:17 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Thu, 16 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, November 16, 2023 1:36 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > 
> > > > > > > > >
> > > > > > > > > Perhaps I'm missing something here?
> > > > > > > >
> > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > does.
> > > > > > > >
> > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > Basically the function computes the new value of the IV
> > > > > > > > "from scratch" based on the number of scalar iterations of
> > > > > > > > the vector loop,
> > > > the 'niter'
> > > > > > > > argument.  I would have expected that for the early exits we
> > > > > > > > either pass in a different 'niter' or alternatively a 'niter_adjustment'.
> > > > > > >
> > > > > > > But for an early exit there's no static value for adjusted
> > > > > > > niter, since you don't know which iteration you exited from.
> > > > > > > Unlike the normal exit when you know if you get there you've
> > > > > > > done all possible
> > > > > > iterations.
> > > > > > >
> > > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > > >
> > > > > > ?  You do not need the actual scalar iteration you exited (you
> > > > > > don't compute that either), you need the scalar iteration the
> > > > > > vector iteration started with when it exited prematurely and
> > > > > > that's readily
> > > > available?
> > > > >
> > > > > For a normal exit yes, not for an early exit no?
> > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > >
> > > > > There's the unadjusted scalar count, which is what it's using to
> > > > > adjust it to the final count.  Unless I'm missing something?
> > > >
> > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > For the early exits we can't precompute the scalar iteration value.
> > > > But that then means we should compute the appropriate "continuation"
> > > > as live value of the vectorized IVs even when they were not
> > > > originally used outside of the loop.  I don't see how we can express
> > > > this in terms of the scalar IVs in the (not yet) vectorized loop -
> > > > similar to the reduction case you are going to end up with the wrong values
> > here.
> > > >
> > > > That said, I've for a long time wanted to preserve the original
> > > > control IV also for the vector code (leaving any "optimization"
> > > > to IVOPTs there), that would enable us to compute the correct
> > > > "niters_vector_mult_vf" based on that IV.
> > > >
> > > > So given we cannot use the scalar IVs you have to handle all
> > > > inductions (besides the main exit control IV) in vectorizable_live_operation
> > I think.
> > > >
> > >
> > > That's what I currently do, that's why there was the
> > > 	      if (STMT_VINFO_LIVE_P (phi_info))
> > > 		continue;
> > 
> > Yes, but that only works for the inductions marked so.  We'd need to mark the
> > others as well, but only for the early exits.
> > 
> > > although I don't understand why we use the scalar count,  I suppose
> > > the reasoning is that we don't really want to keep it around, and referencing
> > it forces it to be kept?
> > 
> > Referencing it will cause the scalar compute to be retained, but since we do not
> > adjust the scalar compute during vectorization (but expect it to be dead) the
> > scalar compute will compute the wrong thing (as shown by the reduction
> > example - I suspect inductions will suffer from the same problem).
> > 
> > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > 
> > The issue is that 'final' is not computed correctly in the vectorized loop.  This
> > formula might work for affine evolutions of course.
> > 
> > Extracting the correct value from the vectorized induction would be the
> > preferred solution.
> 
> Ok, so I should be able to just mark IVs as live during process_use if there are
> multiple exits right? Since it's just going to be unused on the main exit because we
> use niters?
> 
> Because since it's the PHI inside the loop that needs to be marked live, I can't
> just do it for specific exits, no?
> 
> If I create a copy of the PHI node during peeling for use in early exits and mark
> it live it won't work no?

I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
arrange vectorizable_live_operation to be called, possibly adding
an edge argument to that as well.

Maybe the thing to do for the moment is to reject vectorization with
early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
reduction besides the main counting IV one you can already
special-case?

Richard.

> Tamar
> > 
> > > Also you missed the question below about how to avoid the creation of
> > > the block, You ok with changing that?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > > Or for now disable early-break for inductions that are not the main
> > > > exit control IV (in vect_can_advance_ivs_p)?
> > > >
> > > > > > > >
> > > > > > > > It seems your change handles different kinds of inductions
> > differently.
> > > > > > > > Specifically
> > > > > > > >
> > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > >         {
> > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > >           if (inversed_iv)
> > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > >         }
> > > > > > > >
> > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > as the new value.  That seems to be very odd special casing
> > > > > > > > for unknown reasons.  And while you adjust vec_step_op_add,
> > > > > > > > you don't adjust vect_peel_nonlinear_iv_init (maybe not
> > > > > > > > supported - better assert
> > > > > > here).
> > > > > > >
> > > > > > > The VF case is for a normal "non-inverted" loop, where if you
> > > > > > > take an early exit you know that you have to do at most VF iterations.
> > > > > > > The VF
> > > > > > > - step is to account for the inverted loop control flow where
> > > > > > > you exit after adjusting the IV already by + step.
> > > > > >
> > > > > > But doesn't that assume the IV counts from niter to zero?  I
> > > > > > don't see this special case is actually necessary, no?
> > > > > >
> > > > >
> > > > > I needed it because otherwise the scalar loop iterates one
> > > > > iteration too few, so I got a miscompile with the inverted loop
> > > > > stuff.  I'll look at it again; perhaps it can be solved differently.
> > > > >
> > > > > > >
> > > > > > > Peeling doesn't matter here, since you know you were able to
> > > > > > > do a vector iteration so it's safe to do VF iterations.  So
> > > > > > > having peeled doesn't affect the remaining iters count.
> > > > > > >
> > > > > > > >
> > > > > > > > Also the vec_step_op_add case will keep the original scalar
> > > > > > > > IV live even when it is a vectorized induction.  The code
> > > > > > > > recomputing the value from scratch avoids this.
> > > > > > > >
> > > > > > > >       /* For non-main exit create an intermediat edge to get
> > > > > > > > any updated
> > > > iv
> > > > > > > >          calculations.  */
> > > > > > > >       if (needs_interm_block
> > > > > > > >           && !iv_block
> > > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > !gimple_seq_empty_p
> > > > > > > > (new_stmts)))
> > > > > > > >         {
> > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > >         }
> > > > > > > >
> > > > > > > > this is also odd, can we adjust the API instead?  I suppose
> > > > > > > > this is because your computation uses the original loop IV,
> > > > > > > > if you based the computation off the initial value only this
> > > > > > > > might not be
> > > > necessary?
> > > > > > >
> > > > > > > No, on the main exit the code updates the value in the loop
> > > > > > > header and puts the calculation in the merge block.  This
> > > > > > > works because it only needs to consume PHI nodes in the merge
> > > > > > > block and things like niters are
> > > > > > adjusted in the guard block.
> > > > > > >
> > > > > > > For an early exit, we don't have a guard block, only the merge block.
> > > > > > > We have to update the PHI nodes in that block,  but can't do
> > > > > > > so since you can't produce a value and consume it in a PHI
> > > > > > > node in the same
> > > > BB.
> > > > > > > So we need to create the block to put the values in for use in
> > > > > > > the merge block.  Because there's no "guard" block for early exits.
> > > > > >
> > > > > > ?  then compute niters in that block as well.
> > > > >
> > > > > We can't since it'll not be reachable through the right edge.
> > > > > What we can do if you want is slightly change peeling; we currently peel
> > as:
> > > > >
> > > > >   \        \             /
> > > > >   E1     E2        Normal exit
> > > > >     \       |          |
> > > > >        \    |          Guard
> > > > >           \ |          |
> > > > >          Merge block
> > > > >                   |
> > > > >              Pre Header
> > > > >
> > > > > If we instead peel as:
> > > > >
> > > > >
> > > > >   \        \             /
> > > > >   E1     E2        Normal exit
> > > > >     \       |          |
> > > > >        Exit join   Guard
> > > > >           \ |          |
> > > > >          Merge block
> > > > >                   |
> > > > >              Pre Header
> > > > >
> > > > > We can use the exit join block.  This would also mean
> > > > > vect_update_ivs_after_vectorizer doesn't need to iterate over all
> > > > > exits and only really needs to adjust the phi nodes coming out of
> > > > > the exit join
> > > > and guard block.
> > > > >
> > > > > Does this work for you?
> > 
> > Yeah, I think that would work.  But I'd like to sort out the correctness details of
> > the IV update itself before sorting out this code placement detail.
> > 
> > Richard.
> > 
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > > The API can be adjusted by always creating the empty block
> > > > > > > either during
> > > > > > peeling.
> > > > > > > That would prevent us from having to do anything special here.
> > > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > > iterates over the exits to before the call to
> > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > might be more consistent.
> > > > > > >
> > > > > > > >
> > > > > > > > That said, I wonder why we cannot simply pass in an adjusted
> > > > > > > > niter which would be niters_vector_mult_vf - vf and be done with
> > that?
> > > > > > > >
> > > > > > >
> > > > > > > We can of course not have this and recompute it from niters
> > > > > > > itself, however this does affect the epilog code layout.
> > > > > > > Particularly, knowing the static number of iterations left
> > > > > > > causes it to usually unroll the loop and share some of the
> > > > > > > computations.  i.e. the scalar code is often more
> > > > > > efficient.
> > > > > > >
> > > > > > > The computation would be niters_vector_mult_vf - iters_done *
> > > > > > > vf, since the value put here is the remaining iteration count.
> > > > > > > It's static for early
> > > > > > exits.
> > > > > >
> > > > > > Well, it might be "static" in that it doesn't really matter what
> > > > > > you use for the epilog main IV initial value as long as you are
> > > > > > sure you're not going to take that exit because you are sure we're
> > > > > > going to take one of the early exits.  So yeah, the special code
> > > > > > is probably OK, but it needs a better comment and as said the
> > > > > > structure of
> > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > >
> > > > > > As said an important part for optimization is to not keep the
> > > > > > scalar IVs live in the vector loop.
> > > > > >
> > > > > > > But can do whatever you prefer here.  Let me know what you
> > > > > > > prefer for the
> > > > > > above.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > It has to do this since you have to perform the side
> > > > > > > > > > > effects for the non-matching elements still.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > +		 init + (final - init) * vf which takes into
> > > > > > > > > > > > > +account
> > > > peeling
> > > > > > > > > > > > > +		 values and non-single steps.  The main exit
> > > > can
> > > > > > > > > > > > > +use
> > > > > > > > niters
> > > > > > > > > > > > > +		 since if you exit from the main exit you've
> > > > done
> > > > > > > > > > > > > +all
> > > > > > > > vector
> > > > > > > > > > > > > +		 iterations.  For an early exit we don't know
> > > > when
> > > > > > > > > > > > > +we
> > > > > > > > exit
> > > > > > > > > > > > > +so
> > > > > > > > > > > > we
> > > > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > +				 fold_convert (stype,
> > > > start_expr),
> > > > > > > > > > > > > +				 fold_convert (stype,
> > > > init_expr));
> > > > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.
> > > > */
> > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > +			       fold_convert (stype, niters),
> > > > step_expr);
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > @@ -2238,6 +2286,8 @@
> > > > > > > > > > > > > vect_update_ivs_after_vectorizer (loop_vec_info
> > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > +	continue;
> > > > > > > > > > > >
> > > > > > > > > > > > This looks all a bit complicated - why wouldn't we
> > > > > > > > > > > > simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > > all cases?
> > > > > > > > > > > >
> > > > > > > > > > > > >        else
> > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > >
> > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > >
> > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > > > > +	 updated iv calculations.  */
> > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > +	{
> > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > +	}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > >  	{
> > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > >
> > > > > > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > +					niters_vector_mult_vf,
> > > > > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > +	{
> > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > >
> > > > > > > > > > > > ... why does the same not work here?  Wouldn't the proper
> > > > > > > > > > > > condition be !dominated_by_p (CDI_DOMINATORS, exit->src,
> > > > > > > > > > > > LOOP_VINFO_IV_EXIT (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > whether the exit is at or after the main IV exit?
> > > > > > > > > > > > (consider having two)
> > > > > > > > > > > >
> > > > > > > > > > > > > +	}
> > > > > > > > > > > > >
> > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > >  	{
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 14:17                             ` Richard Biener
@ 2023-11-16 15:19                               ` Tamar Christina
  2023-11-16 18:41                                 ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 15:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, November 16, 2023 2:18 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Thu, 16 Nov 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Thursday, November 16, 2023 1:36 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > >
> > > > > > > > > >
> > > > > > > > > > Perhaps I'm missing something here?
> > > > > > > > >
> > > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > > vect_update_ivs_after_vectorizer does.
> > > > > > > > >
> > > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > > Basically the function computes the new value of the IV
> > > > > > > > > "from scratch" based on the number of scalar iterations
> > > > > > > > > of the vector loop, the 'niter'
> > > > > > > > > argument.  I would have expected that for the early
> > > > > > > > > exits we either pass in a different 'niter' or alternatively
> > > > > > > > > a 'niter_adjustment'.
> > > > > > > >
> > > > > > > > But for an early exit there's no static value for adjusted
> > > > > > > > niter, since you don't know which iteration you exited from.
> > > > > > > > Unlike the normal exit when you know if you get there
> > > > > > > > you've done all possible iterations.
> > > > > > > >
> > > > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > > > >
> > > > > > > ?  You do not need the actual scalar iteration you exited
> > > > > > > (you don't compute that either), you need the scalar
> > > > > > > iteration the vector iteration started with when it exited
> > > > > > > prematurely and that's readily available?
> > > > > >
> > > > > > For a normal exit yes, not for an early exit no?
> > > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > > >
> > > > > > There's the unadjusted scalar count, which is what it's using
> > > > > > to adjust it to the final count.  Unless I'm missing something?
> > > > >
> > > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > > For the early exits we can't precompute the scalar iteration value.
> > > > > But that then means we should compute the appropriate "continuation"
> > > > > as live value of the vectorized IVs even when they were not
> > > > > originally used outside of the loop.  I don't see how we can
> > > > > express this in terms of the scalar IVs in the (not yet)
> > > > > vectorized loop - similar to the reduction case you are going to
> > > > > end up with the wrong values here.
> > > > >
> > > > > That said, I've for a long time wanted to preserve the original
> > > > > control IV also for the vector code (leaving any "optimization"
> > > > > to IVOPTs there), that would enable us to compute the correct
> > > > > "niters_vector_mult_vf" based on that IV.
> > > > >
> > > > > So given we cannot use the scalar IVs you have to handle all
> > > > > inductions (besides the main exit control IV) in
> > > > > vectorizable_live_operation I think.
> > > > >
> > > >
> > > > That's what I currently do, that's why there was the
> > > > 	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > 		continue;
> > >
> > > Yes, but that only works for the inductions marked so.  We'd need to
> > > mark the others as well, but only for the early exits.
> > >
> > > > although I don't understand why we use the scalar count,  I
> > > > suppose the reasoning is that we don't really want to keep it
> > > > around, and referencing it forces it to be kept?
> > >
> > > Referencing it will cause the scalar compute to be retained, but
> > > since we do not adjust the scalar compute during vectorization (but
> > > expect it to be dead) the scalar compute will compute the wrong
> > > thing (as shown by the reduction example - I suspect inductions will
> > > suffer from the same problem).
> > >
> > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > >
> > > The issue is that 'final' is not computed correctly in the
> > > vectorized loop.  This formula might work for affine evolutions of course.
> > >
> > > Extracting the correct value from the vectorized induction would be
> > > the preferred solution.
> >
> > Ok, so I should be able to just mark IVs as live during process_use if
> > there are multiple exits right? Since it's just gonna be unused on the
> > main exit since we use niters?
> >
> > Because since it's the PHI inside the loop that needs to be marked
> > live I can't just do it for a specific exits no?
> >
> > If I create a copy of the PHI node during peeling for use in early
> > exits and mark it live it won't work no?
> 
> I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow arrange
> vectorizable_live_operation to be called, possibly adding a edge argument to
> that as well.
> 
> Maybe the thing to do for the moment is to reject vectorization with early
> breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or reduction
> besides the main counting IV one you can already special-case?

Ok so I did a quick hack with:

      if (!virtual_operand_p (PHI_RESULT (phi))
	  && !STMT_VINFO_LIVE_P (phi_info))
	{
	  use_operand_p use_p;
	  imm_use_iterator imm_iter;
	  bool non_exit_use = false;
	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
	    if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
	      for (auto exit : get_loop_exit_edges (loop))
		{
		  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
		    continue;

		  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
		    {
		      non_exit_use = true;
		      goto fail;
		    }  
		}
fail:
	  if (non_exit_use)
	    return false;
	}

And it does seem to still allow all the cases I want.  I've placed this in vect_can_advance_ivs_p.

Does this cover what you meant?

Thanks,
Tamar

> 
> Richard.
> 
> > Tamar
> > >
> > > > Also you missed the question below about how to avoid the creation
> > > > of the block, You ok with changing that?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > Or for now disable early-break for inductions that are not the
> > > > > main exit control IV (in vect_can_advance_ivs_p)?
> > > > >
> > > > > > > > >
> > > > > > > > > It seems your change handles different kinds of inductions
> > > > > > > > > differently.  Specifically
> > > > > > > > >
> > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > >         {
> > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > >           if (inversed_iv)
> > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > as the new value.  That seems to be very odd special
> > > > > > > > > casing for unknown reasons.  And while you adjust
> > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported - better
> > > > > > > > > assert here).
> > > > > > > >
> > > > > > > > The VF case is for a normal "non-inverted" loop, where if you
> > > > > > > > take an early exit you know that you have to do at most VF
> > > > > > > > iterations.  The VF - step is to account for the inverted loop
> > > > > > > > control flow where you exit after adjusting the IV already by
> > > > > > > > + step.
> > > > > > >
> > > > > > > But doesn't that assume the IV counts from niter to zero?  I
> > > > > > > don't see this special case is actually necessary, no?
> > > > > > >
> > > > > >
> > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > iteration too few, so I got a miscompile with the inverted loop
> > > > > > stuff.  I'll look at it again; perhaps it can be solved differently.
> > > > > >
> > > > > > > >
> > > > > > > > Peeling doesn't matter here, since you know you were able
> > > > > > > > to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > >
> > > > > > > > >       /* For non-main exit create an intermediate edge to get any
> > > > > > > > >          updated iv calculations.  */
> > > > > > > > >       if (needs_interm_block
> > > > > > > > >           && !iv_block
> > > > > > > > >           && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > >         {
> > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > this is also odd, can we adjust the API instead?  I
> > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > original loop IV, if you based the computation off the
> > > > > > > > > initial value only this might not be necessary?
> > > > > > > >
> > > > > > > > No, on the main exit the code updates the value in the loop
> > > > > > > > header and puts the calculation in the merge block.  This works
> > > > > > > > because it only needs to consume PHI nodes in the merge block
> > > > > > > > and things like niters are adjusted in the guard block.
> > > > > > > >
> > > > > > > > For an early exit, we don't have a guard block, only the
> > > > > > > > merge block.
> > > > > > > > We have to update the PHI nodes in that block, but can't do
> > > > > > > > so since you can't produce a value and consume it in a PHI
> > > > > > > > node in the same BB.
> > > > > > > > So we need to create the block to put the values in for use
> > > > > > > > in the merge block.  Because there's no "guard" block for
> > > > > > > > early exits.
> > > > > > >
> > > > > > > ?  then compute niters in that block as well.
> > > > > >
> > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > What we can do if you want is slightly change peeling, we
> > > > > > currently peel as:
> > > > > >
> > > > > >   \        \             /
> > > > > >   E1     E2        Normal exit
> > > > > >     \       |          |
> > > > > >        \    |          Guard
> > > > > >           \ |          |
> > > > > >          Merge block
> > > > > >                   |
> > > > > >              Pre Header
> > > > > >
> > > > > > If we instead peel as:
> > > > > >
> > > > > >
> > > > > >   \        \             /
> > > > > >   E1     E2        Normal exit
> > > > > >     \       |          |
> > > > > >        Exit join   Guard
> > > > > >           \ |          |
> > > > > >          Merge block
> > > > > >                   |
> > > > > >              Pre Header
> > > > > >
> > > > > > We can use the exit join block.  This would also mean
> > > > > > vect_update_ivs_after_vectorizer doesn't need to iterate over
> > > > > > all exits and only really needs to adjust the phi nodes coming
> > > > > > out of the exit join and guard block.
> > > > > >
> > > > > > Does this work for you?
> > >
> > > Yeah, I think that would work.  But I'd like to sort out the
> > > correctness details of the IV update itself before sorting out this
> > > code placement detail.
> > >
> > > Richard.
> > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > > >
> > > > > > > > The API can be adjusted by always creating the empty block
> > > > > > > > either during peeling.
> > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > > > iterates over the exits before the call to
> > > > > > > > vect_update_ivs_after_vectorizer, which I think might be more
> > > > > > > > consistent.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > adjusted niter which would be niters_vector_mult_vf - vf
> > > > > > > > > and be done with that?
> > > > > > > > >
> > > > > > > >
> > > > > > > > We can of course not have this and recompute it from niters
> > > > > > > > itself; however, this does affect the epilog code layout.
> > > > > > > > Particularly, knowing the static number of iterations left
> > > > > > > > causes it to usually unroll the loop and share some of the
> > > > > > > > computations, i.e. the scalar code is often more efficient.
> > > > > > > >
> > > > > > > > The computation would be niters_vector_mult_vf - iters_done * vf,
> > > > > > > > since the value put here is the remaining iteration count.
> > > > > > > > It's static for early exits.
> > > > > > >
> > > > > > > Well, it might be "static" in that it doesn't really matter
> > > > > > > what you use for the epilog main IV initial value as long as
> > > > > > > you are sure you're not going to take that exit as you are
> > > > > > > sure we're going to take one of the early exits.  So yeah,
> > > > > > > the special code is probably OK, but it needs a better
> > > > > > > comment and as said the structure of
> > > > > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > > >
> > > > > > > As said an important part for optimization is to not keep
> > > > > > > the scalar IVs live in the vector loop.
> > > > > > >
> > > > > > > > But can do whatever you prefer here.  Let me know what you
> > > > > > > > prefer for the above.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > > It has to do this since you have to perform the
> > > > > > > > > > > > side effects for the non-matching elements still.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Tamar
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > > > > > > +		 iterations.  For an early exit we don't know when we exit
> > > > > > > > > > > > > > +		 so we must re-calculate this on the exit.  */
> > > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > +	continue;
> > > > > > > > > > > > >
> > > > > > > > > > > > > This looks all a bit complicated - why wouldn't
> > > > > > > > > > > > > we simply always use the PHI result when 'restart_loop'?
> > > > > > > > > > > > > Isn't that the correct old start value in all cases?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >        else
> > > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > > > > > +	 updated iv calculations.  */
> > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > > >
> > > > > > > > > > > > > You are computing this here and in vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > +					niters_vector_mult_vf,
> > > > > > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > > >
> > > > > > > > > > > > > ... why does the same not work here?  Wouldn't the
> > > > > > > > > > > > > proper condition be !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT (loop_vinfo)->src) or
> > > > > > > > > > > > > similar?  That is, whether the exit is at or after
> > > > > > > > > > > > > the main IV exit?  (consider having two)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 15:19                               ` Tamar Christina
@ 2023-11-16 18:41                                 ` Tamar Christina
  2023-11-17 10:40                                   ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-16 18:41 UTC (permalink / raw)
  To: Tamar Christina, Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Thursday, November 16, 2023 3:19 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, November 16, 2023 2:18 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support
> > early breaks and arbitrary exits
> >
> > On Thu, 16 Nov 2023, Tamar Christina wrote:
> >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Thursday, November 16, 2023 1:36 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > jlaw@ventanamicro.com
> > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > support early breaks and arbitrary exits
> > > >
> > > > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > > > > > > >
> > > > > > > > > > > Perhaps I'm missing something here?
> > > > > > > > > >
> > > > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > does.
> > > > > > > > > >
> > > > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > > > Basically the function computes the new value of the
> > > > > > > > > > IV "from scratch" based on the number of scalar
> > > > > > > > > > iterations of the vector loop,
> > > > > > the 'niter'
> > > > > > > > > > argument.  I would have expected that for the early
> > > > > > > > > > exits we either pass in a different 'niter' or
> > > > > > > > > > alternatively a
> > 'niter_adjustment'.
> > > > > > > > >
> > > > > > > > > But for an early exit there's no static value for
> > > > > > > > > adjusted niter, since you don't know which iteration you exited
> from.
> > > > > > > > > Unlike the normal exit when you know if you get there
> > > > > > > > > you've done all possible
> > > > > > > > iterations.
> > > > > > > > >
> > > > > > > > > So you must compute the scalar iteration count on the exit itself.
> > > > > > > >
> > > > > > > > ?  You do not need the actual scalar iteration you exited
> > > > > > > > (you don't compute that either), you need the scalar
> > > > > > > > iteration the vector iteration started with when it exited
> > > > > > > > prematurely and that's readily
> > > > > > available?
> > > > > > >
> > > > > > > For a normal exit yes, not for an early exit no?
> > > > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > > > >
> > > > > > > There's the unadjusted scalar count, which is what it's
> > > > > > > using to adjust it to the final count.  Unless I'm missing something?
> > > > > >
> > > > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > > > For the early exits we can't precompute the scalar iteration value.
> > > > > > But that then means we should compute the appropriate
> > "continuation"
> > > > > > as live value of the vectorized IVs even when they were not
> > > > > > originally used outside of the loop.  I don't see how we can
> > > > > > express this in terms of the scalar IVs in the (not yet)
> > > > > > vectorized loop - similar to the reduction case you are going
> > > > > > to end up with the wrong values
> > > > here.
> > > > > >
> > > > > > That said, I've for a long time wanted to preserve the
> > > > > > original control IV also for the vector code (leaving any "optimization"
> > > > > > to IVOPTs there), that would enable us to compute the correct
> > > > > > "niters_vector_mult_vf" based on that IV.
> > > > > >
> > > > > > So given we cannot use the scalar IVs you have to handle all
> > > > > > inductions (besides the main exit control IV) in
> > > > > > vectorizable_live_operation
> > > > I think.
> > > > > >
> > > > >
> > > > > That's what I currently do, that's why there was the
> > > > > 	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > 		continue;
> > > >
> > > > Yes, but that only works for the inductions marked so.  We'd need
> > > > to mark the others as well, but only for the early exits.
> > > >
> > > > > although I don't understand why we use the scalar count,  I
> > > > > suppose the reasoning is that we don't really want to keep it
> > > > > around, and referencing
> > > > it forces it to be kept?
> > > >
> > > > Referencing it will cause the scalar compute to be retained, but
> > > > since we do not adjust the scalar compute during vectorization
> > > > (but expect it to be dead) the scalar compute will compute the
> > > > wrong thing (as shown by the reduction example - I suspect
> > > > inductions will suffer
> > from the same problem).
> > > >
> > > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > > >
> > > > The issue is that 'final' is not computed correctly in the
> > > > vectorized loop.  This formula might work for affine evolutions of course.
> > > >
> > > > Extracting the correct value from the vectorized induction would
> > > > be the preferred solution.
> > >
> > > Ok, so I should be able to just mark IVs as live during process_use
> > > if there are multiple exits right? Since it's just gonna be unused
> > > on the main exit since we use niters?
> > >
> > > Because since it's the PHI inside the loop that needs to be marked
> > > live I can't just do it for a specific exits no?
> > >
> > > If I create a copy of the PHI node during peeling for use in early
> > > exits and mark it live it won't work no?
> >
> > I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
> > arrange vectorizable_live_operation to be called, possibly adding a
> > edge argument to that as well.
> >
> > Maybe the thing to do for the moment is to reject vectorization with
> > early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
> > reduction besides the main counting IV one you can already special-case?
> 
> Ok so I did a quick hack with:
> 
>       if (!virtual_operand_p (PHI_RESULT (phi))
> 	  && !STMT_VINFO_LIVE_P (phi_info))
> 	{
> 	  use_operand_p use_p;
> 	  imm_use_iterator imm_iter;
> 	  bool non_exit_use = false;
> 	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
> 	    if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
> 	      for (auto exit : get_loop_exit_edges (loop))
> 		{
> 		  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> 		    continue;
> 
> 		  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
> 		    {
> 		      non_exit_use = true;
> 		      goto fail;
> 		    }
> 		}
> fail:
> 	  if (non_exit_use)
> 	    return false;
> 	}
> 
> And it does seem to still allow all the cases I want.  I've placed this in
> vect_can_advance_ivs_p.
> 
> Does this cover what you meant?
> 

Ok, I've rewritten this in a nicer form, but doesn't this mean we now block any loop where the index is not live?
i.e. we block such simple loops like

#ifndef N
#define N 800
#endif
unsigned vect_a[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }
 return ret;
}

because it does a simple `break`.  If I force it to be live it works, but then I need to differentiate between
the counter and the IV.

# i_15 = PHI <i_12(6), 0(2)>
# ivtmp_7 = PHI <ivtmp_14(6), 803(2)>

It seems like if we don't want to keep i_15 around (at the moment it will be kept because of its usage in the
exit block, so it won't be DCEd) then we need to mark it live early during analysis.

Most likely if we do this I don't need to care about the "inverted" workflow here at all. What do you think?

Yes, that doesn't work for SLP, but I don't think I can get SLP working in the remaining time anyway.

I'll fix reduction and multiple exit live values in the meantime.

Thanks,
Tamar
> Thanks,
> Tamar
> 
> >
> > Richard.
> >
> > > Tamar
> > > >
> > > > > Also you missed the question below about how to avoid the
> > > > > creation of the block, You ok with changing that?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > > Or for now disable early-break for inductions that are not the
> > > > > > main exit control IV (in vect_can_advance_ivs_p)?
> > > > > >
> > > > > > > > > >
> > > > > > > > > > It seems your change handles different kinds of
> > > > > > > > > > inductions
> > > > differently.
> > > > > > > > > > Specifically
> > > > > > > > > >
> > > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > > >         {
> > > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > > >           if (inversed_iv)
> > > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > > as the new value.  That seems to be very odd special
> > > > > > > > > > casing for unknown reasons.  And while you adjust
> > > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > > > > > better assert
> > > > > > > > here).
> > > > > > > > >
> > > > > > > > > The VF case is for a normal "non-inverted" loop, where
> > > > > > > > > if you take an early exit you know that you have to do
> > > > > > > > > at most VF
> > iterations.
> > > > > > > > > The VF
> > > > > > > > > - step is to account for the inverted loop control flow
> > > > > > > > > where you exit after adjusting the IV already by + step.
> > > > > > > >
> > > > > > > > But doesn't that assume the IV counts from niter to zero?
> > > > > > > > I don't see this special case is actually necessary, no?
> > > > > > > >
> > > > > > >
> > > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > > iteration too little So I got a miscompile with the inverter
> > > > > > > loop stuff.  I'll look at it again perhaps It can be solved differently.
> > > > > > >
> > > > > > > > >
> > > > > > > > > Peeling doesn't matter here, since you know you were
> > > > > > > > > able to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > > >
> > > > > > > > > >       /* For non-main exit create an intermediat edge
> > > > > > > > > > to get any updated
> > > > > > iv
> > > > > > > > > >          calculations.  */
> > > > > > > > > >       if (needs_interm_block
> > > > > > > > > >           && !iv_block
> > > > > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > !gimple_seq_empty_p
> > > > > > > > > > (new_stmts)))
> > > > > > > > > >         {
> > > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > this is also odd, can we adjust the API instead?  I
> > > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > > original loop IV, if you based the computation off the
> > > > > > > > > > initial value only this might not be
> > > > > > necessary?
> > > > > > > > >
> > > > > > > > > No, on the main exit the code updates the value in the
> > > > > > > > > loop header and puts the Calculation in the merge block.
> > > > > > > > > This works because it only needs to consume PHI nodes in
> > > > > > > > > the merge block and things like niters are
> > > > > > > > adjusted in the guard block.
> > > > > > > > >
> > > > > > > > > For an early exit, we don't have a guard block, only the
> > > > > > > > > merge
> > block.
> > > > > > > > > We have to update the PHI nodes in that block,  but
> > > > > > > > > can't do so since you can't produce a value and consume
> > > > > > > > > it in a PHI node in the same
> > > > > > BB.
> > > > > > > > > So we need to create the block to put the values in for
> > > > > > > > > use in the merge block.  Because there's no "guard"
> > > > > > > > > block for early
> > exits.
> > > > > > > >
> > > > > > > > ?  then compute niters in that block as well.
> > > > > > >
> > > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > > What we can do if you want is slightly change peeling, we
> > > > > > > currently peel
> > > > as:
> > > > > > >
> > > > > > >   \        \             /
> > > > > > >   E1     E2        Normal exit
> > > > > > >     \       |          |
> > > > > > >        \    |          Guard
> > > > > > >           \ |          |
> > > > > > >          Merge block
> > > > > > >                   |
> > > > > > >              Pre Header
> > > > > > >
> > > > > > > If we instead peel as:
> > > > > > >
> > > > > > >
> > > > > > >   \        \             /
> > > > > > >   E1     E2        Normal exit
> > > > > > >     \       |          |
> > > > > > >        Exit join   Guard
> > > > > > >           \ |          |
> > > > > > >          Merge block
> > > > > > >                   |
> > > > > > >              Pre Header
> > > > > > >
> > > > > > > We can use the exit join block.  This would also mean
> > > > > > > vect_update_ivs_after_vectorizer Doesn't need to iterate
> > > > > > > over all exits and only really needs to adjust the phi nodes
> > > > > > > Coming out of the exit join
> > > > > > and guard block.
> > > > > > >
> > > > > > > Does this work for you?
> > > >
> > > > Yeah, I think that would work.  But I'd like to sort out the
> > > > correctness details of the IV update itself before sorting out
> > > > this code
> > placement detail.
> > > >
> > > > Richard.
> > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > > >
> > > > > > > > > The API can be adjusted by always creating the empty
> > > > > > > > > block either during
> > > > > > > > peeling.
> > > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > > Would that work better?  Or I can do it in the loop that
> > > > > > > > > iterates over the exits to before the call to
> > > > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > > > might be more consistent.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > > adjusted niter which would be niters_vector_mult_vf -
> > > > > > > > > > vf and be done with
> > > > that?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > We can ofcourse not have this and recompute it from
> > > > > > > > > niters itself, however this does affect the epilog code layout.
> > > > > > > > > Particularly knowing the static number if iterations
> > > > > > > > > left causes it to usually unroll the loop and share some
> > > > > > > > > of the computations.  i.e. the scalar code is often more
> > > > > > > > efficient.
> > > > > > > > >
> > > > > > > > > The computation would be niters_vector_mult_vf -
> > > > > > > > > iters_done * vf, since the value put Here is the
> > > > > > > > > remaining iteration
> > count.
> > > > > > > > > It's static for early
> > > > > > > > exits.
> > > > > > > >
> > > > > > > > Well, it might be "static" in that it doesn't really
> > > > > > > > matter what you use for the epilog main IV initial value
> > > > > > > > as long as you are sure you're not going to take that exit
> > > > > > > > as you are sure we're going to take one of the early
> > > > > > > > exits.  So yeah, the special code is probably OK, but it
> > > > > > > > needs a better comment and as said the structure of
> > > > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > > > >
> > > > > > > > As said an important part for optimization is to not keep
> > > > > > > > the scalar IVs live in the vector loop.
> > > > > > > >
> > > > > > > > > But can do whatever you prefer here.  Let me know what
> > > > > > > > > you prefer for the
> > > > > > > > above.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Richard.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Tamar
> > > > > > > > > > > >
> > > > > > > > > > > > > It has to do this since you have to perform the
> > > > > > > > > > > > > side effects for the non-matching elements still.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Tamar
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > > > +		 init + (final - init) * vf which takes
> > > > > > > > > > > > > > > +into account
> > > > > > peeling
> > > > > > > > > > > > > > > +		 values and non-single steps.  The
> main
> > > > > > > > > > > > > > > +exit
> > > > > > can
> > > > > > > > > > > > > > > +use
> > > > > > > > > > niters
> > > > > > > > > > > > > > > +		 since if you exit from the main exit
> > > > > > > > > > > > > > > +you've
> > > > > > done
> > > > > > > > > > > > > > > +all
> > > > > > > > > > vector
> > > > > > > > > > > > > > > +		 iterations.  For an early exit we don't
> > > > > > > > > > > > > > > +know
> > > > > > when
> > > > > > > > > > > > > > > +we
> > > > > > > > > > exit
> > > > > > > > > > > > > > > +so
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > > +				 fold_convert (stype,
> > > > > > start_expr),
> > > > > > > > > > > > > > > +				 fold_convert (stype,
> > > > > > init_expr));
> > > > > > > > > > > > > > > +	      /* Now adjust for VF to get the final
> iteration value.
> > > > > > */
> > > > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > > +				 build_int_cst (stype,
> vf));
> > > > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > > +			       fold_convert (stype,
> niters),
> > > > > > step_expr);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@
> > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > > +	continue;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This looks all a bit complicated - why
> > > > > > > > > > > > > > wouldn't we simply always use the PHI result when
> 'restart_loop'?
> > > > > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > > > > all cases?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts,
> init_expr,
> > > > > > > > > > > > > > >  					  niters,
> step_expr,
> > > > @@ -
> > > > > > > > 2245,9 +2295,20 @@
> > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni,
> > > > > > > > > > > > > > > &new_stmts, false, var);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +      /* For non-main exit create an
> > > > > > > > > > > > > > > + intermediat edge to get any
> > > > > > > > > > updated iv
> > > > > > > > > > > > > > > +	 calculations.  */
> > > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > > > > > > (new_stmts)))
> > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e-
> >dest);
> > > > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling
> > > > > > > > > > > > > > > (loop_vec_info loop_vinfo, tree
> > > > > > > > > > > > > > niters, tree nitersm1,
> > > > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > > >        gcc_checking_assert
> > > > > > > > > > > > > > > (vect_can_advance_ivs_p
> > > > > > (loop_vinfo));
> > > > > > > > > > > > > > >        update_e = skip_vector ? e :
> > > > > > > > > > > > > > > loop_preheader_edge
> > > > (epilog);
> > > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred
> > > > > > (LOOP_VINFO_IV_EXIT
> > > > > > > > > > (loop_vinfo),
> > > > > > > > > > > > > > > +
> LOOP_VINFO_LOOP
> > > > > > > > > > (loop_vinfo));
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > You are computing this here and in
> > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > + (loop_vinfo, vf,
> > > > > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > +					update_e,
> > > > > > inversed_iv);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT
> (loop_vinfo))
> > > > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > +(loop_vinfo, vf,
> > > > > > > > > > > > > > > +
> > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ... why does the same not work here?  Wouldn't
> > > > > > > > > > > > > > the proper condition be !dominated_by_p
> > > > > > > > > > > > > > (CDI_DOMINATORS,
> > > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > > > (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > > > whether the exit is at or after the main IV exit?
> > > > > > > > > > > > > > (consider having
> > > > > > > > > > > > > > two)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-16 18:41                                 ` Tamar Christina
@ 2023-11-17 10:40                                   ` Tamar Christina
  2023-11-17 12:13                                     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-17 10:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> > > > > Yes, but that only works for the inductions marked so.  We'd
> > > > > need to mark the others as well, but only for the early exits.
> > > > >
> > > > > > although I don't understand why we use the scalar count,  I
> > > > > > suppose the reasoning is that we don't really want to keep it
> > > > > > around, and referencing
> > > > > it forces it to be kept?
> > > > >
> > > > > Referencing it will cause the scalar compute to be retained, but
> > > > > since we do not adjust the scalar compute during vectorization
> > > > > (but expect it to be dead) the scalar compute will compute the
> > > > > wrong thing (as shown by the reduction example - I suspect
> > > > > inductions will suffer
> > > from the same problem).
> > > > >
> > > > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > > > >
> > > > > The issue is that 'final' is not computed correctly in the
> > > > > vectorized loop.  This formula might work for affine evolutions of
> course.
> > > > >
> > > > > Extracting the correct value from the vectorized induction would
> > > > > be the preferred solution.
> > > >
> > > > Ok, so I should be able to just mark IVs as live during
> > > > process_use if there are multiple exits right? Since it's just
> > > > gonna be unused on the main exit since we use niters?
> > > >
> > > > Because since it's the PHI inside the loop that needs to be marked
> > > > live I can't just do it for a specific exits no?
> > > >
> > > > If I create a copy of the PHI node during peeling for use in early
> > > > exits and mark it live it won't work no?
> > >
> > > I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
> > > arrange vectorizable_live_operation to be called, possibly adding a
> > > edge argument to that as well.
> > >
> > > Maybe the thing to do for the moment is to reject vectorization with
> > > early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
> > > reduction besides the main counting IV one you can already special-case?
> >
> > Ok so I did a quick hack with:
> >
> >       if (!virtual_operand_p (PHI_RESULT (phi))
> > 	  && !STMT_VINFO_LIVE_P (phi_info))
> > 	{
> > 	  use_operand_p use_p;
> > 	  imm_use_iterator imm_iter;
> > 	  bool non_exit_use = false;
> > 	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
> > 	    if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
> > 	      for (auto exit : get_loop_exit_edges (loop))
> > 		{
> > 		  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > 		    continue;
> >
> > 		  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
> > 		    {
> > 		      non_exit_use = true;
> > 		      goto fail;
> > 		    }
> > 		}
> > fail:
> > 	  if (non_exit_use)
> > 	    return false;
> > 	}
> >
> > And it does seem to still allow all the cases I want.  I've placed
> > this in vect_can_advance_ivs_p.
> >
> > Does this cover what you meant?
> >
> 
> Ok, I've rewritten this in a nicer form, but doesn't this mean we now block any
> loop where the index is not live?
> i.e. we block such simple loops like
> 
> #ifndef N
> #define N 800
> #endif
> unsigned vect_a[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>  }
>  return ret;
> }
> 
> because it does a simple `break`.  If I force it to be live it works, but then I need
> to differentiate between the counter and the IV.
> 
> # i_15 = PHI <i_12(6), 0(2)>
> # ivtmp_7 = PHI <ivtmp_14(6), 803(2)>
> 
> It seems like if we don't want to keep i_15 around (at the moment it will be kept
> because of its usage in the exit block, so it won't be DCEd) then we need to mark it
> live early during analysis.
> 
> Most likely if we do this I don't need to care about the "inverted" workflow
> here at all. What do you think?
> 
> Yes, that doesn't work for SLP, but I don't think I can get SLP working in the
> remaining time anyway.
> 
> I'll fix reduction and multiple exit live values in the meantime.
> 

Ok, so I currently have the following solution.  Let me know if you agree with it
and I'll polish it up today and tomorrow and respin things.

1. During vect_update_ivs_after_vectorizer we no longer touch any PHIs aside from
   just updating IV temps with the expected remaining iteration count.
2. During vect_transform_loop, after vectorizing any induction or reduction, I call
   vectorizable_live_operation for any PHI node that still has usages in the early
   exit merge block.
3. vectorizable_live_operation is taught to materialize the same PHI in multiple exits.
4. vectorizable_reduction (or maybe vect_create_epilog_for_reduction) needs to be
   modified to materialize the previous iteration value for early exits.

This seems to work and now produces the following for the simple loop above:

.L2:
        str     q27, [x1, x3]
        str     q29, [x2, x1]
        add     x1, x1, 16
        cmp     x1, 3200
        beq     .L11
.L4:
        ldr     q31, [x2, x1]
        mov     v28.16b, v30.16b
        add     v30.4s, v30.4s, v26.4s
        shl     v31.4s, v31.4s, 1
        add     v27.4s, v28.4s, v29.4s
        cmeq    v31.4s, v31.4s, v29.4s
        not     v31.16b, v31.16b
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x4, d31
        cbz     x4, .L2
        fmov    w1, s28
        mov     w6, 4
.L3:

so now the scalar index is no longer kept and the live value is instead extracted from the vector IV in the exit:

fmov    w1, s28

Does this work as you expected?

Thanks,
Tamar

> Thanks,
> Tamar
> > Thanks,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > Tamar
> > > > >
> > > > > > Also you missed the question below about how to avoid the
> > > > > > creation of the block, You ok with changing that?
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > > Or for now disable early-break for inductions that are not
> > > > > > > the main exit control IV (in vect_can_advance_ivs_p)?
> > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > It seems your change handles different kinds of
> > > > > > > > > > > inductions differently.
> > > > > > > > > > > Specifically
> > > > > > > > > > >
> > > > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > > > >         {
> > > > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > > > >           if (inversed_iv)
> > > > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > > > as the new value.  That seems to be very odd special
> > > > > > > > > > > casing for unknown reasons.  And while you adjust
> > > > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > > > > > > better assert here).
> > > > > > > > > >
> > > > > > > > > > The VF case is for a normal "non-inverted" loop, where
> > > > > > > > > > if you take an early exit you know that you have to do
> > > > > > > > > > at most VF iterations.  The VF - step is to account for
> > > > > > > > > > the inverted loop control flow where you exit after
> > > > > > > > > > adjusting the IV already by + step.
> > > > > > > > >
> > > > > > > > > But doesn't that assume the IV counts from niter to zero?
> > > > > > > > > I don't see this special case is actually necessary, no?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > > > iteration too few, so I got a miscompile with the inverted
> > > > > > > > loop stuff.  I'll look at it again; perhaps it can be solved
> > > > > > > > differently.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Peeling doesn't matter here, since you know you were
> > > > > > > > > > able to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > > > >
> > > > > > > > > > >       /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > >          updated iv calculations.  */
> > > > > > > > > > >       if (needs_interm_block
> > > > > > > > > > >           && !iv_block
> > > > > > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > !gimple_seq_empty_p
> > > > > > > > > > > (new_stmts)))
> > > > > > > > > > >         {
> > > > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > this is also odd, can we adjust the API instead?  I
> > > > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > > > original loop IV, if you based the computation off
> > > > > > > > > > > the initial value only this might not be
> > > > > > > necessary?
> > > > > > > > > >
> > > > > > > > > > No, on the main exit the code updates the value in the
> > > > > > > > > > loop header and puts the calculation in the merge block.
> > > > > > > > > > This works because it only needs to consume PHI nodes
> > > > > > > > > > in the merge block and things like niters are
> > > > > > > > > adjusted in the guard block.
> > > > > > > > > >
> > > > > > > > > > For an early exit, we don't have a guard block, only
> > > > > > > > > > the merge block.  We have to update the PHI nodes in
> > > > > > > > > > that block, but can't do so since you can't produce a
> > > > > > > > > > value and consume it in a PHI node in the same BB.
> > > > > > > > > > So we need to create the block to put the values in
> > > > > > > > > > for use in the merge block, because there's no "guard"
> > > > > > > > > > block for early exits.
> > > > > > > > >
> > > > > > > > > ?  then compute niters in that block as well.
> > > > > > > >
> > > > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > > > What we can do if you want is slightly change peeling; we
> > > > > > > > currently peel as:
> > > > > > > >
> > > > > > > >   \        \             /
> > > > > > > >   E1     E2        Normal exit
> > > > > > > >     \       |          |
> > > > > > > >        \    |          Guard
> > > > > > > >           \ |          |
> > > > > > > >          Merge block
> > > > > > > >                   |
> > > > > > > >              Pre Header
> > > > > > > >
> > > > > > > > If we instead peel as:
> > > > > > > >
> > > > > > > >
> > > > > > > >   \        \             /
> > > > > > > >   E1     E2        Normal exit
> > > > > > > >     \       |          |
> > > > > > > >        Exit join   Guard
> > > > > > > >           \ |          |
> > > > > > > >          Merge block
> > > > > > > >                   |
> > > > > > > >              Pre Header
> > > > > > > >
> > > > > > > > We can use the exit join block.  This would also mean
> > > > > > > > vect_update_ivs_after_vectorizer doesn't need to iterate
> > > > > > > > over all exits and only really needs to adjust the phi
> > > > > > > > nodes coming out of the exit join
> > > > > > > and guard block.
> > > > > > > >
> > > > > > > > Does this work for you?
> > > > >
> > > > > Yeah, I think that would work.  But I'd like to sort out the
> > > > > correctness details of the IV update itself before sorting out
> > > > > this code placement detail.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > > The API can be adjusted by always creating the empty
> > > > > > > > > > block either during peeling.
> > > > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > > > Would that work better?  Or I can do it in the loop
> > > > > > > > > > that iterates over the exits before the call to
> > > > > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > > > > > might be more consistent.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > > > adjusted niter which would be niters_vector_mult_vf
> > > > > > > > > > > - vf and be done with that?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > We can of course not have this and recompute it from
> > > > > > > > > > niters itself, however this does affect the epilog code layout.
> > > > > > > > > > In particular, knowing the static number of iterations
> > > > > > > > > > left causes it to usually unroll the loop and share
> > > > > > > > > > some of the computations, i.e. the scalar code is
> > > > > > > > > > often more efficient.
> > > > > > > > > >
> > > > > > > > > > The computation would be niters_vector_mult_vf -
> > > > > > > > > > iters_done * vf, since the value put here is the
> > > > > > > > > > remaining iteration count.  It's static for early
> > > > > > > > > > exits.
> > > > > > > > >
> > > > > > > > > Well, it might be "static" in that it doesn't really
> > > > > > > > > matter what you use for the epilog main IV initial value
> > > > > > > > > as long as you are sure you're not going to take that
> > > > > > > > > exit as you are sure we're going to take one of the
> > > > > > > > > early exits.  So yeah, the special code is probably OK,
> > > > > > > > > but it needs a better comment and as said the structure
> > > > > > > > > of vect_update_ivs_after_vectorizer is a bit hard to
> > > > > > > > > follow now.
> > > > > > > > >
> > > > > > > > > As said an important part for optimization is to not
> > > > > > > > > keep the scalar IVs live in the vector loop.
> > > > > > > > >
> > > > > > > > > > But can do whatever you prefer here.  Let me know what
> > > > > > > > > > you prefer for the above.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Richard.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Tamar
> > > > > > > > > > > > >
> > > > > > > > > > > > > > It has to do this since you have to perform
> > > > > > > > > > > > > > the side effects for the non-matching elements still.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > > > > +		 init + (final - init) * vf which takes into account peeling
> > > > > > > > > > > > > > > > +		 values and non-single steps.  The main exit can use niters
> > > > > > > > > > > > > > > > +		 since if you exit from the main exit you've done all vector
> > > > > > > > > > > > > > > > +		 iterations.  For an early exit we don't know when we exit
> > > > > > > > > > > > > > > > +		 so we must re-calculate this on the exit.  */
> > > > > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > > > +				 fold_convert (stype, start_expr),
> > > > > > > > > > > > > > > > +				 fold_convert (stype, init_expr));
> > > > > > > > > > > > > > > > +	      /* Now adjust for VF to get the final iteration value.  */
> > > > > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > > > +				 build_int_cst (stype, vf));
> > > > > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > > > +			       fold_convert (stype, niters), step_expr);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.  */
> > > > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > > > +	continue;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This looks all a bit complicated - why
> > > > > > > > > > > > > > > wouldn't we simply always use the PHI result
> > > > > > > > > > > > > > > when
> > 'restart_loop'?
> > > > > > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > > > > > all cases?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init (&stmts, init_expr,
> > > > > > > > > > > > > > > >  					  niters, step_expr,
> > > > > > > > > > > > > > > > @@ -2245,9 +2295,20 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni,
> > > > > > > > > > > > > > > > &new_stmts, false, var);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > > > > > > > +	 updated iv calculations.  */
> > > > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) || !gimple_seq_empty_p (new_stmts)))
> > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
> > > > > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > > > >        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
> > > > > > > > > > > > > > > >        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> > > > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
> > > > > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > > > > > > > > > > > > > > +					 LOOP_VINFO_LOOP (loop_vinfo));
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You are computing this here and in
> > > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > > > +					niters_vector_mult_vf,
> > > > > > > > > > > > > > > > +					update_e, inversed_iv);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > > > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer (loop_vinfo, vf,
> > > > > > > > > > > > > > > > +					    niters_vector_mult_vf,
> > > > > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ... why does the same not work here?
> > > > > > > > > > > > > > > Wouldn't the proper condition be
> > > > > > > > > > > > > > > !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > > > > (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > > > > whether the exit is at or after the main IV exit?
> > > > > > > > > > > > > > > (consider having
> > > > > > > > > > > > > > > two)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146,
> > > > > > > > > > > > > 90461 Nuernberg, Germany;
> > > > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
> > > > > > > > > > > > > (HRB 36809, AG
> > > > > > > > > > > > > Nuernberg)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > > > Nuernberg, Germany;
> > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
> > > > > > > > > > > (HRB 36809, AG
> > > > > > > > > > > Nuernberg)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > Nuernberg, Germany;
> > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> > > > > > > > > 36809, AG
> > > > > > > > > Nuernberg)
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > AG
> > > > > > > Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-17 10:40                                   ` Tamar Christina
@ 2023-11-17 12:13                                     ` Richard Biener
  2023-11-20 21:54                                       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-17 12:13 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 17 Nov 2023, Tamar Christina wrote:

> > > > > > Yes, but that only works for the inductions marked so.  We'd
> > > > > > need to mark the others as well, but only for the early exits.
> > > > > >
> > > > > > > although I don't understand why we use the scalar count.  I
> > > > > > > suppose the reasoning is that we don't really want to keep it
> > > > > > > around, and referencing it forces it to be kept?
> > > > > >
> > > > > > Referencing it will cause the scalar compute to be retained, but
> > > > > > since we do not adjust the scalar compute during vectorization
> > > > > > (but expect it to be dead) the scalar compute will compute the
> > > > > > wrong thing (as shown by the reduction example - I suspect
> > > > > > inductions will suffer from the same problem).
> > > > > >
> > > > > > > At the moment it just does `init + (final - init) * vf` which is correct no?
> > > > > >
> > > > > > The issue is that 'final' is not computed correctly in the
> > > > > > vectorized loop.  This formula might work for affine evolutions
> > > > > > of course.
> > > > > >
> > > > > > Extracting the correct value from the vectorized induction would
> > > > > > be the preferred solution.
> > > > >
> > > > > Ok, so I should be able to just mark IVs as live during
> > > > > process_use if there are multiple exits right? Since it's just
> > > > > gonna be unused on the main exit since we use niters?
> > > > >
> > > > > Because since it's the PHI inside the loop that needs to be marked
> > > > > live I can't just do it for a specific exits no?
> > > > >
> > > > > If I create a copy of the PHI node during peeling for use in early
> > > > > exits and mark it live it won't work no?
> > > >
> > > > I guess I wouldn't actually mark it STMT_VINFO_LIVE_P but somehow
> > > > arrange vectorizable_live_operation to be called, possibly adding a
> > > > edge argument to that as well.
> > > >
> > > > Maybe the thing to do for the moment is to reject vectorization with
> > > > early breaks if there's any (non-STMT_VINFO_LIVE_P?) induction or
> > > > reduction besides the main counting IV one you can already special-case?
> > >
> > > Ok so I did a quick hack with:
> > >
> > >       if (!virtual_operand_p (PHI_RESULT (phi))
> > > 	  && !STMT_VINFO_LIVE_P (phi_info))
> > > 	{
> > > 	  use_operand_p use_p;
> > > 	  imm_use_iterator imm_iter;
> > > 	  bool non_exit_use = false;
> > > 	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (phi))
> > > 	    if (!flow_bb_inside_loop_p (loop, gimple_bb (USE_STMT (use_p))))
> > > 	      for (auto exit : get_loop_exit_edges (loop))
> > > 		{
> > > 		  if (exit == LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > 		    continue;
> > >
> > > 		  if (gimple_bb (USE_STMT (use_p)) != exit->dest)
> > > 		    {
> > > 		      non_exit_use = true;
> > > 		      goto fail;
> > > 		    }
> > > 		}
> > > fail:
> > > 	  if (non_exit_use)
> > > 	    return false;
> > > 	}
> > >
> > > And it does seem to still allow all the cases I want.  I've placed
> > > this in vect_can_advance_ivs_p.
> > >
> > > Does this cover what you meant?
> > >
> > 
> > Ok, I've rewritten this in a nicer form, but doesn't this mean we now block any
> > loop where the index is not live?
> > i.e. we block such simple loops like
> > 
> > #ifndef N
> > #define N 800
> > #endif
> > unsigned vect_a[N];
> > 
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    if (vect_a[i]*2 != x)
> >      break;
> >    vect_a[i] = x;
> >  }
> >  return ret;
> > }
> > 
> > because it does a simple `break`.  If I force it to be live it works, but then I need
> > to differentiate between the counter and the IV.
> > 
> > # i_15 = PHI <i_12(6), 0(2)>
> > # ivtmp_7 = PHI <ivtmp_14(6), 803(2)>
> > 
> > It seems like if we don't want to keep i_15 around (at the moment it will be kept
> > because of its usage in the exit block, so it won't be DCEd) then we need to mark it
> > live early during analysis.
> > 
> > Most likely if we do this I don't need to care about the "inverted" workflow
> > here at all. What do you think?
> > 
> > Yes that doesn't work for SLP, but I don't think I can get SLP working in the
> > remaining time anyway..
> > 
> > I'll fix reduction and multiple-exit live values in the meantime.
> > 
> 
> Ok, so I currently have the following solution.  Let me know if you agree with it
> and I'll polish it up today and tomorrow and respin things.
> 
> 1. During vect_update_ivs_after_vectorizer we no longer touch any PHIs aside from
>      just updating IV temporaries with the expected remaining iteration count.

OK

> 2. During vect_transform_loop, after vectorizing any induction or reduction, I call vectorizable_live_operation
>      for any phi node that still has any usages in the early exit merge block.

OK, I suppose you need to amend the vectorizable_live_operation API to
tell it whether it works on the early exits or the main exit (and not
complain when !STMT_VINFO_LIVE_P for the early exit case).

> 3. vectorizable_live_operation is taught to materialize the same PHI in multiple exits.

For the main exit you'd get here via STMT_VINFO_LIVE_P handling and
vect_update_ivs_after_vectorizer would handle the rest.  For the
early exits I think you only have to materialize once (in the merge 
block)?

> 4. vectorizable_reduction or maybe vect_create_epilog_for_reduction needs to be modified so that, for early exits, it materializes
>     the previous iteration's value.

I think you only need to touch vect_create_epilog_for_reduction; the
early exit merge block needs another reduction epilog.  Well, in theory
it's just another vector to reduce, but I'm not sure the control flow
supports having the same actual epilog for both the main and the early exits.

Richard.

> This seems to work and now produces the following for the simple loop above:
> 
> .L2:
>         str     q27, [x1, x3]
>         str     q29, [x2, x1]
>         add     x1, x1, 16
>         cmp     x1, 3200
>         beq     .L11
> .L4:
>         ldr     q31, [x2, x1]
>         mov     v28.16b, v30.16b
>         add     v30.4s, v30.4s, v26.4s
>         shl     v31.4s, v31.4s, 1
>         add     v27.4s, v28.4s, v29.4s
>         cmeq    v31.4s, v31.4s, v29.4s
>         not     v31.16b, v31.16b
>         umaxp   v31.4s, v31.4s, v31.4s
>         fmov    x4, d31
>         cbz     x4, .L2
>         fmov    w1, s28
>         mov     w6, 4
> .L3:
> 
> So now the scalar index is no longer kept alive; instead the value is reduced from the vector IV in the exit:
> 
> fmov    w1, s28
> 
> Does this work as you expected?
> 
> Thanks,
> Tamar
> 
> > Thanks,
> > Tamar
> > > Thanks,
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Tamar
> > > > > >
> > > > > > > Also you missed the question below about how to avoid the
> > > > > > > creation of the block, You ok with changing that?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > > Or for now disable early-break for inductions that are not
> > > > > > > > the main exit control IV (in vect_can_advance_ivs_p)?
> > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > It seems your change handles different kinds of
> > > > > > > > > > > > inductions differently.
> > > > > > > > > > > > Specifically
> > > > > > > > > > > >
> > > > > > > > > > > >       bool ivtemp = gimple_cond_lhs (cond) == iv_var;
> > > > > > > > > > > >       if (restart_loop && ivtemp)
> > > > > > > > > > > >         {
> > > > > > > > > > > >           type = TREE_TYPE (gimple_phi_result (phi));
> > > > > > > > > > > >           ni = build_int_cst (type, vf);
> > > > > > > > > > > >           if (inversed_iv)
> > > > > > > > > > > >             ni = fold_build2 (MINUS_EXPR, type, ni,
> > > > > > > > > > > >                               fold_convert (type, step_expr));
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > it looks like for the exit test IV we use either 'VF' or 'VF - step'
> > > > > > > > > > > > as the new value.  That seems to be very odd special
> > > > > > > > > > > > casing for unknown reasons.  And while you adjust
> > > > > > > > > > > > vec_step_op_add, you don't adjust
> > > > > > > > > > > > vect_peel_nonlinear_iv_init (maybe not supported -
> > > > > > > > > > > > better assert here).
> > > > > > > > > > >
> > > > > > > > > > > The VF case is for a normal "non-inverted" loop, where
> > > > > > > > > > > if you take an early exit you know that you have to do
> > > > > > > > > > > at most VF iterations.  The VF - step is to account for
> > > > > > > > > > > the inverted loop control flow where you exit after
> > > > > > > > > > > adjusting the IV already by + step.
> > > > > > > > > >
> > > > > > > > > > But doesn't that assume the IV counts from niter to zero?
> > > > > > > > > > I don't see this special case is actually necessary, no?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I needed it because otherwise the scalar loop iterates one
> > > > > > > > > iteration too few, so I got a miscompile with the inverted
> > > > > > > > > loop stuff.  I'll look at it again; perhaps it can be solved
> > > > > > > > > differently.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Peeling doesn't matter here, since you know you were
> > > > > > > > > > > able to do a vector iteration so it's safe to do VF iterations.
> > > > > > > > > > > So having peeled doesn't affect the remaining iters count.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Also the vec_step_op_add case will keep the original
> > > > > > > > > > > > scalar IV live even when it is a vectorized induction.
> > > > > > > > > > > > The code recomputing the value from scratch avoids this.
> > > > > > > > > > > >
> > > > > > > > > > > >       /* For non-main exit create an intermediate edge to get any
> > > > > > > > > > > >          updated iv calculations.  */
> > > > > > > > > > > >       if (needs_interm_block
> > > > > > > > > > > >           && !iv_block
> > > > > > > > > > > >           && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > > !gimple_seq_empty_p
> > > > > > > > > > > > (new_stmts)))
> > > > > > > > > > > >         {
> > > > > > > > > > > >           iv_block = split_edge (update_e);
> > > > > > > > > > > >           update_e = single_succ_edge (update_e->dest);
> > > > > > > > > > > >           last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > this is also odd, can we adjust the API instead?  I
> > > > > > > > > > > > suppose this is because your computation uses the
> > > > > > > > > > > > original loop IV, if you based the computation off
> > > > > > > > > > > > the initial value only this might not be
> > > > > > > > necessary?
> > > > > > > > > > >
> > > > > > > > > > > No, on the main exit the code updates the value in the
> > > > > > > > > > > loop header and puts the calculation in the merge block.
> > > > > > > > > > > This works because it only needs to consume PHI nodes
> > > > > > > > > > > in the merge block and things like niters are
> > > > > > > > > > adjusted in the guard block.
> > > > > > > > > > >
> > > > > > > > > > > For an early exit, we don't have a guard block, only
> > > > > > > > > > > the merge block.  We have to update the PHI nodes in
> > > > > > > > > > > that block, but can't do so since you can't produce a
> > > > > > > > > > > value and consume it in a PHI node in the same BB.
> > > > > > > > > > > So we need to create the block to put the values in
> > > > > > > > > > > for use in the merge block, because there's no "guard"
> > > > > > > > > > > block for early exits.
> > > > > > > > > >
> > > > > > > > > > ?  then compute niters in that block as well.
> > > > > > > > >
> > > > > > > > > We can't since it'll not be reachable through the right edge.
> > > > > > > > > What we can do if you want is slightly change peeling, we
> > > > > > > > > currently peel
> > > > > > as:
> > > > > > > > >
> > > > > > > > >   \        \             /
> > > > > > > > >   E1     E2        Normal exit
> > > > > > > > >     \       |          |
> > > > > > > > >        \    |          Guard
> > > > > > > > >           \ |          |
> > > > > > > > >          Merge block
> > > > > > > > >                   |
> > > > > > > > >              Pre Header
> > > > > > > > >
> > > > > > > > > If we instead peel as:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >   \        \             /
> > > > > > > > >   E1     E2        Normal exit
> > > > > > > > >     \       |          |
> > > > > > > > >        Exit join   Guard
> > > > > > > > >           \ |          |
> > > > > > > > >          Merge block
> > > > > > > > >                   |
> > > > > > > > >              Pre Header
> > > > > > > > >
> > > > > > > > > We can use the exit join block.  This would also mean
> > > > > > > > > > > vect_update_ivs_after_vectorizer doesn't need to iterate
> > > > > > > > > over all exits and only really needs to adjust the phi
> > > > > > > > > > > nodes coming out of the exit join
> > > > > > > > and guard block.
> > > > > > > > >
> > > > > > > > > Does this work for you?
> > > > > >
> > > > > > Yeah, I think that would work.  But I'd like to sort out the
> > > > > > correctness details of the IV update itself before sorting out
> > > > > > this code
> > > > placement detail.
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > > >
> > > > > > > > > > > The API can be adjusted by always creating the empty
> > > > > > > > > > > block either during
> > > > > > > > > > peeling.
> > > > > > > > > > > That would prevent us from having to do anything special here.
> > > > > > > > > > > Would that work better?  Or I can do it in the loop
> > > > > > > > > > > that iterates over the exits to before the call to
> > > > > > > > > > > vect_update_ivs_after_vectorizer, which I think
> > > > > > > > > > might be more consistent.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > That said, I wonder why we cannot simply pass in an
> > > > > > > > > > > > adjusted niter which would be niters_vector_mult_vf
> > > > > > > > > > > > - vf and be done with
> > > > > > that?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > We can of course not have this and recompute it from
> > > > > > > > > > > niters itself, however this does affect the epilog code layout.
> > > > > > > > > > > Particularly knowing the static number of iterations
> > > > > > > > > > > left causes it to usually unroll the loop and share
> > > > > > > > > > > some of the computations.  i.e. the scalar code is
> > > > > > > > > > > often more
> > > > > > > > > > efficient.
> > > > > > > > > > >
> > > > > > > > > > > The computation would be niters_vector_mult_vf -
> > > > > > > > > > > iters_done * vf, since the value put here is the
> > > > > > > > > > > remaining iteration
> > > > count.
> > > > > > > > > > > It's static for early
> > > > > > > > > > exits.
> > > > > > > > > >
> > > > > > > > > > Well, it might be "static" in that it doesn't really
> > > > > > > > > > matter what you use for the epilog main IV initial value
> > > > > > > > > > as long as you are sure you're not going to take that
> > > > > > > > > > exit as you are sure we're going to take one of the
> > > > > > > > > > early exits.  So yeah, the special code is probably OK,
> > > > > > > > > > but it needs a better comment and as said the structure
> > > > > > > > > > of
> > > > > > > > vect_update_ivs_after_vectorizer is a bit hard to follow now.
> > > > > > > > > >
> > > > > > > > > > As said an important part for optimization is to not
> > > > > > > > > > keep the scalar IVs live in the vector loop.
> > > > > > > > > >
> > > > > > > > > > > But can do whatever you prefer here.  Let me know what
> > > > > > > > > > > you prefer for the
> > > > > > > > > > above.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Tamar
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Richard.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It has to do this since you have to perform
> > > > > > > > > > > > > > > the side effects for the non-matching elements still.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > Tamar
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +	      if (STMT_VINFO_LIVE_P (phi_info))
> > > > > > > > > > > > > > > > > +		continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +	      /* For early break the final loop IV is:
> > > > > > > > > > > > > > > > > +		 init + (final - init) * vf which takes
> > > > > > > > > > > > > > > > > +into account
> > > > > > > > peeling
> > > > > > > > > > > > > > > > > +		 values and non-single steps.  The
> > > main
> > > > > > > > > > > > > > > > > +exit
> > > > > > > > can
> > > > > > > > > > > > > > > > > +use
> > > > > > > > > > > > niters
> > > > > > > > > > > > > > > > > +		 since if you exit from the main exit
> > > > > > > > > > > > > > > > > +you've
> > > > > > > > done
> > > > > > > > > > > > > > > > > +all
> > > > > > > > > > > > vector
> > > > > > > > > > > > > > > > > +		 iterations.  For an early exit we
> > > > > > > > > > > > > > > > > +don't know
> > > > > > > > when
> > > > > > > > > > > > > > > > > +we
> > > > > > > > > > > > exit
> > > > > > > > > > > > > > > > > +so
> > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > +		 must re-calculate this on the exit.  */
> > > > > > > > > > > > > > > > > +	      tree start_expr = gimple_phi_result (phi);
> > > > > > > > > > > > > > > > > +	      off = fold_build2 (MINUS_EXPR, stype,
> > > > > > > > > > > > > > > > > +				 fold_convert (stype,
> > > > > > > > start_expr),
> > > > > > > > > > > > > > > > > +				 fold_convert (stype,
> > > > > > > > init_expr));
> > > > > > > > > > > > > > > > > +	      /* Now adjust for VF to get the
> > > > > > > > > > > > > > > > > +final
> > > iteration value.
> > > > > > > > */
> > > > > > > > > > > > > > > > > +	      off = fold_build2 (MULT_EXPR, stype, off,
> > > > > > > > > > > > > > > > > +				 build_int_cst (stype,
> > > vf));
> > > > > > > > > > > > > > > > > +	    }
> > > > > > > > > > > > > > > > > +	  else
> > > > > > > > > > > > > > > > > +	    off = fold_build2 (MULT_EXPR, stype,
> > > > > > > > > > > > > > > > > +			       fold_convert (stype,
> > > niters),
> > > > > > > > step_expr);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  	  if (POINTER_TYPE_P (type))
> > > > > > > > > > > > > > > > >  	    ni = fold_build_pointer_plus (init_expr, off);
> > > > > > > > > > > > > > > > >  	  else
> > > > > > > > > > > > > > > > > @@ -2238,6 +2286,8 @@
> > > > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > > > > > >        /* Don't bother call vect_peel_nonlinear_iv_init.
> > */
> > > > > > > > > > > > > > > > >        else if (induction_type == vect_step_op_neg)
> > > > > > > > > > > > > > > > >  	ni = init_expr;
> > > > > > > > > > > > > > > > > +      else if (restart_loop)
> > > > > > > > > > > > > > > > > +	continue;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This looks all a bit complicated - why
> > > > > > > > > > > > > > > > wouldn't we simply always use the PHI result
> > > > > > > > > > > > > > > > when
> > > 'restart_loop'?
> > > > > > > > > > > > > > > > Isn't that the correct old start value in
> > > > > > > > > > > > > > all cases?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        else
> > > > > > > > > > > > > > > > >  	ni = vect_peel_nonlinear_iv_init
> > > > > > > > > > > > > > > > > (&stmts,
> > > init_expr,
> > > > > > > > > > > > > > > > >  					  niters,
> > > step_expr,
> > > > > > @@ -
> > > > > > > > > > 2245,9 +2295,20 @@
> > > > > > > > > > > > > > > > > vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > > (loop_vec_info
> > > > > > > > > > > > > > > > > loop_vinfo,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        var = create_tmp_var (type, "tmp");
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -      last_gsi = gsi_last_bb (exit_bb);
> > > > > > > > > > > > > > > > >        gimple_seq new_stmts = NULL;
> > > > > > > > > > > > > > > > >        ni_name = force_gimple_operand (ni,
> > > > > > > > > > > > > > > > > &new_stmts, false, var);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* For non-main exit create an
> > > > > > > > > > > > > > > > > + intermediate edge to get any
> > > > > > > > > > > > updated iv
> > > > > > > > > > > > > > > > > +	 calculations.  */
> > > > > > > > > > > > > > > > > +      if (needs_interm_block
> > > > > > > > > > > > > > > > > +	  && !iv_block
> > > > > > > > > > > > > > > > > +	  && (!gimple_seq_empty_p (stmts) ||
> > > > > > > > > > > > > > > > > +!gimple_seq_empty_p
> > > > > > > > > > > > > > > > (new_stmts)))
> > > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > > +	  iv_block = split_edge (update_e);
> > > > > > > > > > > > > > > > > +	  update_e = single_succ_edge (update_e-
> > > >dest);
> > > > > > > > > > > > > > > > > +	  last_gsi = gsi_last_bb (iv_block);
> > > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >        /* Exit_bb shouldn't be empty.  */
> > > > > > > > > > > > > > > > >        if (!gsi_end_p (last_gsi))
> > > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > > > @@ -3342,8 +3403,26 @@ vect_do_peeling
> > > > > > > > > > > > > > > > > (loop_vec_info loop_vinfo, tree
> > > > > > > > > > > > > > > > niters, tree nitersm1,
> > > > > > > > > > > > > > > > >  	 niters_vector_mult_vf steps.  */
> > > > > > > > > > > > > > > > >        gcc_checking_assert
> > > > > > > > > > > > > > > > > (vect_can_advance_ivs_p
> > > > > > > > (loop_vinfo));
> > > > > > > > > > > > > > > > >        update_e = skip_vector ? e :
> > > > > > > > > > > > > > > > > loop_preheader_edge
> > > > > > (epilog);
> > > > > > > > > > > > > > > > > -      vect_update_ivs_after_vectorizer (loop_vinfo,
> > > > > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > -					update_e);
> > > > > > > > > > > > > > > > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > > > > > > > > > > > > > +	update_e = single_succ_edge (e->dest);
> > > > > > > > > > > > > > > > > +      bool inversed_iv
> > > > > > > > > > > > > > > > > +	= !vect_is_loop_exit_latch_pred
> > > > > > > > (LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > (loop_vinfo),
> > > > > > > > > > > > > > > > > +
> > > LOOP_VINFO_LOOP
> > > > > > > > > > > > (loop_vinfo));
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You are computing this here and in
> > > > > > > > > > vect_update_ivs_after_vectorizer?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* Update the main exit first.  */
> > > > > > > > > > > > > > > > > +      vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > > > + (loop_vinfo, vf,
> > > > > > > > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > +					update_e,
> > > > > > > > inversed_iv);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +      /* And then update the early exits.  */
> > > > > > > > > > > > > > > > > +      for (auto exit : get_loop_exit_edges (loop))
> > > > > > > > > > > > > > > > > +	{
> > > > > > > > > > > > > > > > > +	  if (exit == LOOP_VINFO_IV_EXIT
> > > (loop_vinfo))
> > > > > > > > > > > > > > > > > +	    continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +	  vect_update_ivs_after_vectorizer
> > > > > > > > > > > > > > > > > +(loop_vinfo, vf,
> > > > > > > > > > > > > > > > > +
> > > > > > > > niters_vector_mult_vf,
> > > > > > > > > > > > > > > > > +					    exit, true);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ... why does the same not work here?
> > > > > > > > > > > > > > > > Wouldn't the proper condition be
> > > > > > > > > > > > > > > > !dominated_by_p (CDI_DOMINATORS,
> > > > > > > > > > > > > > > > exit->src, LOOP_VINFO_IV_EXIT
> > > > > > > > > > > > > > > > (loop_vinfo)->src) or similar?  That is,
> > > > > > > > > > > > > > > > whether the exit is at or after the main IV exit?
> > > > > > > > > > > > > > > > (consider having
> > > > > > > > > > > > > > > > two)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +	}
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >        if (skip_epilog)
> > > > > > > > > > > > > > > > >  	{
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146,
> > > > > > > > > > > > > > 90461 Nuernberg, Germany;
> > > > > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
> > > > > > > > > > > > > > (HRB 36809, AG
> > > > > > > > > > > > > > Nuernberg)
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > > > > Nuernberg, Germany;
> > > > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
> > > > > > > > > > > > (HRB 36809, AG
> > > > > > > > > > > > Nuernberg)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software
> > > > > > > > > > Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > > > > > > > Nuernberg, Germany;
> > > > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB
> > > > > > > > > > 36809, AG
> > > > > > > > > > Nuernberg)
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809,
> > > > > > > > AG
> > > > > > > > Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > > > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > > Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-11-15 12:40     ` Richard Biener
@ 2023-11-20 21:51       ` Tamar Christina
  2023-11-24 10:16         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-20 21:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 15716 bytes --]

Hi All,

Here's the respun patch:

This splits the part of the function that does peeling for loops at exits into
a different function.  In this new function we also peel for early breaks.

Peeling for early breaks works by redirecting all early break exits to a
single "early break" block and combine them and the normal exit edge together
later in a different block which then goes into the epilog preheader.

This allows us to re-use all the existing code for IV updates.  Additionally,
this also enables correct linking for multiple vector epilogues.

flush_pending_stmts cannot be used in this scenario since it updates the PHI
nodes in the order that they are in the exit destination blocks.  This means
they are in CFG visit order.  With a single exit this doesn't matter but with
multiple exits with different live values through the different exits the order
usually does not line up.

Additionally, the vectorizer helper functions expect to be able to iterate over
the nodes in the order that they occur in the loop header blocks.  This is an
invariant we must maintain.  To do this we just inline the work of
flush_pending_stmts but maintain the order by using the header blocks to guide
the work.
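The ordering issue can be modelled outside of GCC.  Below is a toy Python
sketch (the PHI and exit representations are invented for illustration, not
the GIMPLE API) of why the inlined variant matches live-out values by the loop
header order rather than by exit-block position:

```python
# Toy model of the PHI-ordering problem described above.
# The loop-header PHI order is the invariant the vectorizer helpers rely on.
header_phis = ["i", "res", "sum"]

# Live-out values per exit.  The positional order in each exit block follows
# CFG visit order, so it need not match the header order.
exit_values = {
    "normal_exit": {"i": "i_final", "res": "res_final", "sum": "sum_final"},
    "early_exit":  {"res": "res_partial", "i": "i_partial"},  # 'sum' not live
}

def merge_by_header_order(header_phis, exit_values):
    """Walk the header PHIs in header order and pull the matching live-out
    value from every exit by name, never by position."""
    return {phi: {exit: vals.get(phi) for exit, vals in exit_values.items()}
            for phi in header_phis}

merged = merge_by_header_order(header_phis, exit_values)
```

Matching by position would pair `res` from the early exit with `i` in the
header; keying on the header order keeps every PHI paired with the right value
regardless of how many exits feed the merge block.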

The way peeling is done results in LIM noticing that in some cases the condition
and the results are loop invariant and tries to move them out of the loop.

While the resulting code is operationally sound, moving the compare out of the
gcond results in generating code that no longer branches, so cbranch is no
longer applicable.  As such I now add code to check during this motion to see
if the target supports flag setting vector comparison as general operation.

One outstanding issue is probability scaling: testcases like
vect-epilogues-2.c.176t.vect fail because during peeling I make an intermediate
edge that is used to keep IV updates simple.  This edge seems to have the wrong
count:

  if (ivtmp_71 < bnd.8_54)
    goto <bb 8>; [89.00%]
  else
    goto <bb 24>; [11.00%]
;;    succ:       8 [89.0% (guessed)]  count:765459809 (estimated locally, freq 6.4808) (TRUE_VALUE,EXECUTABLE)
;;                24 [11.0% (guessed)]  count:94607391 (estimated locally, freq 0.8010) (FALSE_VALUE,EXECUTABLE)

;;   basic block 24, loop depth 0, count 105119324 (estimated locally, freq 0.8900), maybe hot
;;   Invalid sum of incoming counts 94607391 (estimated locally, freq 0.8010), should be 105119324 (estimated locally, freq 0.8900)
;;    prev block 8, next block 30, flags: (NEW, VISITED)
;;    pred:       3 [11.0% (guessed)]  count:94607391 (estimated locally, freq 0.8010) (FALSE_VALUE,EXECUTABLE)
  # res_46 = PHI <res_13(3)>

If I'm reading this error correctly, the edge count should be 94607391 but it
got 105119324.  My guess is that something scaled up the BB count, i.e.
94607391 * 1.11, but... I don't know why or what.  Any thoughts?
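One possible reading of the numbers, assuming the block count was rescaled
linearly by the frequency ratio: 0.8900 / 0.8010 is roughly the 1.11 factor
guessed above, and a quick arithmetic check reproduces the reported BB count:

```python
edge_count = 94607391   # incoming edge count (freq 0.8010)
bb_count = 105119324    # count reported on basic block 24 (freq 0.8900)

# Rescaling the edge count by the frequency ratio reproduces the BB count
# up to rounding, which suggests the BB count was scaled but the edge wasn't.
scaled = round(edge_count * 0.8900 / 0.8010)
```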

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues aside from
profile counts above.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-ssa-loop-im.cc (compute_invariantness): Import insn-codes.h
	and optabs-tree.h and check for vector compare motion out of gcond.
	* tree-vect-loop-manip.cc
	(slpeel_tree_duplicate_loop_for_vectorization): New.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Use it.
	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.

--- inline copy of patch ---

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 396963b6754c7671e2e5404302a69129918555e2..92a9318a1ca0a2da50ff2f29cf271d2e78fddd77 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "tree-ssa.h"
 #include "dbgcnt.h"
+#include "insn-codes.h"
+#include "optabs-tree.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -1138,6 +1140,24 @@ compute_invariantness (basic_block bb)
 	    continue;
 	  }
 
+	/* Check for each dependent statement that is a vector compare whether
+	   the target supports it, otherwise it's invalid to hoist it out of
+	   the gcond it belonged to.  */
+	for (auto dep_stmt : lim_data->depends)
+	  {
+	     if (is_gimple_assign (dep_stmt)
+		 && VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (dep_stmt))))
+		{
+		  tree type = TREE_TYPE (gimple_assign_lhs (dep_stmt));
+		  auto code = gimple_assign_rhs_code (dep_stmt);
+		  if (!target_supports_op_p (type, code, optab_vector))
+		    pos = MOVE_IMPOSSIBLE;
+		}
+	  }
+
+	if (pos == MOVE_IMPOSSIBLE)
+	  continue;
+
 	if (dump_file && (dump_flags & TDF_DETAILS))
 	  {
 	    print_gimple_stmt (dump_file, stmt, 2);
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index b9161274ce401a7307f3e61ad23aa036701190d7..9d17e6877e1e16b359fdc86a92bf33a9760f8f86 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1403,13 +1403,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break vectorization,
+   LOOP_VINFO needs to be provided and is used to keep track of any newly created
+   memory references that need to be updated should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1526,7 +1529,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1536,39 +1541,61 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
+      bool multiple_exits_p = loop_exits.length () > 1;
+      basic_block main_loop_exit_block = new_preheader;
+      basic_block alt_loop_exit_block = NULL;
+      /* Create intermediate edge for main exit.  */
+      edge loop_e = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_e);
+
       auto_vec <gimple *> new_phis;
       hash_map <tree, tree> new_phi_args;
       /* First create the empty phi nodes so that when we flush the
 	 statements they can be filled in.   However because there is no order
 	 between the PHI nodes in the exits and the loop headers we need to
 	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
+	 phi nodes. Then redirect the edges and flush the changes.  This writes
+	 out the new SSA names.  */
+      for (auto gsi_from = gsi_start_phis (loop_exit->dest);
 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
 	{
 	  gimple *from_phi = gsi_stmt (gsi_from);
 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
+	  gphi *res = create_phi_node (new_res, main_loop_exit_block);
 	  new_phis.safe_push (res);
 	}
 
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
+      for (auto exit : loop_exits)
 	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
+	  basic_block dest = main_loop_exit_block;
+	  if (exit != loop_exit)
+	    {
+	      if (!alt_loop_exit_block)
+		{
+		  alt_loop_exit_block = split_edge (exit);
+		  edge res = redirect_edge_and_branch (
+				single_succ_edge (alt_loop_exit_block),
+				new_preheader);
+		  flush_pending_stmts (res);
+		  continue;
+		}
+	      dest = alt_loop_exit_block;
+	    }
+	  edge e = redirect_edge_and_branch (exit, dest);
+	  flush_pending_stmts (e);
 	}
+
       /* Record the new SSA names in the cache so that we can skip materializing
 	 them again when we fill in the rest of the LCSSA variables.  */
       for (auto phi : new_phis)
 	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
+	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
 
 	  if (!SSA_VAR_P (new_arg))
 	    continue;
+
 	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
+	     a new LC-SSA PHI for it in the intermediate block.   */
 	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
 	     be a .VDEF or a PHI that operates on MEM. And said definition
 	     must not be inside the main loop.  Or we must be a parameter.
@@ -1584,6 +1611,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	      remove_phi_node (&gsi, true);
 	      continue;
 	    }
+
+	  /* If we decide to remove the PHI node we should also not
+	     rematerialize it later on.  */
 	  new_phi_args.put (new_arg, gimple_phi_result (phi));
 
 	  if (TREE_CODE (new_arg) != SSA_NAME)
@@ -1595,34 +1625,58 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	 preheader block and still find the right LC nodes.  */
       edge loop_entry = single_succ_edge (new_preheader);
       if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
+	{
+	  /* Link through the main exit first.  */
+	  for (auto gsi_from = gsi_start_phis (loop->header),
+	       gsi_to = gsi_start_phis (new_loop->header);
+	       !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	       gsi_next (&gsi_from), gsi_next (&gsi_to))
+	    {
+	      gimple *from_phi = gsi_stmt (gsi_from);
+	      gimple *to_phi = gsi_stmt (gsi_to);
+	      tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						    loop_latch_edge (loop));
 
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
+	      /* Check if we've already created a new phi node during edge
+		 redirection.  If we have, only propagate the value
+		 downwards.  */
+	      if (tree *res = new_phi_args.get (new_arg))
+		new_arg = *res;
 
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+	      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	      gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
 
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	      /* Main loop exit should use the final iter value.  */
+	      SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
 
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+	      adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+	    }
 
-      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
+	  set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block, loop_exit->src);
+	  set_immediate_dominator (CDI_DOMINATORS, new_preheader, main_loop_exit_block);
+
+	  /* Now link the alternative exits.  */
+	  if (multiple_exits_p)
+	    {
+	      for (auto gsi_from = gsi_start_phis (loop->header),
+		   gsi_to = gsi_start_phis (new_preheader);
+		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+		   gsi_next (&gsi_from), gsi_next (&gsi_to))
+		{
+		  gimple *from_phi = gsi_stmt (gsi_from);
+		  gimple *to_phi = gsi_stmt (gsi_to);
+
+		  tree alt_arg = gimple_phi_result (from_phi);
+		  edge main_e = single_succ_edge (alt_loop_exit_block);
+		  for (edge e : loop_exits)
+		    if (e != loop_exit)
+		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
+		}
+
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       loop->header);
+	    }
+	}
 
       if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
@@ -1634,6 +1688,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1750,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,

[-- Attachment #2: rb17964 (1).patch --]
[-- Type: application/octet-stream, Size: 11906 bytes --]

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 396963b6754c7671e2e5404302a69129918555e2..92a9318a1ca0a2da50ff2f29cf271d2e78fddd77 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "tree-ssa.h"
 #include "dbgcnt.h"
+#include "insn-codes.h"
+#include "optabs-tree.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -1138,6 +1140,24 @@ compute_invariantness (basic_block bb)
 	    continue;
 	  }
 
+	/* Check whether, if one of the dependent statements is a vector
+	   compare, the target supports it; otherwise it's invalid to hoist
+	   it out of the gcond it belonged to.  */
+	for (auto dep_stmt : lim_data->depends)
+	  {
+	     if (is_gimple_assign (dep_stmt)
+		 && VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (dep_stmt))))
+		{
+		  tree type = TREE_TYPE (gimple_assign_lhs (dep_stmt));
+		  auto code = gimple_assign_rhs_code (dep_stmt);
+		  if (!target_supports_op_p (type, code, optab_vector))
+		    pos = MOVE_IMPOSSIBLE;
+		}
+	  }
+
+	if (pos == MOVE_IMPOSSIBLE)
+	  continue;
+
 	if (dump_file && (dump_flags & TDF_DETAILS))
 	  {
 	    print_gimple_stmt (dump_file, stmt, 2);
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index b9161274ce401a7307f3e61ad23aa036701190d7..9d17e6877e1e16b359fdc86a92bf33a9760f8f86 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1403,13 +1403,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break vectorization
+   then LOOP_VINFO needs to be provided and is used to keep track of any newly created
+   memory references that need to be updated should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1526,7 +1529,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1536,39 +1541,61 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
+      bool multiple_exits_p = loop_exits.length () > 1;
+      basic_block main_loop_exit_block = new_preheader;
+      basic_block alt_loop_exit_block = NULL;
+      /* Create intermediate edge for main exit.  */
+      edge loop_e = single_succ_edge (new_preheader);
+      new_preheader = split_edge (loop_e);
+
       auto_vec <gimple *> new_phis;
       hash_map <tree, tree> new_phi_args;
       /* First create the empty phi nodes so that when we flush the
 	 statements they can be filled in.   However because there is no order
 	 between the PHI nodes in the exits and the loop headers we need to
 	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
+	 phi nodes. Then redirect the edges and flush the changes.  This writes
+	 out the new SSA names.  */
+      for (auto gsi_from = gsi_start_phis (loop_exit->dest);
 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
 	{
 	  gimple *from_phi = gsi_stmt (gsi_from);
 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
+	  gphi *res = create_phi_node (new_res, main_loop_exit_block);
 	  new_phis.safe_push (res);
 	}
 
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
+      for (auto exit : loop_exits)
 	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
+	  basic_block dest = main_loop_exit_block;
+	  if (exit != loop_exit)
+	    {
+	      if (!alt_loop_exit_block)
+		{
+		  alt_loop_exit_block = split_edge (exit);
+		  edge res = redirect_edge_and_branch (
+				single_succ_edge (alt_loop_exit_block),
+				new_preheader);
+		  flush_pending_stmts (res);
+		  continue;
+		}
+	      dest = alt_loop_exit_block;
+	    }
+	  edge e = redirect_edge_and_branch (exit, dest);
+	  flush_pending_stmts (e);
 	}
+
       /* Record the new SSA names in the cache so that we can skip materializing
 	 them again when we fill in the rest of the LCSSA variables.  */
       for (auto phi : new_phis)
 	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
+	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
 
 	  if (!SSA_VAR_P (new_arg))
 	    continue;
+
 	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
+	     a new LC-SSA PHI for it in the intermediate block.   */
 	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
 	     be a .VDEF or a PHI that operates on MEM. And said definition
 	     must not be inside the main loop.  Or we must be a parameter.
@@ -1584,6 +1611,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	      remove_phi_node (&gsi, true);
 	      continue;
 	    }
+
+	  /* If we decide to remove the PHI node we should also not
+	     rematerialize it later on.  */
 	  new_phi_args.put (new_arg, gimple_phi_result (phi));
 
 	  if (TREE_CODE (new_arg) != SSA_NAME)
@@ -1595,34 +1625,58 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	 preheader block and still find the right LC nodes.  */
       edge loop_entry = single_succ_edge (new_preheader);
       if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
+	{
+	  /* Link through the main exit first.  */
+	  for (auto gsi_from = gsi_start_phis (loop->header),
+	       gsi_to = gsi_start_phis (new_loop->header);
+	       !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	       gsi_next (&gsi_from), gsi_next (&gsi_to))
+	    {
+	      gimple *from_phi = gsi_stmt (gsi_from);
+	      gimple *to_phi = gsi_stmt (gsi_to);
+	      tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						    loop_latch_edge (loop));
 
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
+	      /* Check if we've already created a new phi node during edge
+		 redirection.  If we have, only propagate the value
+		 downwards.  */
+	      if (tree *res = new_phi_args.get (new_arg))
+		new_arg = *res;
 
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+	      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	      gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
 
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	      /* Main loop exit should use the final iter value.  */
+	      SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
 
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+	      adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+	    }
 
-      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
+	  set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block, loop_exit->src);
+	  set_immediate_dominator (CDI_DOMINATORS, new_preheader, main_loop_exit_block);
+
+	  /* Now link the alternative exits.  */
+	  if (multiple_exits_p)
+	    {
+	      for (auto gsi_from = gsi_start_phis (loop->header),
+		   gsi_to = gsi_start_phis (new_preheader);
+		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+		   gsi_next (&gsi_from), gsi_next (&gsi_to))
+		{
+		  gimple *from_phi = gsi_stmt (gsi_from);
+		  gimple *to_phi = gsi_stmt (gsi_to);
+
+		  tree alt_arg = gimple_phi_result (from_phi);
+		  edge main_e = single_succ_edge (alt_loop_exit_block);
+		  for (edge e : loop_exits)
+		    if (e != loop_exit)
+		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
+		}
+
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       loop->header);
+	    }
+	}
 
       if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
@@ -1634,6 +1688,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1750,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-17 12:13                                     ` Richard Biener
@ 2023-11-20 21:54                                       ` Tamar Christina
  2023-11-24 10:18                                         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-20 21:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 10204 bytes --]

> >
> > Ok, so I currently have the following solution.  Let me know if you
> > agree with it and I'll polish it up today and tomorrow and respin things.
> >
> > 1. During vect_update_ivs_after_vectorizer we no longer touch any PHIs
> aside from
> >      Just updating IVtemps with the expected remaining iteration count.
> 
> OK
> 
> > 2. During vect_transform_loop after vectorizing any induction or reduction I
> call vectorizable_live_operation
> >      For any phi node that still has any usages in the early exit merge block.
> 
> OK, I suppose you need to amend the vectorizable_live_operation API to tell it
> it works for the early exits or the main exit (and not complain when
> !STMT_VINFO_LIVE_P for the early exit case).
> 
> > 3. vectorizable_live_operation is taught to have to materialize the
> > same PHI in multiple exits
> 
> For the main exit you'd get here via STMT_VINFO_LIVE_P handling and
> vect_update_ivs_after_vectorizer would handle the rest.  For the early exits I
> think you only have to materialize once (in the merge block)?
> 
> > 4. vectorizable_reduction or maybe vect_create_epilog_for_reduction need
> to be modified to for early exits materialize
> >     The previous iteration value.
> 
> I think you need to only touch vect_create_epilog_for_reduction, the early exit
> merge block needs another reduction epilog.  Well, in theory just another
> vector to reduce but not sure if the control flow supports having the same
> actual epilog for both the main and the early exits.
> 
> Richard.

Good morning,

Here's the much cleaner respun patch:

This changes the PHI node updates to support early breaks.
It has to support both the case where the loop's exit matches the normal loop
exit and one where the chosen exit is "inverted", i.e. it is an early exit edge
rather than the latch exit.

In the latter case we must always restart the loop for VF iterations.  For an
early exit the reason is obvious, but there are cases where the "normal" exit
is located before the early one.  This exit then does a check on ivtmp resulting
in us leaving the loop since it thinks we're done.

In these cases we may still have side-effects to perform, so we also go to the
scalar loop.

For the "normal" exit niters has already been adjusted for peeling, for the
early exits we must find out how many iterations we actually did.  So we have
to recalculate the new position for each exit.

For the "inverse" case I know what to do, but I wanted to ask where you wanted
it.  For inverted cases like ./gcc/testsuite/gcc.dg/vect/vect-early-break_70.c

the requirement is that any PHI value aside from the IV needs to be the value
of the early exit. i.e. the value of the incomplete exit as there's no iteration
that is "complete".

The IV should become:  niters - (((niters / vf) - 1) * vf)

So e.g. on a loop with niters = 17 and VF 4 it becomes
17 - (((17 / 4) - 1) * 4) = 5.  This addresses the odd +step you had commented
on before.

To do these two I can either modify vect_update_ivs_after_vectorizer, or add
a smaller utility function that patched up this case if we want to keep
vect_update_ivs_after_vectorizer simple.

Which do you prefer?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
	(vect_update_ivs_after_vectorizer): Support early break.
	(vect_do_peeling): Use it.
	(vect_is_loop_exit_latch_pred): New.
	* tree-vectorizer.h (vect_is_loop_exit_latch_pred): New.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 5ab883fdeebf1917979fe44eb16356aaef637df7..5751aa6295ca052534cef1984a26c65994a57389 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
    loop handles exactly VF scalars per iteration.  */
 
 static gcond *
-vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
+vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
 				class loop *loop, tree niters, tree step,
 				tree final_iv, bool niters_maybe_zero,
 				gimple_stmt_iterator loop_cond_gsi)
@@ -1407,6 +1407,17 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
 		     (gimple *) cond_stmt);
 }
 
+/* Determine if the exit chosen by the loop vectorizer differs from the
+   natural loop exit, i.e. if the exit leads to the loop latch or not.
+   When this happens we need to flip the notion of main and other
+   exits in peeling and the IV updates.  */
+
+bool
+vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
+{
+  return single_pred (loop->latch) == loop_exit->src;
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
    on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
    non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -2134,6 +2145,10 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
                   The phi args associated with the edge UPDATE_E in the bb
                   UPDATE_E->dest are updated accordingly.
 
+     - MULTIPLE_EXIT - Indicates whether the scalar loop needs to restart the
+		       iteration count where the vector loop began.
+     - EXIT_BB - The basic block to insert any new statement for UPDATE_E into.
+
      Assumption 1: Like the rest of the vectorizer, this function assumes
      a single loop exit that has a single predecessor.
 
@@ -2152,17 +2167,14 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
 
 static void
 vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
-				  tree niters, edge update_e)
+				  tree niters, edge update_e,
+				  bool multiple_exit, basic_block exit_bb)
 {
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-
-  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
+  gcond *cond = get_loop_exit_condition (LOOP_VINFO_IV_EXIT (loop_vinfo));
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2172,7 +2184,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2204,11 +2215,27 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       enum vect_induction_op_type induction_type
 	= STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info);
 
-      if (induction_type == vect_step_op_add)
+      tree iv_var = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      /* create_iv always places it on the LHS.  Alternatively we can set a
+	 property during create_iv to identify it.  */
+      bool ivtemp = gimple_cond_lhs (cond) == iv_var;
+      if (multiple_exit && ivtemp)
+	{
+	  type = TREE_TYPE (gimple_phi_result (phi));
+	  ni = build_int_cst (type, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+	}
+      else if (induction_type == vect_step_op_add)
 	{
+
 	  tree stype = TREE_TYPE (step_expr);
-	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+
+	  /* Early exits always use the last iter value, not niters.  */
+	  if (multiple_exit)
+	    continue;
+	  else
+	    off = fold_build2 (MULT_EXPR, stype,
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2227,9 +2254,9 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -3324,8 +3351,31 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
+      edge alt_exit;
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (loop))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      {
+		alt_exit = single_succ_edge (exit->dest);
+		break;
+	      }
+	  update_e = single_succ_edge (e->dest);
+	}
+      bool inversed_iv
+	= !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
+					 LOOP_VINFO_LOOP (loop_vinfo));
+
+      /* Update the main exit first.  */
       vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
-					update_e);
+					update_e, inversed_iv,
+					LOOP_VINFO_IV_EXIT (loop_vinfo)->dest);
+
+      /* And then update the early exits, we only need to update the alt exit
+	 merge edge, but have to find it first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
+					  alt_exit, true, alt_exit->src);
 
       if (skip_epilog)
 	{
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 39aa4d1250efe308acccf484d370f8adfd1ba843..22a8c3d384d7ae1ca93079b64f2d40821b4a3c56 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2224,6 +2224,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-16 11:16         ` Richard Biener
@ 2023-11-20 21:57           ` Tamar Christina
  2023-11-24 10:20             ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-20 21:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 24144 bytes --]

Good morning,

Here's the respun patch; as discussed, we now use reductions and inductions rather than scalar values:

This adds support to vectorizable_live_reduction to handle multiple exits by
doing a search for which exit the live value should be materialized in.

Additionally, which value in the index we're after depends on whether the exit
it's materialized in is an early exit or whether the loop's main exit is
different from the loop's natural one (i.e. the one with the same src block as
the latch).

In those two cases we want the first rather than the last value as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and vector since there's only a way to get the last
active element and not the first.

For inductions and multiple exits:
  - we test if the target will support vectorizing the induction
  - mark all inductions in the loop as relevant
  - force codegen of non-live inductions during codegen
  - induction during an early exit gets the first element rather than last.

For reductions and multiple exits:
  - Reductions for early exits reduce the reduction definition statement
    rather than the reduction step.  This allows us to get the value at the
    start of the iteration.
  - The peeling layout means that we just have to update one block, the merge
    block.  We expect all the reductions to be the same but we leave it up to
    the value numbering to clean up any duplicate code as we iterate over all
    edges.

These two changes fix the reduction codegen given before which has been added
to the testsuite for early vect.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
	(vect_analyze_loop_operations): Check if target supports vectorizing IV.
	(vect_transform_loop): Call vectorizable_live_operation for non-live
	inductions or reductions.
	(find_connected_edge, vectorizable_live_operation_1): New.
	(vect_create_epilog_for_reduction): Support reductions in early break.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	(vect_stmt_relevant_p): Mark all inductions when early break as being
	relevant.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
	(vect_iv_increment_position): New.
	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 139311142b376d4eaa7ef8765608220b1eb92b31..af216d3dcb8a502639898ff67cb86948a7f140a4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
    INSERT_AFTER is set to true if the increment should be inserted after
    *BSI.  */
 
-static void
+void
 vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
 			    bool *insert_after)
 {
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b1c34c4c3aaf8bdf9bf52d5a726836936de772b6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
 					      -1, false, &cost_vec);
 
+	  /* Check if we can perform the operation for early break if we force
+	     the live operation.  */
+	  if (ok
+	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	      && !STMT_VINFO_LIVE_P (stmt_info)
+	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
+					      -1, false, &cost_vec);
+
           if (!ok)
 	    return opt_result::failure_at (phi,
 					   "not vectorized: relevant phi not "
@@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
+   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
+     exit.  This determines whether we use the final or original value.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5895,9 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit,
+				  bool main_exit_p = true)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
+  if (main_exit_p)
+    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
+  else
+    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
+
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
       if (slp_node)
 	def = vect_get_slp_vect_def (slp_node, i);
       else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+	def = gimple_get_lhs (vec_stmts[0]);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  gimple *stmt = USE_STMT (use_p);
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else if (is_a <gphi *> (stmt))
+		    {
+		      /* For an early exit, only update uses in the merge
+			 block.  */
+		      edge merge_e = single_succ_edge (loop_exit->dest);
+		      if (gimple_bb (stmt) != merge_e->dest)
+			continue;
+		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Function vectorizable_live_operation_1.
+
+   Helper function for vectorizable_live_operation.  */
+
+tree
+vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info, edge exit_e,
+			       tree vectype, int ncopies, slp_tree slp_node,
+			       tree bitsize, tree bitstart, tree vec_lhs,
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
+{
+  basic_block exit_bb = exit_e->dest;
+  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
+  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+    SET_PHI_ARG_DEF (phi, i, vec_lhs);
+
+  gimple_seq stmts = NULL;
+  tree new_tree;
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (ncopies == 1 && !slp_node);
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree len = vect_get_loop_len (loop_vinfo, &gsi,
+				    &LOOP_VINFO_LENS (loop_vinfo),
+				    1, vectype, 0, 0);
+
+      /* BIAS - 1.  */
+      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+      tree bias_minus_one
+	= int_const_binop (MINUS_EXPR,
+			   build_int_cst (TREE_TYPE (len), biasval),
+			   build_one_cst (TREE_TYPE (len)));
+
+      /* LAST_INDEX = LEN + (BIAS - 1).  */
+      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+				     len, bias_minus_one);
+
+      /* This needs to implement extraction of the first index, but not sure
+	 how the LEN stuff works.  At the moment we shouldn't get here since
+	 there's no LEN support for early breaks.  But guard this so there's
+	 no incorrect codegen.  */
+      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+      tree scalar_res
+	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+			vec_lhs_phi, last_index);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (!slp_node);
+      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
+				      &LOOP_VINFO_MASKS (loop_vinfo),
+				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
+
+      gimple_seq_add_seq (&stmts, tem);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else
+    {
+      tree bftype = TREE_TYPE (vectype);
+      if (VECTOR_BOOLEAN_TYPE_P (vectype))
+	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
+      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
+      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
+				       &stmts, true, NULL_TREE);
+    }
+
+  *exit_gsi = gsi_after_labels (exit_bb);
+  if (stmts)
+    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
+  return new_tree;
+}
+
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  At most one forwarder edge may lie between them.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, dest->preds)
+    {
+      if (src->dest == e->src)
+	return e;
+    }
+
+  return NULL;
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* If early break we only have to materialize the reduction on the merge
+	 block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						slp_node, slp_node_instance,
+						exit, false);
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10683,103 +10883,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_seq stmts = NULL;
-      tree new_tree;
-      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree len
-	    = vect_get_loop_len (loop_vinfo, &gsi,
-				 &LOOP_VINFO_LENS (loop_vinfo),
-				 1, vectype, 0, 0);
-
-	  /* BIAS - 1.  */
-	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
-	  tree bias_minus_one
-	    = int_const_binop (MINUS_EXPR,
-			       build_int_cst (TREE_TYPE (len), biasval),
-			       build_one_cst (TREE_TYPE (len)));
-
-	  /* LAST_INDEX = LEN + (BIAS - 1).  */
-	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
-					  len, bias_minus_one);
-
-	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
-	  tree scalar_res
-	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
-			    vec_lhs_phi, last_index);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
-					  &LOOP_VINFO_MASKS (loop_vinfo),
-					  1, vectype, 0);
-	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else
-	{
-	  tree bftype = TREE_TYPE (vectype);
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
-	  new_tree = build3 (BIT_FIELD_REF, bftype,
-			     vec_lhs_phi, bitsize, bitstart);
-	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
-					   &stmts, true, NULL_TREE);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit,
+	 in these cases for an early break we restart the iteration the vector code
+	 did.  For the live values we want the value at the start of the iteration
+	 rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = !vect_is_loop_exit_latch_pred (main_e, loop);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
 
-      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
-      if (stmts)
-	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For early exit where the exit is not in the BB that leads
+		   to the latch then we're restarting the iteration in the
+		   scalar loop.  So get the first live value.  */
+		if (restart_loop || exit_e != main_e)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new stmts
+		   so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
@@ -11797,6 +11956,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
 	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
+	      /* If vectorizing early break we must also vectorize the use of
+		 the PHIs as a live operation.  */
+	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		  && !STMT_VINFO_LIVE_P (stmt_info)
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+			 "----> vectorizing early break reduc or induc phi: %G",
+			 (gimple *) phi);
+		  bool done
+		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
+						   NULL, -1, true, NULL);
+		  gcc_assert (done);
+		}
 	    }
 	}
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and multiple exits.  In this case there will be
+     a usage later on after peeling which is needed for the alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vec_stmt_relevant_p: induction forced for "
+			   "early break.\n");
+      *relevant = vect_used_in_scope;
+
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 22a8c3d384d7ae1ca93079b64f2d40821b4a3c56..cfd6756492e4af460c2f5669ecccc82b1089cfe4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2225,6 +2225,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
 extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2246,6 +2247,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

[-- Attachment #2: rb17968 (1).patch --]
[-- Type: application/octet-stream, Size: 21096 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 139311142b376d4eaa7ef8765608220b1eb92b31..af216d3dcb8a502639898ff67cb86948a7f140a4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
    INSERT_AFTER is set to true if the increment should be inserted after
    *BSI.  */
 
-static void
+void
 vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
 			    bool *insert_after)
 {
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b1c34c4c3aaf8bdf9bf52d5a726836936de772b6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
 					      -1, false, &cost_vec);
 
+	  /* Check if we can perform the operation for early break if we force
+	     the live operation.  */
+	  if (ok
+	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	      && !STMT_VINFO_LIVE_P (stmt_info)
+	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
+					      -1, false, &cost_vec);
+
           if (!ok)
 	    return opt_result::failure_at (phi,
 					   "not vectorized: relevant phi not "
@@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
+   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
+     exit.  This determines whether we use the final or original value.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5895,9 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit,
+				  bool main_exit_p = true)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
+  if (main_exit_p)
+    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
+  else
+    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
+
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
       if (slp_node)
 	def = vect_get_slp_vect_def (slp_node, i);
       else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+	def = gimple_get_lhs (vec_stmts[0]);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  gimple *stmt = USE_STMT (use_p);
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else if (is_a <gphi *> (stmt))
+		    {
+		      /* If an early exit only update usages in the merge
+			 block.  */
+		      edge merge_e = single_succ_edge (loop_exit->dest);
+		      if (gimple_bb (stmt) != merge_e->dest)
+			continue;
+		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Function vectorizable_live_operation_1.
+
+   helper function for vectorizable_live_operation.  */
+
+tree
+vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info, edge exit_e,
+			       tree vectype, int ncopies, slp_tree slp_node,
+			       tree bitsize, tree bitstart, tree vec_lhs,
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
+{
+  basic_block exit_bb = exit_e->dest;
+  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
+  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+    SET_PHI_ARG_DEF (phi, i, vec_lhs);
+
+  gimple_seq stmts = NULL;
+  tree new_tree;
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (ncopies == 1 && !slp_node);
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree len = vect_get_loop_len (loop_vinfo, &gsi,
+				    &LOOP_VINFO_LENS (loop_vinfo),
+				    1, vectype, 0, 0);
+
+      /* BIAS - 1.  */
+      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+      tree bias_minus_one
+	= int_const_binop (MINUS_EXPR,
+			   build_int_cst (TREE_TYPE (len), biasval),
+			   build_one_cst (TREE_TYPE (len)));
+
+      /* LAST_INDEX = LEN + (BIAS - 1).  */
+      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+				     len, bias_minus_one);
+
+      /* This needs to implement extraction of the first index, but not sure
+	 how the LEN stuff works.  At the moment we shouldn't get here since
+	 there's no LEN support for early breaks.  But guard this so there's
+	 no incorrect codegen.  */
+      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+      tree scalar_res
+	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+			vec_lhs_phi, last_index);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (!slp_node);
+      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
+				      &LOOP_VINFO_MASKS (loop_vinfo),
+				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
+
+      gimple_seq_add_seq (&stmts, tem);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else
+    {
+      tree bftype = TREE_TYPE (vectype);
+      if (VECTOR_BOOLEAN_TYPE_P (vectype))
+	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
+      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
+      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
+				       &stmts, true, NULL_TREE);
+    }
+
+  *exit_gsi = gsi_after_labels (exit_bb);
+  if (stmts)
+    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
+  return new_tree;
+}
+
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  This edge must exist in at most one forwarder edge between.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, dest->preds)
+    {
+      if (src->dest == e->src)
+	return e;
+    }
+
+  return NULL;
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* For an early break we only have to materialize the reduction on
+	 the merge block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						slp_node, slp_node_instance,
+						exit, false);
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10683,103 +10883,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_seq stmts = NULL;
-      tree new_tree;
-      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree len
-	    = vect_get_loop_len (loop_vinfo, &gsi,
-				 &LOOP_VINFO_LENS (loop_vinfo),
-				 1, vectype, 0, 0);
-
-	  /* BIAS - 1.  */
-	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
-	  tree bias_minus_one
-	    = int_const_binop (MINUS_EXPR,
-			       build_int_cst (TREE_TYPE (len), biasval),
-			       build_one_cst (TREE_TYPE (len)));
-
-	  /* LAST_INDEX = LEN + (BIAS - 1).  */
-	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
-					  len, bias_minus_one);
-
-	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
-	  tree scalar_res
-	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
-			    vec_lhs_phi, last_index);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
-					  &LOOP_VINFO_MASKS (loop_vinfo),
-					  1, vectype, 0);
-	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else
-	{
-	  tree bftype = TREE_TYPE (vectype);
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
-	  new_tree = build3 (BIT_FIELD_REF, bftype,
-			     vec_lhs_phi, bitsize, bitstart);
-	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
-					   &stmts, true, NULL_TREE);
-	}
+      /* Check if we have a loop where the chosen exit is not the main
+	 exit.  In these cases for an early break we restart the iteration
+	 the vector code performed.  For the live values we want the value
+	 at the start of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = !vect_is_loop_exit_latch_pred (main_e, loop);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
 
-      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
-      if (stmts)
-	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For an early exit where the exit is not in the BB that
+		   leads to the latch we're restarting the iteration in the
+		   scalar loop.  So get the first live value.  */
+		if (restart_loop || exit_e != main_e)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new
+		   stmts so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
@@ -11797,6 +11956,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
 	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
+	      /* If vectorizing early break we must also vectorize the use of
+		 the PHIs as a live operation.  */
+	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		  && !STMT_VINFO_LIVE_P (stmt_info)
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+			 "----> vectorizing early break reduc or induc phi: %G",
+			 (gimple *) phi);
+		  bool done
+		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
+						   NULL, -1, true, NULL);
+		  gcc_assert (done);
+		}
 	    }
 	}
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and we have multiple exits.  In this case
+     there will be a usage later on after peeling which is needed for the
+     alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vec_stmt_relevant_p: induction forced for "
+			 "early break.\n");
+      *relevant = vect_used_in_scope;
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 22a8c3d384d7ae1ca93079b64f2d40821b4a3c56..cfd6756492e4af460c2f5669ecccc82b1089cfe4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2225,6 +2225,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
 extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2246,6 +2247,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-11-20 21:51       ` Tamar Christina
@ 2023-11-24 10:16         ` Tamar Christina
  2023-11-24 12:38           ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-24 10:16 UTC (permalink / raw)
  To: Tamar Christina, Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 15861 bytes --]

Hi All,

Here's an updated patch, which takes a slightly different approach but makes things much easier later on.

Peeling for early breaks works by redirecting all early break exits to a
single "early break" block, then combining that block and the normal exit
edge in a merge block which then goes into the epilog preheader.

This allows us to re-use all the existing code for IV updates.  Additionally
it also enables correct linking for multiple vector epilogues.

flush_pending_stmts cannot be used in this scenario since it updates the PHI
nodes in the order that they are in the exit destination blocks.  This means
they are in CFG visit order.  With a single exit this doesn't matter but with
multiple exits with different live values through the different exits the order
usually does not line up.
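
To illustrate the ordering problem, here is a hypothetical loop (not taken
from the patch) with two exits: both `i` and `sum` are live after the loop
and reach the merge point through whichever exit was taken, so each exit
edge must supply its own LC-SSA PHI arguments, matched against the order of
the PHIs in the loop header rather than CFG visit order:

```c
#include <assert.h>

/* Illustrative only: a loop with an early-break exit and a normal exit.
   Both `i` and `sum` are live out of the loop, and their final values
   depend on which exit was taken.  */
int
sum_until_negative (const int *a, int n, int *sum_out)
{
  int i, sum = 0;
  for (i = 0; i < n; i++)
    {
      if (a[i] < 0)
	break;			/* Early break exit.  */
      sum += a[i];
    }				/* Fall-through: normal exit.  */
  *sum_out = sum;
  return i;
}
```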

Additionally the vectorizer helper functions expect to be able to iterate over
the nodes in the order that they occur in the loop header blocks.  This is an
invariant we must maintain.  To do this we just inline the work of
flush_pending_stmts but maintain the order by using the header blocks to
guide the work.

The way peeling is done results in LIM noticing that in some cases the
condition and the results are loop invariant and trying to move them out of
the loop.

While the resulting code is operationally sound, moving the compare out of
the gcond results in generating code that no longer branches, so cbranch is
no longer applicable.  As such I now add a check during this motion to see
whether the target supports the flag-setting vector comparison as a general
operation.
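
For illustration, a hypothetical loop of the shape that exhibits this (not
part of the patch): the early-break condition `x > y` does not depend on the
induction variable, so after vectorization the compare feeding the gcond is
loop invariant, and LIM may only move it out of the branch when the target
can evaluate the vector comparison as a general, non-branching operation:

```c
#include <assert.h>

/* Illustrative only: the break condition is loop invariant, so the
   (vectorized) compare feeding the gcond becomes a candidate for
   invariant motion out of the loop.  */
int
scan_unless (const int *a, int n, int x, int y)
{
  int last = -1;
  for (int i = 0; i < n; i++)
    {
      if (x > y)		/* Loop-invariant early-break condition.  */
	break;
      last = a[i];
    }
  return last;
}
```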

Because of the change in peeling I now also have to update the BB counts for
the loop exit intermediate block.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-ssa-loop-im.cc (compute_invariantness): Import insn-codes.h
	and optabs-tree.h and check for vector compare motion out of gcond.
	* tree-vect-loop-manip.cc
	(slpeel_tree_duplicate_loop_to_edge_cfg): Peel using intermediate blocks.
	(vect_update_ivs_after_vectorizer): Drop assert.
	(vect_do_peeling): Correct BB count for new intermediate block.
	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.

--- inline copy of patch ---

diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 396963b6754c7671e2e5404302a69129918555e2..92a9318a1ca0a2da50ff2f29cf271d2e78fddd77 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "tree-ssa.h"
 #include "dbgcnt.h"
+#include "insn-codes.h"
+#include "optabs-tree.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -1138,6 +1140,24 @@ compute_invariantness (basic_block bb)
 	    continue;
 	  }
 
+	/* If one of the dependent statements is a vector compare, check
+	   whether the target supports it; otherwise it's invalid to hoist
+	   it out of the gcond it belonged to.  */
+	for (auto dep_stmt : lim_data->depends)
+	  {
+	    if (is_gimple_assign (dep_stmt)
+		&& VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (dep_stmt))))
+	      {
+		tree type = TREE_TYPE (gimple_assign_lhs (dep_stmt));
+		auto code = gimple_assign_rhs_code (dep_stmt);
+		if (!target_supports_op_p (type, code, optab_vector))
+		  pos = MOVE_IMPOSSIBLE;
+	      }
+	  }
+
+	if (pos == MOVE_IMPOSSIBLE)
+	  continue;
+
 	if (dump_file && (dump_flags & TDF_DETAILS))
 	  {
 	    print_gimple_stmt (dump_file, stmt, 2);
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index b9161274ce401a7307f3e61ad23aa036701190d7..0b042b2baf976572af962dd40d5dc311a419ee60 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1403,13 +1403,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
    copies remains the same.
 
    If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
-   dominators were updated during the peeling.  */
+   dominators were updated during the peeling.  When doing early break
+   vectorization then LOOP_VINFO needs to be provided and is used to keep
+   track of any newly created memory references that need to be updated
+   should we decide to vectorize.  */
 
 class loop *
 slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 					class loop *scalar_loop,
 					edge scalar_exit, edge e, edge *new_e,
-					bool flow_loops)
+					bool flow_loops,
+					vec<basic_block> *updated_doms)
 {
   class loop *new_loop;
   basic_block *new_bbs, *bbs, *pbbs;
@@ -1526,7 +1529,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       }
 
   auto loop_exits = get_loop_exit_edges (loop);
+  bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec<basic_block> doms;
+  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
     {
@@ -1536,39 +1541,65 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	  flush_pending_stmts (new_exit);
 	}
 
+      bool multiple_exits_p = loop_exits.length () > 1;
+      basic_block main_loop_exit_block = new_preheader;
+      basic_block alt_loop_exit_block = NULL;
+      /* Create an intermediate edge for the main exit.  This is only
+	 useful for early exits.  */
+      if (multiple_exits_p)
+	{
+	  edge loop_e = single_succ_edge (new_preheader);
+	  new_preheader = split_edge (loop_e);
+	}
+
       auto_vec <gimple *> new_phis;
       hash_map <tree, tree> new_phi_args;
       /* First create the empty phi nodes so that when we flush the
 	 statements they can be filled in.   However because there is no order
 	 between the PHI nodes in the exits and the loop headers we need to
 	 order them base on the order of the two headers.  First record the new
-	 phi nodes.  */
-      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
+	 phi nodes.  Then redirect the edges and flush the changes.  This writes
+	 out the new SSA names.  */
+      for (auto gsi_from = gsi_start_phis (loop_exit->dest);
 	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
 	{
 	  gimple *from_phi = gsi_stmt (gsi_from);
 	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	  gphi *res = create_phi_node (new_res, new_preheader);
+	  gphi *res = create_phi_node (new_res, main_loop_exit_block);
 	  new_phis.safe_push (res);
 	}
 
-      /* Then redirect the edges and flush the changes.  This writes out the new
-	 SSA names.  */
-      for (edge exit : loop_exits)
+      for (auto exit : loop_exits)
 	{
-	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
-	  flush_pending_stmts (temp_e);
+	  basic_block dest = main_loop_exit_block;
+	  if (exit != loop_exit)
+	    {
+	      if (!alt_loop_exit_block)
+		{
+		  alt_loop_exit_block = split_edge (exit);
+		  edge res = redirect_edge_and_branch (
+				single_succ_edge (alt_loop_exit_block),
+				new_preheader);
+		  flush_pending_stmts (res);
+		  continue;
+		}
+	      dest = alt_loop_exit_block;
+	    }
+	  edge e = redirect_edge_and_branch (exit, dest);
+	  flush_pending_stmts (e);
 	}
+
       /* Record the new SSA names in the cache so that we can skip materializing
 	 them again when we fill in the rest of the LCSSA variables.  */
       for (auto phi : new_phis)
 	{
-	  tree new_arg = gimple_phi_arg (phi, 0)->def;
+	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
 
 	  if (!SSA_VAR_P (new_arg))
 	    continue;
+
 	  /* If the PHI MEM node dominates the loop then we shouldn't create
-	      a new LC-SSSA PHI for it in the intermediate block.   */
+	     a new LC-SSSA PHI for it in the intermediate block.   */
 	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
 	     be a .VDEF or a PHI that operates on MEM. And said definition
 	     must not be inside the main loop.  Or we must be a parameter.
@@ -1584,6 +1615,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	      remove_phi_node (&gsi, true);
 	      continue;
 	    }
+
+	  /* If we decide to remove the PHI node we should also not
+	     rematerialize it later on.  */
 	  new_phi_args.put (new_arg, gimple_phi_result (phi));
 
 	  if (TREE_CODE (new_arg) != SSA_NAME)
@@ -1595,34 +1629,68 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
 	 preheader block and still find the right LC nodes.  */
       edge loop_entry = single_succ_edge (new_preheader);
       if (flow_loops)
-	for (auto gsi_from = gsi_start_phis (loop->header),
-	     gsi_to = gsi_start_phis (new_loop->header);
-	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
-	     gsi_next (&gsi_from), gsi_next (&gsi_to))
-	  {
-	    gimple *from_phi = gsi_stmt (gsi_from);
-	    gimple *to_phi = gsi_stmt (gsi_to);
-	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
-						  loop_latch_edge (loop));
+	{
+	  /* Link through the main exit first.  */
+	  for (auto gsi_from = gsi_start_phis (loop->header),
+	       gsi_to = gsi_start_phis (new_loop->header);
+	       !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+	       gsi_next (&gsi_from), gsi_next (&gsi_to))
+	    {
+	      gimple *from_phi = gsi_stmt (gsi_from);
+	      gimple *to_phi = gsi_stmt (gsi_to);
+	      tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
+						    loop_latch_edge (loop));
+
+	      /* Check if we've already created a new phi node during edge
+		 redirection.  If we have, only propagate the value
+		 downwards.  */
+	      if (tree *res = new_phi_args.get (new_arg))
+		{
+		  if (multiple_exits_p)
+		    new_arg = *res;
+		  else
+		    {
+		      adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
+		      continue;
+		    }
+		}
 
-	    /* Check if we've already created a new phi node during edge
-	       redirection.  If we have, only propagate the value downwards.  */
-	    if (tree *res = new_phi_args.get (new_arg))
-	      {
-		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
-		continue;
-	      }
+	      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
+	      gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
 
-	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+	      /* Main loop exit should use the final iter value.  */
+	      SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
 
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	      adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+	    }
 
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+	  set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block,
+				   loop_exit->src);
+
+	  /* Now link the alternative exits.  */
+	  if (multiple_exits_p)
+	    {
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       main_loop_exit_block);
+	      for (auto gsi_from = gsi_start_phis (loop->header),
+		   gsi_to = gsi_start_phis (new_preheader);
+		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+		   gsi_next (&gsi_from), gsi_next (&gsi_to))
+		{
+		  gimple *from_phi = gsi_stmt (gsi_from);
+		  gimple *to_phi = gsi_stmt (gsi_to);
+
+		  tree alt_arg = gimple_phi_result (from_phi);
+		  edge main_e = single_succ_edge (alt_loop_exit_block);
+		  for (edge e : loop_exits)
+		    if (e != loop_exit)
+		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
+		}
 
-      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       loop->header);
+	    }
+	}
 
       if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
@@ -1634,6 +1702,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1764,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
@@ -2050,7 +2161,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -3138,6 +3248,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       epilog->force_vectorize = false;
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
+      /* Fixup the probabilities of the new intermediate blocks that we use
+	 to connect to the merge block.  The rest are dealt with via
+	 bb_before_epilog adjustments.  */
+      e->dest->count = e->count ();
+
       /* Scalar version loop may be preferred.  In this case, add guard
 	 and skip to epilog.  Note this only happens when the number of
 	 iterations of loop is unknown at compile time, otherwise this
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,

-	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
+	      /* Main loop exit should use the final iter value.  */
+	      SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
 
-	    /* Main loop exit should use the final iter value.  */
-	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
+	      adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
+	    }
 
-	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
-	  }
+	  set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block,
+				   loop_exit->src);
+
+	  /* Now link the alternative exits.  */
+	  if (multiple_exits_p)
+	    {
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       main_loop_exit_block);
+	      for (auto gsi_from = gsi_start_phis (loop->header),
+		   gsi_to = gsi_start_phis (new_preheader);
+		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
+		   gsi_next (&gsi_from), gsi_next (&gsi_to))
+		{
+		  gimple *from_phi = gsi_stmt (gsi_from);
+		  gimple *to_phi = gsi_stmt (gsi_to);
+
+		  tree alt_arg = gimple_phi_result (from_phi);
+		  edge main_e = single_succ_edge (alt_loop_exit_block);
+		  for (edge e : loop_exits)
+		    if (e != loop_exit)
+		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
+		}
 
-      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
+	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
+				       loop->header);
+	    }
+	}
 
       if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
@@ -1634,6 +1702,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (preheader);
       set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
 			       loop_preheader_edge (scalar_loop)->src);
+
+      /* Finally after wiring the new epilogue we need to update its main exit
+	 to the original function exit we recorded.  Other exits are already
+	 correct.  */
+      if (multiple_exits_p)
+	{
+	  update_loop = new_loop;
+	  for (edge e : get_loop_exit_edges (loop))
+	    doms.safe_push (e->dest);
+	  doms.safe_push (exit_dest);
+
+	  /* Likely a fall-through edge, so update if needed.  */
+	  if (single_succ_p (exit_dest))
+	    doms.safe_push (single_succ (exit_dest));
+	}
     }
   else /* Add the copy at entry.  */
     {
@@ -1681,6 +1764,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
       delete_basic_block (new_preheader);
       set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
 			       loop_preheader_edge (new_loop)->src);
+
+      if (multiple_exits_p)
+	update_loop = loop;
+    }
+
+  if (multiple_exits_p)
+    {
+      for (edge e : get_loop_exit_edges (update_loop))
+	{
+	  edge ex;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
+	    {
+	      /* Find the first non-fallthrough block as fall-throughs can't
+		 dominate other blocks.  */
+	      if (single_succ_p (ex->dest))
+		{
+		  doms.safe_push (ex->dest);
+		  ex = single_succ_edge (ex->dest);
+		}
+	      doms.safe_push (ex->dest);
+	    }
+	  doms.safe_push (e->dest);
+	}
+
+      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+      if (updated_doms)
+	updated_doms->safe_splice (doms);
     }
 
   free (new_bbs);
@@ -2050,7 +2161,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
   /* Make sure there exists a single-predecessor exit bb:  */
   gcc_assert (single_pred_p (exit_bb));
-  gcc_assert (single_succ_edge (exit_bb) == update_e);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -3138,6 +3248,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       epilog->force_vectorize = false;
       bb_before_epilog = loop_preheader_edge (epilog)->src;
 
+      /* Fix up the probabilities of the new intermediate blocks that we use
+	 to connect to the merge block.  The rest are dealt with via
+	 bb_before_epilog adjustments.  */
+      e->dest->count = e->count ();
+
       /* Scalar version loop may be preferred.  In this case, add guard
 	 and skip to epilog.  Note this only happens when the number of
 	 iterations of loop is unknown at compile time, otherwise this
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
 {
   if (bb == (bb->loop_father)->header)
     return true;
-  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
+
   return false;
 }
 
@@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
 					 const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
 						    class loop *, edge,
-						    edge, edge *, bool = true);
+						    edge, edge *, bool = true,
+						    vec<basic_block> * = NULL);
 class loop *vect_loop_versioning (loop_vec_info, gimple *);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
 				    tree *, tree *, tree *, int, bool, bool,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-20 21:54                                       ` Tamar Christina
@ 2023-11-24 10:18                                         ` Tamar Christina
  2023-11-24 12:41                                           ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-24 10:18 UTC (permalink / raw)
  To: Tamar Christina, Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 8234 bytes --]

Hi,

Having simplified peeling, this patch has become smaller as well:

This changes the PHI node updates to support early breaks.
It has to support both the case where the loop's exit matches the normal loop
exit and one where the early exit is "inverted", i.e. it's an early exit edge.

In the latter case we must always restart the loop for VF iterations.  For an
early exit the reason is obvious, but there are cases where the "normal" exit
is located before the early one.  That exit then checks ivtmp and leaves the
loop because it believes we're done.

In these cases we may still have side effects to perform, so we also go to the
scalar loop.

For the "normal" exit niters has already been adjusted for peeling; for the
early exits we must find out how many iterations we actually did, and so we
have to recalculate the new position for each exit.

For the "inverse" case we essentially peel a vector iteration *after* the
vector loop has finished, i.e. conceptually it's the same as vector epilogue
peeling but without generating code for the peeled iteration.  That'll be
handled by the scalar loop.

To do this we just subtract one VF from niters_vector_mult_vf, and for masked
cases we do the same with final_iv.

The normal IV update code will then generate the correct values for us.
Eventually VRP will simplify the constant bounds and we get the proper scalar
unrolling.  This means we don't have to make any changes at all to
vect_update_ivs_after_vectorizer beyond dropping some asserts.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_partial_vectors_avx512,
	vect_gen_vector_loop_niters_mult_vf): Support peeling a vector
	iteration.
	(vect_update_ivs_after_vectorizer): Drop asserts.
	(vect_do_peeling): Skip forwarder edge.
	(vect_is_loop_exit_latch_pred): New.
	* tree-vectorizer.h (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED): New.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d61d7c3a189b279fc3bcbb58c3c0e32521db3cf8..476be8a0bb6da2d06c4ca7052cb07bacecca60b1 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -951,7 +951,18 @@ vect_set_loop_condition_partial_vectors (class loop *loop, edge exit_edge,
 
   if (final_iv)
     {
-      gassign *assign = gimple_build_assign (final_iv, orig_niters);
+      gassign *assign;
+      /* If vectorizing an inverted early break loop we have to restart the
+	 scalar loop at niters - vf.  This matches what we do in
+	 vect_gen_vector_loop_niters_mult_vf for non-masked loops.  */
+      if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+	{
+	  tree ftype = TREE_TYPE (orig_niters);
+	  tree vf = build_int_cst (ftype, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+	  assign = gimple_build_assign (final_iv, MINUS_EXPR, orig_niters, vf);
+	}
+      else
+	assign = gimple_build_assign (final_iv, orig_niters);
       gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
@@ -1188,8 +1199,19 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
 
   if (final_iv)
     {
-      gassign *assign = gimple_build_assign (final_iv, orig_niters);
-      gsi_insert_on_edge_immediate (single_exit (loop), assign);
+      gassign *assign;
+      /* If vectorizing an inverted early break loop we have to restart the
+	 scalar loop at niters - vf.  This matches what we do in
+	 vect_gen_vector_loop_niters_mult_vf for non-masked loops.  */
+      if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+	{
+	  tree ftype = TREE_TYPE (orig_niters);
+	  tree vf = build_int_cst (ftype, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+	  assign = gimple_build_assign (final_iv, MINUS_EXPR, orig_niters, vf);
+	}
+      else
+	assign = gimple_build_assign (final_iv, orig_niters);
+      gsi_insert_on_edge_immediate (exit_edge, assign);
     }
 
   return cond_stmt;
@@ -2157,11 +2179,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
   gphi_iterator gsi, gsi1;
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block update_bb = update_e->dest;
-
   basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-
-  /* Make sure there exists a single-predecessor exit bb:  */
-  gcc_assert (single_pred_p (exit_bb));
+  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
 
   for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
        !gsi_end_p (gsi) && !gsi_end_p (gsi1);
@@ -2171,7 +2190,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
       tree step_expr, off;
       tree type;
       tree var, ni, ni_name;
-      gimple_stmt_iterator last_gsi;
 
       gphi *phi = gsi.phi ();
       gphi *phi1 = gsi1.phi ();
@@ -2207,7 +2225,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 	{
 	  tree stype = TREE_TYPE (step_expr);
 	  off = fold_build2 (MULT_EXPR, stype,
-			     fold_convert (stype, niters), step_expr);
+			       fold_convert (stype, niters), step_expr);
+
 	  if (POINTER_TYPE_P (type))
 	    ni = fold_build_pointer_plus (init_expr, off);
 	  else
@@ -2226,9 +2245,9 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
 
       var = create_tmp_var (type, "tmp");
 
-      last_gsi = gsi_last_bb (exit_bb);
       gimple_seq new_stmts = NULL;
       ni_name = force_gimple_operand (ni, &new_stmts, false, var);
+
       /* Exit_bb shouldn't be empty.  */
       if (!gsi_end_p (last_gsi))
 	{
@@ -2726,11 +2745,19 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
+  tree tree_vf = build_int_cst (type, vf);
   basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
 
   gcc_assert (niters_vector_mult_vf_ptr != NULL);
   tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
 					    niters_vector, log_vf);
+
+  /* If we've peeled a vector iteration then subtract one full vector
+     iteration.  */
+  if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
+    niters_vector_mult_vf = fold_build2 (MINUS_EXPR, type,
+					 niters_vector_mult_vf, tree_vf);
+
   if (!is_gimple_val (niters_vector_mult_vf))
     {
       tree var = create_tmp_var (type, "niters_vector_mult_vf");
@@ -3328,6 +3355,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 niters_vector_mult_vf steps.  */
       gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
       update_e = skip_vector ? e : loop_preheader_edge (epilog);
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	update_e = single_succ_edge (e->dest);
+
+      /* Update the main exit.  */
       vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
 					update_e);
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 39aa4d1250efe308acccf484d370f8adfd1ba843..de60da31e2a3030a7fbc302d3f676af9683fd019 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1016,6 +1016,8 @@ public:
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
 #define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BREAKS_VECT_PEELED(L)  \
+  (single_pred ((L)->loop->latch) != (L)->vec_loop_iv_exit->src)
 #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
 #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
 #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
@@ -2224,6 +2226,7 @@ extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-20 21:57           ` Tamar Christina
@ 2023-11-24 10:20             ` Tamar Christina
  2023-11-24 13:23               ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-24 10:20 UTC (permalink / raw)
  To: Tamar Christina, Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 24122 bytes --]

Good morning,

This is a respun patch with a fix for VLA.

This adds support to vectorizable_live_reduction to handle multiple exits by
doing a search for which exit the live value should be materialized in.

Additionally, which index value we're after depends on whether the exit it's
materialized in is an early exit, or whether the loop's main exit is
different from the loop's natural one (i.e. the one with the same src block
as the latch).

In those two cases we want the first rather than the last value as we're going
to restart the iteration in the scalar loop.  For VLA this means we need to
reverse both the mask and the vector, since there is only a way to extract
the last active element, not the first.

For inductions and multiple exits:
  - we test if the target will support vectorizing the induction
  - mark all inductions in the loop as relevant
  - generate code for non-live inductions during codegen
  - an induction live in an early exit gets the first element rather than the
    last.

For reductions and multiple exits:
  - Reductions for early exits reduce the reduction definition statement
    rather than the reduction step.  This allows us to get the value at the
    start of the iteration.
  - The peeling layout means that we just have to update one block, the merge
    block.  We expect all the reductions to be the same but we leave it up to
    the value numbering to clean up any duplicate code as we iterate over all
    edges.

These two changes fix the reduction codegen shown before; the example has
been added to the testsuite for early vect.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
	(vect_analyze_loop_operations): Check if target supports vectorizing IV.
	(vect_transform_loop): Call vectorizable_live_operation for non-live
	inductions or reductions.
	(find_connected_edge, vectorizable_live_operation_1): New.
	(vect_create_epilog_for_reduction): Support reductions in early break.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	(vect_stmt_relevant_p): Mark all inductions when early break as being
	relevant.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
	(vect_iv_increment_position): New.
	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401aecd4e7eaaaa9e2b8a0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
    INSERT_AFTER is set to true if the increment should be inserted after
    *BSI.  */
 
-static void
+void
 vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
 			    bool *insert_after)
 {
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e6983382190792602cb25af1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
 					      -1, false, &cost_vec);
 
+	  /* Check if we can perform the operation for early break if we force
+	     the live operation.  */
+	  if (ok
+	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	      && !STMT_VINFO_LIVE_P (stmt_info)
+	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
+					      -1, false, &cost_vec);
+
           if (!ok)
 	    return opt_result::failure_at (phi,
 					   "not vectorized: relevant phi not "
@@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
+   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
+     exit.  This determines whether we use the final or original value.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5895,9 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit,
+				  bool main_exit_p = true)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
+  if (main_exit_p)
+    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
+  else
+    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
+
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
       if (slp_node)
 	def = vect_get_slp_vect_def (slp_node, i);
       else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+	def = gimple_get_lhs (vec_stmts[0]);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  gimple *stmt = USE_STMT (use_p);
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else if (is_a <gphi *> (stmt))
+		    {
+		      /* If an early exit, only update usages in the merge
+			 block.  */
+		      edge merge_e = single_succ_edge (loop_exit->dest);
+		      if (gimple_bb (stmt) != merge_e->dest)
+			continue;
+		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Function vectorizable_live_operation_1.
+
+   Helper function for vectorizable_live_operation.  */
+
+tree
+vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info, edge exit_e,
+			       tree vectype, int ncopies, slp_tree slp_node,
+			       tree bitsize, tree bitstart, tree vec_lhs,
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
+{
+  basic_block exit_bb = exit_e->dest;
+  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
+  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+    SET_PHI_ARG_DEF (phi, i, vec_lhs);
+
+  gimple_seq stmts = NULL;
+  tree new_tree;
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (ncopies == 1 && !slp_node);
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree len = vect_get_loop_len (loop_vinfo, &gsi,
+				    &LOOP_VINFO_LENS (loop_vinfo),
+				    1, vectype, 0, 0);
+
+      /* BIAS - 1.  */
+      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+      tree bias_minus_one
+	= int_const_binop (MINUS_EXPR,
+			   build_int_cst (TREE_TYPE (len), biasval),
+			   build_one_cst (TREE_TYPE (len)));
+
+      /* LAST_INDEX = LEN + (BIAS - 1).  */
+      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+				     len, bias_minus_one);
+
+      /* This needs to implement extraction of the first index, but not sure
+	 how the LEN stuff works.  At the moment we shouldn't get here since
+	 there's no LEN support for early breaks.  But guard this so there's
+	 no incorrect codegen.  */
+      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+      tree scalar_res
+	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+			vec_lhs_phi, last_index);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (!slp_node);
+      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
+				      &LOOP_VINFO_MASKS (loop_vinfo),
+				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.  */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
+
+      gimple_seq_add_seq (&stmts, tem);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else
+    {
+      tree bftype = TREE_TYPE (vectype);
+      if (VECTOR_BOOLEAN_TYPE_P (vectype))
+	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
+      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
+      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
+				       &stmts, true, NULL_TREE);
+    }
+
+  *exit_gsi = gsi_after_labels (exit_bb);
+  if (stmts)
+    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
+  return new_tree;
+}
+
+/* Find the edge that's the final one in the path from SRC to DEST and return
+   it, allowing at most one intervening forwarder block; NULL if none.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+  if (src->dest == dest)
+    return src;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, dest->preds)
+    {
+      if (src->dest == e->src)
+	return e;
+    }
+
+  return NULL;
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* For an early break loop we only have to materialize the reduction in
+	 the merge block, but we have to find the alternate exits first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						slp_node, slp_node_instance,
+						exit, false);
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10683,103 +10883,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_seq stmts = NULL;
-      tree new_tree;
-      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree len
-	    = vect_get_loop_len (loop_vinfo, &gsi,
-				 &LOOP_VINFO_LENS (loop_vinfo),
-				 1, vectype, 0, 0);
-
-	  /* BIAS - 1.  */
-	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
-	  tree bias_minus_one
-	    = int_const_binop (MINUS_EXPR,
-			       build_int_cst (TREE_TYPE (len), biasval),
-			       build_one_cst (TREE_TYPE (len)));
-
-	  /* LAST_INDEX = LEN + (BIAS - 1).  */
-	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
-					  len, bias_minus_one);
-
-	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
-	  tree scalar_res
-	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
-			    vec_lhs_phi, last_index);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
-					  &LOOP_VINFO_MASKS (loop_vinfo),
-					  1, vectype, 0);
-	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else
-	{
-	  tree bftype = TREE_TYPE (vectype);
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
-	  new_tree = build3 (BIT_FIELD_REF, bftype,
-			     vec_lhs_phi, bitsize, bitstart);
-	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
-					   &stmts, true, NULL_TREE);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases for an early break the scalar loop restarts the
+	 iteration the vector code was executing, so for the live values we
+	 want the value at the start of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
 
-      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
-      if (stmts)
-	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For an early exit where the exit is not in the BB that
+		   leads to the latch we're restarting the iteration in the
+		   scalar loop.  So get the first live value.  */
+		restart_loop = restart_loop || exit_e != main_e;
+		if (restart_loop)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new
+		   stmts so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
@@ -11797,6 +11957,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
 	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
+	      /* If vectorizing early break we must also vectorize the use of
+		 the PHIs as a live operation.  */
+	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		  && !STMT_VINFO_LIVE_P (stmt_info)
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+			 "----> vectorizing early break reduc or induc phi: %G",
+			 (gimple *) phi);
+		  bool done
+		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
+						   NULL, -1, true, NULL);
+		  gcc_assert (done);
+		}
 	    }
 	}
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction in a loop with multiple exits.  In this case
+     there will be a use after peeling which is needed for the alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vec_stmt_relevant_p: induction forced for "
+			 "early break.\n");
+      *relevant = vect_used_in_scope;
+
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 076a698eb4288f68e81f91923f7e3e8d181ad685..de673ae56eac455c9560a29d7f3792b6c3c49f3b 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2227,6 +2227,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
 extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2248,6 +2249,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

[-- Attachment #2: rb17968.patch --]
[-- Type: application/octet-stream, Size: 21132 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401aecd4e7eaaaa9e2b8a0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
    INSERT_AFTER is set to true if the increment should be inserted after
    *BSI.  */
 
-static void
+void
 vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
 			    bool *insert_after)
 {
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e6983382190792602cb25af1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
 					      -1, false, &cost_vec);
 
+	  /* Check if we can perform the operation for an early break if we
+	     force the live operation.  */
+	  if (ok
+	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	      && !STMT_VINFO_LIVE_P (stmt_info)
+	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
+					      -1, false, &cost_vec);
+
           if (!ok)
 	    return opt_result::failure_at (phi,
 					   "not vectorized: relevant phi not "
@@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
+   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
+     exit.  This determines whether we use the final or original value.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5895,9 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit,
+				  bool main_exit_p = true)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
+  if (main_exit_p)
+    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
+  else
+    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
+
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
       if (slp_node)
 	def = vect_get_slp_vect_def (slp_node, i);
       else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+	def = gimple_get_lhs (vec_stmts[0]);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  gimple *stmt = USE_STMT (use_p);
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else if (is_a <gphi *> (stmt))
+		    {
+		      /* For an early exit only update uses in the merge
+			 block.  */
+		      edge merge_e = single_succ_edge (loop_exit->dest);
+		      if (gimple_bb (stmt) != merge_e->dest)
+			continue;
+		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
+/* Function vectorizable_live_operation_1.
+
+   Helper function for vectorizable_live_operation.  */
+
+tree
+vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
+			       stmt_vec_info stmt_info, edge exit_e,
+			       tree vectype, int ncopies, slp_tree slp_node,
+			       tree bitsize, tree bitstart, tree vec_lhs,
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
+{
+  basic_block exit_bb = exit_e->dest;
+  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
+  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
+    SET_PHI_ARG_DEF (phi, i, vec_lhs);
+
+  gimple_seq stmts = NULL;
+  tree new_tree;
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (ncopies == 1 && !slp_node);
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree len = vect_get_loop_len (loop_vinfo, &gsi,
+				    &LOOP_VINFO_LENS (loop_vinfo),
+				    1, vectype, 0, 0);
+
+      /* BIAS - 1.  */
+      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+      tree bias_minus_one
+	= int_const_binop (MINUS_EXPR,
+			   build_int_cst (TREE_TYPE (len), biasval),
+			   build_one_cst (TREE_TYPE (len)));
+
+      /* LAST_INDEX = LEN + (BIAS - 1).  */
+      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
+				     len, bias_minus_one);
+
+      /* Extraction of the first active index still needs to be implemented
+	 for the LEN case.  At the moment we shouldn't get here since there's
+	 no LEN support for early breaks, but assert so that there's no
+	 incorrect codegen.  */
+      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
+
+      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
+      tree scalar_res
+	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+			vec_lhs_phi, last_index);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+    {
+      /* Emit:
+
+	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
+	 where VEC_LHS is the vectorized live-out result and MASK is
+	 the loop mask for the final iteration.  */
+      gcc_assert (!slp_node);
+      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
+      gimple_seq tem = NULL;
+      gimple_stmt_iterator gsi = gsi_last (tem);
+      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
+				      &LOOP_VINFO_MASKS (loop_vinfo),
+				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.  */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
+
+      gimple_seq_add_seq (&stmts, tem);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
+      /* Convert the extracted vector element to the scalar type.  */
+      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
+    }
+  else
+    {
+      tree bftype = TREE_TYPE (vectype);
+      if (VECTOR_BOOLEAN_TYPE_P (vectype))
+	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
+      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
+      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
+				       &stmts, true, NULL_TREE);
+    }
+
+  *exit_gsi = gsi_after_labels (exit_bb);
+  if (stmts)
+    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
+  return new_tree;
+}
+
+/* Find the edge that's the final one in the path from SRC to DEST and return
+   it, allowing at most one intervening forwarder block; NULL if none.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+  if (src->dest == dest)
+    return src;
+
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, dest->preds)
+    {
+      if (src->dest == e->src)
+	return e;
+    }
+
+  return NULL;
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* For an early break loop we only have to materialize the reduction in
+	 the merge block, but we have to find the alternate exits first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						slp_node, slp_node_instance,
+						exit, false);
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10683,103 +10883,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_seq stmts = NULL;
-      tree new_tree;
-      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree len
-	    = vect_get_loop_len (loop_vinfo, &gsi,
-				 &LOOP_VINFO_LENS (loop_vinfo),
-				 1, vectype, 0, 0);
-
-	  /* BIAS - 1.  */
-	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
-	  tree bias_minus_one
-	    = int_const_binop (MINUS_EXPR,
-			       build_int_cst (TREE_TYPE (len), biasval),
-			       build_one_cst (TREE_TYPE (len)));
-
-	  /* LAST_INDEX = LEN + (BIAS - 1).  */
-	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
-					  len, bias_minus_one);
-
-	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
-	  tree scalar_res
-	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
-			    vec_lhs_phi, last_index);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-	{
-	  /* Emit:
-
-	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
-
-	     where VEC_LHS is the vectorized live-out result and MASK is
-	     the loop mask for the final iteration.  */
-	  gcc_assert (ncopies == 1 && !slp_node);
-	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-	  gimple_seq tem = NULL;
-	  gimple_stmt_iterator gsi = gsi_last (tem);
-	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
-					  &LOOP_VINFO_MASKS (loop_vinfo),
-					  1, vectype, 0);
-	  gimple_seq_add_seq (&stmts, tem);
-	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-					  mask, vec_lhs_phi);
-
-	  /* Convert the extracted vector element to the scalar type.  */
-	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
-	}
-      else
-	{
-	  tree bftype = TREE_TYPE (vectype);
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
-	  new_tree = build3 (BIT_FIELD_REF, bftype,
-			     vec_lhs_phi, bitsize, bitstart);
-	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
-					   &stmts, true, NULL_TREE);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases for an early break the scalar loop restarts the
+	 iteration the vector code was executing, so for the live values we
+	 want the value at the start of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
 
-      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
-      if (stmts)
-	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For an early exit where the exit is not in the BB that
+		   leads to the latch we're restarting the iteration in the
+		   scalar loop.  So get the first live value.  */
+		restart_loop = restart_loop || exit_e != main_e;
+		if (restart_loop)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
 
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new
+		   stmts so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
@@ -11797,6 +11957,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 	      if (dump_enabled_p ())
 		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
 	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
+	      /* If vectorizing an early break loop we must also vectorize the
+		 use of the PHIs as a live operation.  */
+	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		  && !STMT_VINFO_LIVE_P (stmt_info)
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_NOTE, vect_location,
+			 "----> vectorizing early break reduc or induc phi: %G",
+			 (gimple *) phi);
+		  bool done
+		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
+						   NULL, -1, true, NULL);
+		  gcc_assert (done);
+		}
 	    }
 	}
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction in a loop with multiple exits.  In this case
+     there will be a usage later on after peeling which is needed for the
+     alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vec_stmt_relevant_p: induction forced for "
+			   "early break.\n");
+      *relevant = vect_used_in_scope;
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 076a698eb4288f68e81f91923f7e3e8d181ad685..de673ae56eac455c9560a29d7f3792b6c3c49f3b 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2227,6 +2227,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 extern edge vec_init_loop_exit_info (class loop *);
 extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
+extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
 
 /* In tree-vect-stmts.cc.  */
 extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
@@ -2248,6 +2249,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks
  2023-11-24 10:16         ` Tamar Christina
@ 2023-11-24 12:38           ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-24 12:38 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 24 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> Here's an updated patch, which takes a slightly different approach but makes things much easier later on.
> 
> Peeling for early breaks works by redirecting all early break exits to a
> single "early break" block, combining them with the normal exit edge in a
> separate block later on, which then goes into the epilog preheader.
> 
> This allows us to re-use all the existing code for IV updates.  Additionally
> this also enables correct linking for multiple vector epilogues.
> 
> flush_pending_stmts cannot be used in this scenario since it updates the PHI
> nodes in the order that they are in the exit destination blocks.  This means
> they are in CFG visit order.  With a single exit this doesn't matter but with
> multiple exits with different live values through the different exits the order
> usually does not line up.
> 
> Additionally the vectorizer helper functions expect to be able to iterate over
> the nodes in the order that they occur in the loop header blocks.  This is an
> invariant we must maintain.  To do this we just inline the work of
> flush_pending_stmts but maintain the order by using the header blocks to guide
> the work.
> 
> The way peeling is done results in LIM noticing that in some cases the condition
> and the results are loop invariant and tries to move them out of the loop.
> 
> While the resulting code is operationally sound, moving the compare out of the
> gcond results in generating code that no longer branches, so cbranch is no
> longer applicable.  As such I now add code to check during this motion to see
> if the target supports flag setting vector comparison as a general operation.
> 
> Because of the change in peeling I now also have to update the BB counts for
> the loop exit intermediate block.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-im.cc (compute_invariantness): Import insn-codes.h
> 	and optabs-tree.h and check for vector compare motion out of gcond.
> 	* tree-vect-loop-manip.cc
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Peel using intermediate blocks.
> 	(vect_update_ivs_after_vectorizer): Drop assert.
> 	(vect_do_peeling): Correct BB count for new intermediate block.
> 	* tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
> 	(slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
> index 396963b6754c7671e2e5404302a69129918555e2..92a9318a1ca0a2da50ff2f29cf271d2e78fddd77 100644
> --- a/gcc/tree-ssa-loop-im.cc
> +++ b/gcc/tree-ssa-loop-im.cc
> @@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "tree-ssa.h"
>  #include "dbgcnt.h"
> +#include "insn-codes.h"
> +#include "optabs-tree.h"
>  
>  /* TODO:  Support for predicated code motion.  I.e.
>  
> @@ -1138,6 +1140,24 @@ compute_invariantness (basic_block bb)
>  	    continue;
>  	  }
>  
> +	/* Check if one of the dependent statements is a vector compare and
> +	   whether the target supports it, otherwise it's invalid to hoist it
> +	   out of the gcond it belonged to.  */
> +	for (auto dep_stmt : lim_data->depends)
> +	  {
> +	     if (is_gimple_assign (dep_stmt)
> +		 && VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (dep_stmt))))
> +		{
> +		  tree type = TREE_TYPE (gimple_assign_lhs (dep_stmt));
> +		  auto code = gimple_assign_rhs_code (dep_stmt);
> +		  if (!target_supports_op_p (type, code, optab_vector))
> +		    pos = MOVE_IMPOSSIBLE;
> +		}
> +	  }

I think it's more natural to handle this in determine_max_movement
where we specifically look at the condition we are going to turn
into a COND_EXPR condition.

I think it's also independent of this series - the issue should
be latent, but possibly only triggerable with a GIMPLE testcase.

Can you split it out?

The rest of the patch is OK.

Thanks,
Richard.

> +
> +	if (pos == MOVE_IMPOSSIBLE)
> +	  continue;
> +
>  	if (dump_file && (dump_flags & TDF_DETAILS))
>  	  {
>  	    print_gimple_stmt (dump_file, stmt, 2);
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index b9161274ce401a7307f3e61ad23aa036701190d7..0b042b2baf976572af962dd40d5dc311a419ee60 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1403,13 +1403,16 @@ vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info loop_vinfo
>     copies remains the same.
>  
>     If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms
> -   dominators were updated during the peeling.  */
> +   dominators were updated during the peeling.  When doing early break
> +   vectorization, LOOP_VINFO needs to be provided and is used to keep track
> +   of any newly created memory references that need to be updated should we
> +   decide to vectorize.  */
>  
>  class loop *
>  slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  					class loop *scalar_loop,
>  					edge scalar_exit, edge e, edge *new_e,
> -					bool flow_loops)
> +					bool flow_loops,
> +					vec<basic_block> *updated_doms)
>  {
>    class loop *new_loop;
>    basic_block *new_bbs, *bbs, *pbbs;
> @@ -1526,7 +1529,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        }
>  
>    auto loop_exits = get_loop_exit_edges (loop);
> +  bool multiple_exits_p = loop_exits.length () > 1;
>    auto_vec<basic_block> doms;
> +  class loop *update_loop = NULL;
>  
>    if (at_exit) /* Add the loop copy at exit.  */
>      {
> @@ -1536,39 +1541,65 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	  flush_pending_stmts (new_exit);
>  	}
>  
> +      bool multiple_exits_p = loop_exits.length () > 1;
> +      basic_block main_loop_exit_block = new_preheader;
> +      basic_block alt_loop_exit_block = NULL;
> +      /* Create an intermediate edge for the main exit.  But it is only
> +	 useful for early exits.  */
> +      if (multiple_exits_p)
> +	{
> +	  edge loop_e = single_succ_edge (new_preheader);
> +	  new_preheader = split_edge (loop_e);
> +	}
> +
>        auto_vec <gimple *> new_phis;
>        hash_map <tree, tree> new_phi_args;
>        /* First create the empty phi nodes so that when we flush the
>  	 statements they can be filled in.   However because there is no order
>  	 between the PHI nodes in the exits and the loop headers we need to
>  	 order them base on the order of the two headers.  First record the new
> -	 phi nodes.  */
> -      for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> +	 phi nodes. Then redirect the edges and flush the changes.  This writes
> +	 out the new SSA names.  */
> +      for (auto gsi_from = gsi_start_phis (loop_exit->dest);
>  	   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
>  	{
>  	  gimple *from_phi = gsi_stmt (gsi_from);
>  	  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -	  gphi *res = create_phi_node (new_res, new_preheader);
> +	  gphi *res = create_phi_node (new_res, main_loop_exit_block);
>  	  new_phis.safe_push (res);
>  	}
>  
> -      /* Then redirect the edges and flush the changes.  This writes out the new
> -	 SSA names.  */
> -      for (edge exit : loop_exits)
> +      for (auto exit : loop_exits)
>  	{
> -	  edge temp_e = redirect_edge_and_branch (exit, new_preheader);
> -	  flush_pending_stmts (temp_e);
> +	  basic_block dest = main_loop_exit_block;
> +	  if (exit != loop_exit)
> +	    {
> +	      if (!alt_loop_exit_block)
> +		{
> +		  alt_loop_exit_block = split_edge (exit);
> +		  edge res = redirect_edge_and_branch (
> +				single_succ_edge (alt_loop_exit_block),
> +				new_preheader);
> +		  flush_pending_stmts (res);
> +		  continue;
> +		}
> +	      dest = alt_loop_exit_block;
> +	    }
> +	  edge e = redirect_edge_and_branch (exit, dest);
> +	  flush_pending_stmts (e);
>  	}
> +
>        /* Record the new SSA names in the cache so that we can skip materializing
>  	 them again when we fill in the rest of the LCSSA variables.  */
>        for (auto phi : new_phis)
>  	{
> -	  tree new_arg = gimple_phi_arg (phi, 0)->def;
> +	  tree new_arg = gimple_phi_arg (phi, loop_exit->dest_idx)->def;
>  
>  	  if (!SSA_VAR_P (new_arg))
>  	    continue;
> +
>  	  /* If the PHI MEM node dominates the loop then we shouldn't create
> -	      a new LC-SSSA PHI for it in the intermediate block.   */
> +	     a new LC-SSA PHI for it in the intermediate block.  */
>  	  /* A MEM phi that consitutes a new DEF for the vUSE chain can either
>  	     be a .VDEF or a PHI that operates on MEM. And said definition
>  	     must not be inside the main loop.  Or we must be a parameter.
> @@ -1584,6 +1615,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	      remove_phi_node (&gsi, true);
>  	      continue;
>  	    }
> +
> +	  /* If we decide to remove the PHI node we should also not
> +	     rematerialize it later on.  */
>  	  new_phi_args.put (new_arg, gimple_phi_result (phi));
>  
>  	  if (TREE_CODE (new_arg) != SSA_NAME)
> @@ -1595,34 +1629,68 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>  	 preheader block and still find the right LC nodes.  */
>        edge loop_entry = single_succ_edge (new_preheader);
>        if (flow_loops)
> -	for (auto gsi_from = gsi_start_phis (loop->header),
> -	     gsi_to = gsi_start_phis (new_loop->header);
> -	     !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> -	     gsi_next (&gsi_from), gsi_next (&gsi_to))
> -	  {
> -	    gimple *from_phi = gsi_stmt (gsi_from);
> -	    gimple *to_phi = gsi_stmt (gsi_to);
> -	    tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> -						  loop_latch_edge (loop));
> +	{
> +	  /* Link through the main exit first.  */
> +	  for (auto gsi_from = gsi_start_phis (loop->header),
> +	       gsi_to = gsi_start_phis (new_loop->header);
> +	       !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +	       gsi_next (&gsi_from), gsi_next (&gsi_to))
> +	    {
> +	      gimple *from_phi = gsi_stmt (gsi_from);
> +	      gimple *to_phi = gsi_stmt (gsi_to);
> +	      tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi,
> +						    loop_latch_edge (loop));
> +
> +	      /* Check if we've already created a new phi node during edge
> +		 redirection.  If we have, only propagate the value
> +		 downwards.  */
> +	      if (tree *res = new_phi_args.get (new_arg))
> +		{
> +		  if (multiple_exits_p)
> +		    new_arg = *res;
> +		  else
> +		    {
> +		      adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
> +		      continue;
> +		    }
> +		}
>  
> -	    /* Check if we've already created a new phi node during edge
> -	       redirection.  If we have, only propagate the value downwards.  */
> -	    if (tree *res = new_phi_args.get (new_arg))
> -	      {
> -		adjust_phi_and_debug_stmts (to_phi, loop_entry, *res);
> -		continue;
> -	      }
> +	      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +	      gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
>  
> -	    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> -	    gphi *lcssa_phi = create_phi_node (new_res, new_preheader);
> +	      /* Main loop exit should use the final iter value.  */
> +	      SET_PHI_ARG_DEF (lcssa_phi, loop_exit->dest_idx, new_arg);
>  
> -	    /* Main loop exit should use the final iter value.  */
> -	    add_phi_arg (lcssa_phi, new_arg, loop_exit, UNKNOWN_LOCATION);
> +	      adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
> +	    }
>  
> -	    adjust_phi_and_debug_stmts (to_phi, loop_entry, new_res);
> -	  }
> +	  set_immediate_dominator (CDI_DOMINATORS, main_loop_exit_block,
> +				   loop_exit->src);
> +
> +	  /* Now link the alternative exits.  */
> +	  if (multiple_exits_p)
> +	    {
> +	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> +				       main_loop_exit_block);
> +	      for (auto gsi_from = gsi_start_phis (loop->header),
> +		   gsi_to = gsi_start_phis (new_preheader);
> +		   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> +		   gsi_next (&gsi_from), gsi_next (&gsi_to))
> +		{
> +		  gimple *from_phi = gsi_stmt (gsi_from);
> +		  gimple *to_phi = gsi_stmt (gsi_to);
> +
> +		  tree alt_arg = gimple_phi_result (from_phi);
> +		  edge main_e = single_succ_edge (alt_loop_exit_block);
> +		  for (edge e : loop_exits)
> +		    if (e != loop_exit)
> +		      SET_PHI_ARG_DEF (to_phi, main_e->dest_idx, alt_arg);
> +		}
>  
> -      set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
> +	      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> +				       loop->header);
> +	    }
> +	}
>  
>        if (was_imm_dom || duplicate_outer_loop)
>  	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
> @@ -1634,6 +1702,21 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        delete_basic_block (preheader);
>        set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header,
>  			       loop_preheader_edge (scalar_loop)->src);
> +
> +      /* Finally after wiring the new epilogue we need to update its main exit
> +	 to the original function exit we recorded.  Other exits are already
> +	 correct.  */
> +      if (multiple_exits_p)
> +	{
> +	  update_loop = new_loop;
> +	  for (edge e : get_loop_exit_edges (loop))
> +	    doms.safe_push (e->dest);
> +	  doms.safe_push (exit_dest);
> +
> +	  /* Likely a fall-through edge, so update if needed.  */
> +	  if (single_succ_p (exit_dest))
> +	    doms.safe_push (single_succ (exit_dest));
> +	}
>      }
>    else /* Add the copy at entry.  */
>      {
> @@ -1681,6 +1764,34 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>        delete_basic_block (new_preheader);
>        set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
>  			       loop_preheader_edge (new_loop)->src);
> +
> +      if (multiple_exits_p)
> +	update_loop = loop;
> +    }
> +
> +  if (multiple_exits_p)
> +    {
> +      for (edge e : get_loop_exit_edges (update_loop))
> +	{
> +	  edge ex;
> +	  edge_iterator ei;
> +	  FOR_EACH_EDGE (ex, ei, e->dest->succs)
> +	    {
> +	      /* Find the first non-fallthrough block as fall-throughs can't
> +		 dominate other blocks.  */
> +	      if (single_succ_p (ex->dest))
> +		{
> +		  doms.safe_push (ex->dest);
> +		  ex = single_succ_edge (ex->dest);
> +		}
> +	      doms.safe_push (ex->dest);
> +	    }
> +	  doms.safe_push (e->dest);
> +	}
> +
> +      iterate_fix_dominators (CDI_DOMINATORS, doms, false);
> +      if (updated_doms)
> +	updated_doms->safe_splice (doms);
>      }
>  
>    free (new_bbs);
> @@ -2050,7 +2161,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>  
>    /* Make sure there exists a single-predecessor exit bb:  */
>    gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_succ_edge (exit_bb) == update_e);
>  
>    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
>         !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> @@ -3138,6 +3248,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>        epilog->force_vectorize = false;
>        bb_before_epilog = loop_preheader_edge (epilog)->src;
>  
> +      /* Fixup the probabilities of the new intermediate blocks that we use
> +	 to connect to the merge block.  The rest are dealt with via
> +	 bb_before_epilog adjustments.  */
> +	e->dest->count = e->count ();
> +
>        /* Scalar version loop may be preferred.  In this case, add guard
>  	 and skip to epilog.  Note this only happens when the number of
>  	 iterations of loop is unknown at compile time, otherwise this
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index b5e27d1c46d9cb3dfe5b44f1b49c9e4204572ff1..39aa4d1250efe308acccf484d370f8adfd1ba843 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1821,7 +1821,7 @@ is_loop_header_bb_p (basic_block bb)
>  {
>    if (bb == (bb->loop_father)->header)
>      return true;
> -  gcc_checking_assert (EDGE_COUNT (bb->preds) == 1);
> +
>    return false;
>  }
>  
> @@ -2212,7 +2212,8 @@ extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge,
>  					 const_edge);
>  class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, edge,
>  						    class loop *, edge,
> -						    edge, edge *, bool = true);
> +						    edge, edge *, bool = true,
> +						    vec<basic_block> * = NULL);
>  class loop *vect_loop_versioning (loop_vec_info, gimple *);
>  extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
>  				    tree *, tree *, tree *, int, bool, bool,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits
  2023-11-24 10:18                                         ` Tamar Christina
@ 2023-11-24 12:41                                           ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-24 12:41 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 24 Nov 2023, Tamar Christina wrote:

> Hi,
> 
> Having simplified peeling this patch becomes smaller as well:
> 
> This changes the PHI node updates to support early breaks.
> It has to support both the case where the loop's exit matches the normal loop
> exit and one where the early exit is "inverted", i.e. it's an early exit edge.
> 
> In the latter case we must always restart the loop for VF iterations.  For an
> early exit the reason is obvious, but there are cases where the "normal" exit
> is located before the early one.  This exit then does a check on ivtmp resulting
> in us leaving the loop since it thinks we're done.
> 
> In these cases we may still have side-effects to perform so we also go to the
> scalar loop.
> 
> For the "normal" exit niters has already been adjusted for peeling, for the
> early exits we must find out how many iterations we actually did.  So we have
> to recalculate the new position for each exit.
> 
> For the "inverse" case we essentially peel a vector iteration *after* the vector
> loop has finished. i.e. conceptually it's the same as vect epilogue peeling but
> without generating code for the peeled iteration.  That'll be handled by the
> scalar loop.
> 
> To do this we just adjust niters_vector_mult_vf and remove one VF and for masked
> cases we do the same with final_iv.
> 
> The normal IV update code will then generate the correct values for us.
> Eventually VRP will simplify the constant bounds and we get the proper scalar
> unrolling.  This means we don't have to make any changes at all to
> vect_update_ivs_after_vectorizer but dropping some asserts.
> 
> Ok for master?

Nice.  OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
> 	vect_set_loop_condition_partial_vectors_avx512,
> 	vect_gen_vector_loop_niters_mult_vf): Support peeling a vector
> 	iteration.
> 	(vect_update_ivs_after_vectorizer): Drop asserts.
> 	(vect_do_peeling): Skip forwarder edge.
> 	(vect_is_loop_exit_latch_pred): New.
> 	* tree-vectorizer.h (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index d61d7c3a189b279fc3bcbb58c3c0e32521db3cf8..476be8a0bb6da2d06c4ca7052cb07bacecca60b1 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -951,7 +951,18 @@ vect_set_loop_condition_partial_vectors (class loop *loop, edge exit_edge,
>  
>    if (final_iv)
>      {
> -      gassign *assign = gimple_build_assign (final_iv, orig_niters);
> +      gassign *assign;
> +      /* If vectorizing an inverted early break loop we have to restart the
> +	 scalar loop at niters - vf.  This matches what we do in
> +	 vect_gen_vector_loop_niters_mult_vf for non-masked loops.  */
> +      if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> +	{
> +	  tree ftype = TREE_TYPE (orig_niters);
> +	  tree vf = build_int_cst (ftype, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +	  assign = gimple_build_assign (final_iv, MINUS_EXPR, orig_niters, vf);
> +	}
> +      else
> +	assign = gimple_build_assign (final_iv, orig_niters);
>        gsi_insert_on_edge_immediate (exit_edge, assign);
>      }
>  
> @@ -1188,8 +1199,19 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>  
>    if (final_iv)
>      {
> -      gassign *assign = gimple_build_assign (final_iv, orig_niters);
> -      gsi_insert_on_edge_immediate (single_exit (loop), assign);
> +      gassign *assign;
> +      /* If vectorizing an inverted early break loop we have to restart the
> +	 scalar loop at niters - vf.  This matches what we do in
> +	 vect_gen_vector_loop_niters_mult_vf for non-masked loops.  */
> +      if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> +	{
> +	  tree ftype = TREE_TYPE (orig_niters);
> +	  tree vf = build_int_cst (ftype, LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +	  assign = gimple_build_assign (final_iv, MINUS_EXPR, orig_niters, vf);
> +	}
> +      else
> +	assign = gimple_build_assign (final_iv, orig_niters);
> +      gsi_insert_on_edge_immediate (exit_edge, assign);
>      }
>  
>    return cond_stmt;
> @@ -2157,11 +2179,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>    gphi_iterator gsi, gsi1;
>    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>    basic_block update_bb = update_e->dest;
> -
>    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -
> -  /* Make sure there exists a single-predecessor exit bb:  */
> -  gcc_assert (single_pred_p (exit_bb));
> +  gimple_stmt_iterator last_gsi = gsi_last_bb (exit_bb);
>  
>    for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb);
>         !gsi_end_p (gsi) && !gsi_end_p (gsi1);
> @@ -2171,7 +2190,6 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>        tree step_expr, off;
>        tree type;
>        tree var, ni, ni_name;
> -      gimple_stmt_iterator last_gsi;
>  
>        gphi *phi = gsi.phi ();
>        gphi *phi1 = gsi1.phi ();
> @@ -2207,7 +2225,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>  	{
>  	  tree stype = TREE_TYPE (step_expr);
>  	  off = fold_build2 (MULT_EXPR, stype,
> -			     fold_convert (stype, niters), step_expr);
> +			       fold_convert (stype, niters), step_expr);
> +
>  	  if (POINTER_TYPE_P (type))
>  	    ni = fold_build_pointer_plus (init_expr, off);
>  	  else
> @@ -2226,9 +2245,9 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
>  
>        var = create_tmp_var (type, "tmp");
>  
> -      last_gsi = gsi_last_bb (exit_bb);
>        gimple_seq new_stmts = NULL;
>        ni_name = force_gimple_operand (ni, &new_stmts, false, var);
> +
>        /* Exit_bb shouldn't be empty.  */
>        if (!gsi_end_p (last_gsi))
>  	{
> @@ -2726,11 +2745,19 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo,
>    int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
>    tree type = TREE_TYPE (niters_vector);
>    tree log_vf = build_int_cst (type, exact_log2 (vf));
> +  tree tree_vf = build_int_cst (type, vf);
>    basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
>  
>    gcc_assert (niters_vector_mult_vf_ptr != NULL);
>    tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type,
>  					    niters_vector, log_vf);
> +
> +  /* If we've peeled a vector iteration then subtract one full vector
> +     iteration.  */
> +  if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
> +    niters_vector_mult_vf = fold_build2 (MINUS_EXPR, type,
> +					 niters_vector_mult_vf, tree_vf);
> +
>    if (!is_gimple_val (niters_vector_mult_vf))
>      {
>        tree var = create_tmp_var (type, "niters_vector_mult_vf");
> @@ -3328,6 +3355,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	 niters_vector_mult_vf steps.  */
>        gcc_checking_assert (vect_can_advance_ivs_p (loop_vinfo));
>        update_e = skip_vector ? e : loop_preheader_edge (epilog);
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	update_e = single_succ_edge (e->dest);
> +
> +      /* Update the main exit.  */
>        vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf,
>  					update_e);
>  
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 39aa4d1250efe308acccf484d370f8adfd1ba843..de60da31e2a3030a7fbc302d3f676af9683fd019 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1016,6 +1016,8 @@ public:
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
>  #define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BREAKS_VECT_PEELED(L)  \
> +  (single_pred ((L)->loop->latch) != (L)->vec_loop_iv_exit->src)
>  #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
>  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
>  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
> @@ -2224,6 +2226,7 @@ extern dump_user_location_t find_loop_location (class loop *);
>  extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
>  extern edge vec_init_loop_exit_info (class loop *);
> +extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-24 10:20             ` Tamar Christina
@ 2023-11-24 13:23               ` Richard Biener
  2023-11-27 22:47                 ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-24 13:23 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 24 Nov 2023, Tamar Christina wrote:

> Good morning,
> 
> This is a respun patch with a fix for VLA.
> 
> This adds support to vectorizable_live_reduction to handle multiple exits by
> doing a search for which exit the live value should be materialized in.
> 
> Additionally which value in the index we're after depends on whether the exit
> it's materialized in is an early exit or whether the loop's main exit is
> different from the loop's natural one (i.e. the one with the same src block as
> the latch).
> 
> In those two cases we want the first rather than the last value as we're going
> to restart the iteration in the scalar loop.  For VLA this means we need to
> reverse both the mask and vector since there's only a way to get the last
> active element and not the first.
> 
> For inductions and multiple exits:
>   - we test if the target will support vectorizing the induction
>   - mark all inductions in the loop as relevant
>   - handle codegen of non-live inductions
>   - induction during an early exit gets the first element rather than last.
> 
> For reductions and multiple exits:
>   - Reductions for early exits reduces the reduction definition statement
>     rather than the reduction step.  This allows us to get the value at the
>     start of the iteration.
>   - The peeling layout means that we just have to update one block, the merge
>     block.  We expect all the reductions to be the same but we leave it up to
>     the value numbering to clean up any duplicate code as we iterate over all
>     edges.
> 
> These two changes fix the reduction codegen given before which has been added
> to the testsuite for early vect.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> 	(vect_analyze_loop_operations): Check if target supports vectorizing IV.
> 	(vect_transform_loop): Call vectorizable_live_operation for non-live
> 	inductions or reductions.
> 	(find_connected_edge, vectorizable_live_operation_1): New.
> 	(vect_create_epilog_for_reduction): Support reductions in early break.
> 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> 	relevant.
> 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 	(vect_iv_increment_position): New.
> 	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401aecd4e7eaaaa9e2b8a0 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
>     INSERT_AFTER is set to true if the increment should be inserted after
>     *BSI.  */
>  
> -static void
> +void
>  vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
>  			    bool *insert_after)
>  {
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e6983382190792602cb25af1 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
>  	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
>  					      -1, false, &cost_vec);
>  
> +	  /* Check if we can perform the operation for early break if we force
> +	     the live operation.  */
> +	  if (ok
> +	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +	      && !STMT_VINFO_LIVE_P (stmt_info)
> +	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
> +					      -1, false, &cost_vec);

can you add && !PURE_SLP_STMT?

> +
>            if (!ok)
>  	    return opt_result::failure_at (phi,
>  					   "not vectorized: relevant phi not "
> @@ -5842,6 +5851,10 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
>     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
>       (counting from 0)
> +   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
> +     exit this edge is always the main loop exit.
> +   MAIN_EXIT_P indicates whether we are updating the main exit or an alternate
> +     exit.  This determines whether we use the final or original value.
>  
>     This function:
>     1. Completes the reduction def-use cycles.
> @@ -5882,7 +5895,9 @@ static void
>  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  				  stmt_vec_info stmt_info,
>  				  slp_tree slp_node,
> -				  slp_instance slp_node_instance)
> +				  slp_instance slp_node_instance,
> +				  edge loop_exit,
> +				  bool main_exit_p = true)

isn't main_exit_p computable from 'loop_exit' by comparing that to
the one recorded in loop_vinfo?  If so please do that instead of passing
in another argument.

>  {
>    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>    gcc_assert (reduc_info->is_reduc_info);
> @@ -6053,7 +6068,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        /* Create an induction variable.  */
>        gimple_stmt_iterator incr_gsi;
>        bool insert_after;
> -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
>        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
>  		 insert_after, &indx_before_incr, &indx_after_incr);
>  
> @@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* We need to reduce values in all exits.  */
> +  exit_bb = loop_exit->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
> +  vec <gimple *> vec_stmts;
> +  if (main_exit_p)
> +    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
> +  else
> +    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));

both would be wrong for SLP, also I think you need to look at
STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))?  For SLP the
PHI SLP node is reached via slp_node_instance->reduc_phis.

I think an overall better structure would be to add a

vect_get_vect_def (stmt_vec_info, slp_tree, unsigned);

abstracting SLP and non-SLP and doing

  for (unsigned i = 0; i < vec_num * ncopies; ++i)
    {
      def = vect_get_vect_def (stmt_info, slp_node, i);
...
    }

and then adjusting stmt_info/slp_node according to main_exit_p?
(would be nice to transition stmt_info->vec_stmts to stmt_info->vec_defs)

That said, wherever possible please think of SLP ;)

> +
>    for (unsigned i = 0; i < vec_num; i++)
>      {
>        gimple_seq stmts = NULL;
>        if (slp_node)
>  	def = vect_get_slp_vect_def (slp_node, i);
>        else
> -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> +	def = gimple_get_lhs (vec_stmts[0]);
>        for (j = 0; j < ncopies; j++)
>  	{
>  	  tree new_def = copy_ssa_name (def);
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
> -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> +	    def = gimple_get_lhs (vec_stmts[j]);
> +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -6885,7 +6907,20 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>            FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
>  	    {
>  	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> -		SET_USE (use_p, scalar_result);
> +		{
> +		  gimple *stmt = USE_STMT (use_p);
> +		  if (main_exit_p)
> +		    SET_USE (use_p, scalar_result);
> +		  else if (is_a <gphi *> (stmt))
> +		    {
> +		      /* If an early exit only update usages in the merge
> +			 block.  */

shouldn't that be the only use at this point anyway?  You only
update uses in PHI nodes btw. and you can use SET_USE, maybe
you wanted to check that
gimple_phi_arg_edge (stmt, phi_arg_index_from_use (use_p)) == merge_e
instead?

That said, the comment could be more precise

Are we calling vect_create_epilog_for_reduction for each early exit?
I suppose not?

> +		      edge merge_e = single_succ_edge (loop_exit->dest);
> +		      if (gimple_bb (stmt) != merge_e->dest)
> +			continue;
> +		      SET_PHI_ARG_DEF (stmt, merge_e->dest_idx, scalar_result);
> +		    }
> +		}
>  	      update_stmt (use_stmt);
>  	    }
>          }
> @@ -10481,6 +10516,156 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    return true;
>  }
>  
> +/* Function vectorizable_live_operation_1.
> +
> +   Helper function for vectorizable_live_operation.  */
> +
> +tree
> +vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> +			       stmt_vec_info stmt_info, edge exit_e,
> +			       tree vectype, int ncopies, slp_tree slp_node,
> +			       tree bitsize, tree bitstart, tree vec_lhs,
> +			       tree lhs_type, bool restart_loop,
> +			       gimple_stmt_iterator *exit_gsi)
> +{
> +  basic_block exit_bb = exit_e->dest;
> +  gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> +
> +  tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> +  gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> +  for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
> +    SET_PHI_ARG_DEF (phi, i, vec_lhs);
> +
> +  gimple_seq stmts = NULL;
> +  tree new_tree;
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> +    {
> +      /* Emit:
> +
> +	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
> +	 where VEC_LHS is the vectorized live-out result and MASK is
> +	 the loop mask for the final iteration.  */
> +      gcc_assert (ncopies == 1 && !slp_node);
> +      gimple_seq tem = NULL;
> +      gimple_stmt_iterator gsi = gsi_last (tem);
> +      tree len = vect_get_loop_len (loop_vinfo, &gsi,
> +				    &LOOP_VINFO_LENS (loop_vinfo),
> +				    1, vectype, 0, 0);
> +
> +      /* BIAS - 1.  */
> +      signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +      tree bias_minus_one
> +	= int_const_binop (MINUS_EXPR,
> +			   build_int_cst (TREE_TYPE (len), biasval),
> +			   build_one_cst (TREE_TYPE (len)));
> +
> +      /* LAST_INDEX = LEN + (BIAS - 1).  */
> +      tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> +				     len, bias_minus_one);
> +
> +      /* This needs to implement extraction of the first index, but not sure
> +	 how the LEN stuff works.  At the moment we shouldn't get here since
> +	 there's no LEN support for early breaks.  But guard this so there's
> +	 no incorrect codegen.  */
> +      gcc_assert (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> +
> +      /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> +      tree scalar_res
> +	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> +			vec_lhs_phi, last_index);
> +
> +      /* Convert the extracted vector element to the scalar type.  */
> +      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> +    }
> +  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +    {
> +      /* Emit:
> +
> +	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
> +	 where VEC_LHS is the vectorized live-out result and MASK is
> +	 the loop mask for the final iteration.  */
> +      gcc_assert (!slp_node);
> +      tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
> +      gimple_seq tem = NULL;
> +      gimple_stmt_iterator gsi = gsi_last (tem);
> +      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> +				      &LOOP_VINFO_MASKS (loop_vinfo),
> +				      1, vectype, 0);
> +      tree scalar_res;
> +
> +      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  /* First create the permuted mask.  */
> +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +	  tree perm_dest = copy_ssa_name (mask);
> +	  gimple *perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +				       mask, perm_mask);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  mask = perm_dest;
> +
> +	  /* Then permute the vector contents.  */
> +	  tree perm_elem = perm_mask_for_reverse (vectype);
> +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> +	  perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +				       vec_lhs_phi, perm_elem);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  vec_lhs_phi = perm_dest;
> +	}
> +
> +      gimple_seq_add_seq (&stmts, tem);
> +
> +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +				 mask, vec_lhs_phi);
> +
> +      /* Convert the extracted vector element to the scalar type.  */
> +      new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> +    }
> +  else
> +    {
> +      tree bftype = TREE_TYPE (vectype);
> +      if (VECTOR_BOOLEAN_TYPE_P (vectype))
> +	bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
> +      new_tree = build3 (BIT_FIELD_REF, bftype, vec_lhs_phi, bitsize, bitstart);
> +      new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> +				       &stmts, true, NULL_TREE);
> +    }
> +
> +  *exit_gsi = gsi_after_labels (exit_bb);
> +  if (stmts)
> +    gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
> +  return new_tree;
> +}
> +
> +/* Find the edge that's the final one in the path from SRC to DEST and
> +   return it.  At most one forwarder block may separate them.  */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> +   if (src->dest == dest)
> +     return src;
> +
> +  edge e;
> +  edge_iterator ei;
> +
> +  FOR_EACH_EDGE (e, ei, dest->preds)
> +    {
> +      if (src->dest == e->src)
> +	return e;
> +    }

isn't that just find_edge (src->dest, dest)?

> +  return NULL;
> +}
> +
>  /* Function vectorizable_live_operation.
>  
>     STMT_INFO computes a value that is used outside the loop.  Check if
> @@ -10505,7 +10690,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    int vec_entry = 0;
>    poly_uint64 vec_index = 0;
>  
> -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    /* If a stmt of a reduction is live, vectorize it via
>       vect_create_epilog_for_reduction.  vectorizable_reduction assessed
> @@ -10530,8 +10716,22 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
>  	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
>  	return true;
> +
> +      /* For an early break we only have to materialize the reduction on the
> +	 merge block, but we have to find an alternate exit first.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> +	      vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> +						slp_node, slp_node_instance,
> +						exit, false);

Hmm, for each one.  But we only need a single reduction epilogue, no?
In the merge block?

> +	}
> +
>        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> -					slp_node_instance);
> +					slp_node_instance,
> +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
>        return true;
>      }
>  
> @@ -10683,103 +10883,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -      gcc_assert (single_pred_p (exit_bb));
> -
> -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> -      gimple_seq stmts = NULL;
> -      tree new_tree;
> -      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> -	{
> -	  /* Emit:
> -
> -	       SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> -
> -	     where VEC_LHS is the vectorized live-out result and MASK is
> -	     the loop mask for the final iteration.  */
> -	  gcc_assert (ncopies == 1 && !slp_node);
> -	  gimple_seq tem = NULL;
> -	  gimple_stmt_iterator gsi = gsi_last (tem);
> -	  tree len
> -	    = vect_get_loop_len (loop_vinfo, &gsi,
> -				 &LOOP_VINFO_LENS (loop_vinfo),
> -				 1, vectype, 0, 0);
> -
> -	  /* BIAS - 1.  */
> -	  signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> -	  tree bias_minus_one
> -	    = int_const_binop (MINUS_EXPR,
> -			       build_int_cst (TREE_TYPE (len), biasval),
> -			       build_one_cst (TREE_TYPE (len)));
> -
> -	  /* LAST_INDEX = LEN + (BIAS - 1).  */
> -	  tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> -					  len, bias_minus_one);
> -
> -	  /* SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>.  */
> -	  tree scalar_res
> -	    = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> -			    vec_lhs_phi, last_index);
> -
> -	  /* Convert the extracted vector element to the scalar type.  */
> -	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> -	}
> -      else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> -	{
> -	  /* Emit:
> -
> -	       SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> -
> -	     where VEC_LHS is the vectorized live-out result and MASK is
> -	     the loop mask for the final iteration.  */
> -	  gcc_assert (ncopies == 1 && !slp_node);
> -	  tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
> -	  gimple_seq tem = NULL;
> -	  gimple_stmt_iterator gsi = gsi_last (tem);
> -	  tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> -					  &LOOP_VINFO_MASKS (loop_vinfo),
> -					  1, vectype, 0);
> -	  gimple_seq_add_seq (&stmts, tem);
> -	  tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -					  mask, vec_lhs_phi);
> -
> -	  /* Convert the extracted vector element to the scalar type.  */
> -	  new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> -	}
> -      else
> -	{
> -	  tree bftype = TREE_TYPE (vectype);
> -	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
> -	    bftype = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 1);
> -	  new_tree = build3 (BIT_FIELD_REF, bftype,
> -			     vec_lhs_phi, bitsize, bitstart);
> -	  new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> -					   &stmts, true, NULL_TREE);
> -	}
> +      /* Check if we have a loop where the chosen exit is not the main exit,
> +	 in which case for an early break we restart the iteration the vector
> +	 code was executing.  For the live values we want the value at the
> +	 start of the iteration rather than at the end.  */
> +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +	if (!is_gimple_debug (use_stmt)
> +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	  {
> +	    basic_block use_bb = gimple_bb (use_stmt);
> +	    if (!is_a <gphi *> (use_stmt))
> +	      continue;
> +	    for (auto exit_e : get_loop_exit_edges (loop))
> +	      {
> +		/* See if this exit leads to the value.  */
> +		edge dest_e = find_connected_edge (exit_e, use_bb);
> +		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
> +		  continue;
>  
> -      gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
> -      if (stmts)
> -	gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
> +		gimple *tmp_vec_stmt = vec_stmt;
> +		tree tmp_vec_lhs = vec_lhs;
> +		tree tmp_bitstart = bitstart;
> +		/* For an early exit where the exit is not in the BB that leads
> +		   to the latch, we're restarting the iteration in the
> +		   scalar loop.  So get the first live value.  */
> +		restart_loop = restart_loop || exit_e != main_e;
> +		if (restart_loop)
> +		  {
> +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> +		  }
>  
> -      /* Remove existing phis that copy from lhs and create copies
> -	 from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> -	{
> -	  gimple *phi = gsi_stmt (gsi);
> -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> -	    {
> -	      remove_phi_node (&gsi, false);
> -	      tree lhs_phi = gimple_phi_result (phi);
> -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> -	    }
> -	  else
> -	    gsi_next (&gsi);
> -	}
> +		gimple_stmt_iterator exit_gsi;
> +		tree new_tree
> +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> +						   exit_e, vectype, ncopies,
> +						   slp_node, bitsize,
> +						   tmp_bitstart, tmp_vec_lhs,
> +						   lhs_type, restart_loop,
> +						   &exit_gsi);
> +
> +		/* Use the empty block on the exit to materialize the new stmts
> +		   so we can update the PHI here.  */
> +		if (gimple_phi_num_args (use_stmt) == 1)
> +		  {
> +		    auto gsi = gsi_for_stmt (use_stmt);
> +		    remove_phi_node (&gsi, false);
> +		    tree lhs_phi = gimple_phi_result (use_stmt);
> +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> +		  }
> +		else
> +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> +	      }
> +	  }

Difficult to see what changed due to the split out, guess it'll be ok.

>        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
>        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> @@ -11797,6 +11957,21 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>  	      if (dump_enabled_p ())
>  		dump_printf_loc (MSG_NOTE, vect_location, "transform phi.\n");
>  	      vect_transform_stmt (loop_vinfo, stmt_info, NULL, NULL, NULL);
> +	      /* If vectorizing early break we must also vectorize the use of
> +		 the PHIs as a live operation.  */
> +	      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		  && !STMT_VINFO_LIVE_P (stmt_info)
> +		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +		{
> +		  if (dump_enabled_p ())
> +		    dump_printf_loc (MSG_NOTE, vect_location,
> +			 "----> vectorizing early break reduc or induc phi: %G",
> +			 (gimple *) phi);
> +		  bool done
> +		    = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
> +						   NULL, -1, true, NULL);

you should be able to amend can_vectorize_live_stmts instead by
adding || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) && vect_induction_def),
then we keep it at one place also where we'd handle the SLP case.

> +		  gcc_assert (done);
> +		}
>  	    }
>  	}
>  
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fe38beb4fa1d9f8593445354f56ba52e10a040cd..f1b6a13395f286f9997530bbe57cda3a00502f8f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
>     - it has uses outside the loop.
>     - it has vdefs (it alters memory).
>     - control stmts in the loop (except for the exit condition).
> +   - it is an induction and we have multiple exits.
>  
>     CHECKME: what other side effects would the vectorizer allow?  */
>  
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	}
>      }
>  
> +  /* Check if it's an induction in a loop with multiple exits.  In this case
> +     there will be a use later on after peeling which is needed for the
> +     alternate exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "vec_stmt_relevant_p: induction forced for "
> +			   "early break.\n");
> +      *relevant = vect_used_in_scope;
> +

I think you should instead set *live_p?

> +    }
> +
>    if (*live_p && *relevant == vect_unused_in_scope
>        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
>      {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>  
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 076a698eb4288f68e81f91923f7e3e8d181ad685..de673ae56eac455c9560a29d7f3792b6c3c49f3b 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2227,6 +2227,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
>  extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
>  extern edge vec_init_loop_exit_info (class loop *);
>  extern bool vect_is_loop_exit_latch_pred (edge, class loop *);
> +extern void vect_iv_increment_position (edge, gimple_stmt_iterator *, bool *);
>  
>  /* In tree-vect-stmts.cc.  */
>  extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
> @@ -2248,6 +2249,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>  				enum vect_def_type *,
>  				tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>  					    stmt_vec_info, tree, tree,
>  					    code_helper*, code_helper*,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 21/21]Arm: Add MVE cbranch implementation
  2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
@ 2023-11-27 12:47   ` Kyrylo Tkachov
  0 siblings, 0 replies; 200+ messages in thread
From: Kyrylo Tkachov @ 2023-11-27 12:47 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc

Hi Tamar,

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, November 6, 2023 7:43 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH 21/21]Arm: Add MVE cbranch implementation
> 
> Hi All,
> 
> This adds an implementation for conditional branch optab for MVE.
> 
> Unfortunately MVE has rather limited operations on VPT.P0; we are missing the
> ability to do P0 comparisons and logical OR on P0.
> 
> For that reason we can only support cbranch with 0, as for comparing to a 0
> predicate we don't need to actually do a comparison; we only have to check
> whether any bit is set within P0.
> 
> Because we can only do P0 comparisons with 0, the costing of the comparison was
> reduced in order for the compiler not to try to push 0 to a register thinking
> it's too expensive.  For the cbranch implementation to be safe we must see the
> constant 0 vector.
> 
> The lack of logical OR on P0 is something we can't really work around.  This
> means MVE can't support cases where the sizes of operands in the comparison
> don't match, i.e. when one operand has been unpacked.
> 
> For e.g.
> 
> void f1 ()
> {
>   for (int i = 0; i < N; i++)
>     {
>       b[i] += a[i];
>       if (a[i] > 0)
> 	break;
>     }
> }
> 
> For 128-bit vectors we generate:
> 
>         vcmp.s32        gt, q3, q1
>         vmrs    r3, p0  @ movhi
>         cbnz    r3, .L2
> 
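A scalar model of why comparing P0 against a zero predicate needs no predicate compare (hypothetical helpers; the 4-predicate-bits-per-32-bit-lane layout is the MVE convention assumed here): once VPR.P0 is read out as an integer, "any lane active" is just a nonzero test, which is exactly what the vmrs/cbnz pair above performs.

```c
#include <stdint.h>

/* Build the 16-bit predicate a vcmp.s32 gt would produce: each of the
   four 32-bit lanes contributes four predicate bits.  */
static uint16_t
vcmp_s32_gt (const int32_t *a, const int32_t *b)
{
  uint16_t p0 = 0;
  for (int lane = 0; lane < 4; lane++)
    if (a[lane] > b[lane])
      p0 |= (uint16_t) 0xF << (4 * lane);
  return p0;
}

/* cbranch against a zero predicate: move P0 to a core register and test
   it for nonzero -- no predicate-vs-predicate compare is needed.  */
static int
any_lane_active (uint16_t p0)
{
  return p0 != 0;
}
```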
> MVE does not have 64-bit vector comparisons, as such that is also not supported.
> 
> Bootstrapped arm-none-linux-gnueabihf and regtested with
> -march=armv8.1-m.main+mve -mfpu=auto and no issues.
> 
> Ok for master?
> 

This is okay once the rest goes in.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
> 	compares.
> 	* config/arm/mve.md (cbranch<mode>4): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* lib/target-supports.exp (vect_early_break): Add MVE.
> 	* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index
> 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84
> b255a24eb51e32 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code,
> enum rtx_code outer_code,
>  	   || TARGET_HAVE_MVE)
>  	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
>  	*cost = COSTS_N_INSNS (1);
> +      else if (TARGET_HAVE_MVE
> +	       && outer_code == COMPARE
> +	       && VALID_MVE_PRED_MODE (mode))
> +	/* MVE allows very limited instructions on VPT.P0,  however comparisons
> +	   to 0 do not require us to materialize this constant or require a
> +	   predicate comparison as we can go through SImode.  For that reason
> +	   allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
> +	   registers as we can't compare two predicates.  */
> +	*cost = COSTS_N_INSNS (1);
>        else
>  	*cost = COSTS_N_INSNS (4);
>        return true;
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index
> 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38
> 306dd43cbebfb3f 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -6880,6 +6880,21 @@ (define_expand
> "vcond_mask_<mode><MVE_vpred>"
>    DONE;
>  })
> 
> +(define_expand "cbranch<mode>4"
> +  [(set (pc) (if_then_else
> +	      (match_operator 0 "expandable_comparison_operator"
> +	       [(match_operand:MVE_7 1 "register_operand")
> +	        (match_operand:MVE_7 2 "zero_operand")])
> +	      (label_ref (match_operand 3 "" ""))
> +	      (pc)))]
> +  "TARGET_HAVE_MVE"
> +{
> +  rtx val = gen_reg_rtx (SImode);
> +  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
> +  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
> +  DONE;
> +})
> +
>  ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
>  (define_expand "@arm_mve_reinterpret<mode>"
>    [(set (match_operand:MVE_vecs 0 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6
> c8259d663c1ff930
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
> @@ -0,0 +1,117 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +/*
> +** f1:
> +**	...
> +**	vcmp.s32	gt, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	vcmp.s32	ge, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	vcmp.i32	eq, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	vcmp.i32	ne, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	vcmp.s32	lt, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	vcmp.s32	le, q[0-9]+, q[0-9]+
> +**	vmrs	r[0-9]+, p0	@ movhi
> +**	cbnz	r[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
>        expr {
>  	[istarget aarch64*-*-*]
>  	|| [check_effective_target_arm_neon_ok]
> +	|| ([check_effective_target_arm_v8_1m_mve_fp_ok]
> +	     && [check_effective_target_arm_little_endian])
>  	}}]
>  }
>  # Return 1 if the target supports hardware vectorization of complex additions of
> 
> 
> 
> 
> --

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation
  2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
@ 2023-11-27 12:48   ` Kyrylo Tkachov
  0 siblings, 0 replies; 200+ messages in thread
From: Kyrylo Tkachov @ 2023-11-27 12:48 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: nd, Ramana Radhakrishnan, Richard Earnshaw, nickc

Hi Tamar,

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, November 6, 2023 7:43 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation
> 
> Hi All,
> 
> This adds an implementation for conditional branch optab for AArch32.
> 
> For example:
> 
> void f1 ()
> {
>   for (int i = 0; i < N; i++)
>     {
>       b[i] += a[i];
>       if (a[i] > 0)
> 	break;
>     }
> }
> 
> For 128-bit vectors we generate:
> 
>         vcgt.s32        q8, q9, #0
>         vpmax.u32       d7, d16, d17
>         vpmax.u32       d7, d7, d7
>         vmov    r3, s14 @ int
>         cmp     r3, #0
> 
> and for 64-bit vectors we can omit one vpmax as we still need to compress to
> 32-bits.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master?
> 

This is okay once the prerequisites go in.
Thanks,
Kyrill

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* config/arm/neon.md (cbranch<mode>4): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* lib/target-supports.exp (vect_early_break): Add AArch32.
> 	* gcc.target/arm/vect-early-break-cbranch.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -408,6 +408,45 @@ (define_insn "vec_extract<mode><V_elem_l>"
>    [(set_attr "type" "neon_store1_one_lane<q>,neon_to_gp<q>")]
>  )
> 
> +;; Patterns comparing two vectors and conditionally jumping.
> +;; Advanced SIMD lacks a vector != comparison, but this is a quite common
> +;; operation.  To avoid paying the penalty for inverting == we can map our
> +;; "any" comparisons to "all" comparisons, i.e. any(~x) => all(x).
> +;;
> +;; However, unlike the AArch64 version, we can't optimize this further: the
> +;; chain is too long for combine due to these being unspecs, so it doesn't
> +;; fold the operation to something simpler.
> +(define_expand "cbranch<mode>4"
> +  [(set (pc) (if_then_else
> +	      (match_operator 0 "expandable_comparison_operator"
> +	       [(match_operand:VDQI 1 "register_operand")
> +	        (match_operand:VDQI 2 "zero_operand")])
> +	      (label_ref (match_operand 3 "" ""))
> +	      (pc)))]
> +  "TARGET_NEON"
> +{
> +  rtx mask = operands[1];
> +
> +  /* For 128-bit vectors we need an additional reduction.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> +    {
> +      /* Always reduce using a V4SI.  */
> +      mask = gen_reg_rtx (V2SImode);
> +      rtx low = gen_reg_rtx (V2SImode);
> +      rtx high = gen_reg_rtx (V2SImode);
> +      emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
> +      emit_insn (gen_neon_vget_highv4si (high, operands[1]));
> +      emit_insn (gen_neon_vpumaxv2si (mask, low, high));
> +    }
> +
> +  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
> +
> +  rtx val = gen_reg_rtx (SImode);
> +  emit_move_insn (val, gen_lowpart (SImode, mask));
> +  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
> +  DONE;
> +})
> +
>  ;; This pattern is renamed from "vec_extract<mode><V_elem_l>" to
>  ;; "neon_vec_extract<mode><V_elem_l>" and this pattern is called
>  ;; by define_expand in vec-common.md file.
> diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
> @@ -0,0 +1,136 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-require-effective-target arm32 } */
> +/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +/*
> +** f1:
> +**	...
> +**	vcgt.s32	q[0-9]+, q[0-9]+, #0
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	vcge.s32	q[0-9]+, q[0-9]+, #0
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	vceq.i32	q[0-9]+, q[0-9]+, #0
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	vceq.i32	q[0-9]+, q[0-9]+, #0
> +**	vmvn	q[0-9]+, q[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	vclt.s32	q[0-9]+, q[0-9]+, #0
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	vcle.s32	q[0-9]+, q[0-9]+, #0
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
> +**	vmov	r[0-9]+, s[0-9]+	@ int
> +**	cmp	r[0-9]+, #0
> +**	bne	\.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}
> +
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } {
>      return [check_cached_effective_target_indexed vect_early_break {
>        expr {
>  	[istarget aarch64*-*-*]
> +	|| [check_effective_target_arm_neon_ok]
>  	}}]
>  }
>  # Return 1 if the target supports hardware vectorization of complex additions of
> 
> 
> 
> 
> --

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-07 13:58         ` Richard Biener
@ 2023-11-27 18:30           ` Richard Sandiford
  2023-11-28  8:11             ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-11-27 18:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd

Catching up on backlog, so this might already be resolved, but:

Richard Biener <rguenther@suse.de> writes:
> On Tue, 7 Nov 2023, Tamar Christina wrote:
>
>> > -----Original Message-----
>> > From: Richard Biener <rguenther@suse.de>
>> > Sent: Tuesday, November 7, 2023 9:43 AM
>> > To: Tamar Christina <Tamar.Christina@arm.com>
>> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
>> > vectorization
>> > 
>> > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > 
>> > > > -----Original Message-----
>> > > > From: Richard Biener <rguenther@suse.de>
>> > > > Sent: Monday, November 6, 2023 2:25 PM
>> > > > To: Tamar Christina <Tamar.Christina@arm.com>
>> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
>> > > > auto- vectorization
>> > > >
>> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > This patch adds initial support for early break vectorization in GCC.
>> > > > > The support is added for any target that implements a vector
>> > > > > cbranch optab, this includes both fully masked and non-masked targets.
>> > > > >
>> > > > > Depending on the operation, the vectorizer may also require
>> > > > > support for boolean mask reductions using Inclusive OR.  This is
>> > > > > however only checked when the comparison would produce multiple
>> > > > > statements.
>> > > > >
>> > > > > Note: I am currently struggling to get patch 7 correct in all
>> > > > > cases and could use some feedback there.
>> > > > >
>> > > > > Concretely the kind of loops supported are of the forms:
>> > > > >
>> > > > >  for (int i = 0; i < N; i++)
>> > > > >  {
>> > > > >    <statements1>
>> > > > >    if (<condition>)
>> > > > >      {
>> > > > >        ...
>> > > > >        <action>;
>> > > > >      }
>> > > > >    <statements2>
>> > > > >  }
>> > > > >
>> > > > > where <action> can be:
>> > > > >  - break
>> > > > >  - return
>> > > > >  - goto
>> > > > >
>> > > > > Any number of statements can be used before the <action> occurs.
>> > > > >
>> > > > > Since this is an initial version for GCC 14 it has the following
>> > > > > limitations and
>> > > > > features:
>> > > > >
>> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
>> > > > >   vectors loaded or stored must be to statically allocated arrays with known
>> > > > >   sizes. N must also be known.  This limitation is because our primary target
>> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>> > > > >   iteration checks. The result is likely to also not be beneficial. For that
>> > > > >   reason we punt support for variable buffers till we have First-Faulting
>> > > > >   support in GCC.
>> > 
>> > Btw, for this I wonder if you thought about marking memory accesses required
>> > for the early break condition as required to be vector-size aligned, thus peeling
>> > or versioning them for alignment?  That should ensure they do not fault.
>> > 
>> > OTOH I somehow remember prologue peeling isn't supported for early break
>> > vectorization?  ..
>> > 
>> > > > > - any stores in <statements1> should not be to the same objects as in
>> > > > >   <condition>.  Loads are fine as long as they don't have the possibility to
>> > > > >   alias.  More concretely, we block RAW dependencies when the intermediate
>> > > > >   value can't be separated from the store, or the store itself can't be moved.
>> > > > > - Prologue peeling, alignment peeling and loop versioning are supported.
>> > 
>> > .. but here you say it is.  Not sure if peeling for alignment works for VLA vectors
>> > though.  Just to say x86 doesn't support first-faulting loads.
>> 
>> For VLA we support it through masking, i.e. if you need to peel N iterations, we
>> generate a masked copy of the vectorized loop which masks off the first N bits.
>> 
>> This is not typically needed, but we do support it.  But the problem with this
>> scheme and early break is obviously that the peeled loop needs to be vectorized
>> so you kinda end up with the same issue again.  So at the moment it rejects it for VLA.
>
> Hmm, I see.  I thought peeling by masking is an optimization.

Yeah, it's an opt-in optimisation.  No current Arm cores opt in though.

> Anyhow, I think it should still work here - since all accesses are aligned
> and we know that there's at least one original scalar iteration in the
> first masked and the following "unmasked" vector iterations there
> should never be faults for any of the aligned accesses.

Peeling via masking works by using the main loop for the "peeled"
iteration (so it's a bit of a misnomer).  The vector pointers start
out lower than the original scalar pointers, with some leading
inactive elements.

The awkwardness would be in skipping those leading inactive elements
in the epilogue, if an early break occurs in the first vector iteration.
Definitely doable, but I imagine not trivial.

> I think going via alignment is a way easier method to guarantee this
> than handwaving about "declared" arrays and niter.  One can try that
> in addition of course - it's not always possible to align all
> vector loads we are going to speculate (for VLA one could also
> find common runtime (mis-)alignment and restrict the vector length based
> on that, for RISC-V it seems to be efficient, not sure whether altering
> that for SVE is though).

I think both techniques (alignment and reasoning about accessibility)
are useful.  And they each help with different cases.  Like you say,
if there are two vector loads that need to be aligned, we'd need to
version for alignment on fixed-length architectures, with a scalar
fallback when the alignment requirement isn't met.  In contrast,
static reasoning about accessibility allows the vector loop to be
used for all relative misalignments.

So I think the aim should be to support both techniques.  But IMO it's
reasonable to start with either one.  It sounds from Tamar's results
like starting with static reasoning does fire quite often, and it
should have less runtime overhead than the alignment approach.

Plus, when the loop operates on chars, it's hard to predict whether
peeling for alignment pays for itself, or whether the scalar prologue
will end up handling the majority of cases.  If we have the option
of not peeling for alignment, then it's probably worth taking it
for chars.

Capping the VL at runtime is possible on SVE.  It's on the backlog
for handling runtime aliases, where we can vectorise with a lower VF
rather than falling back to scalar code.  But first-faulting loads
are likely to be better than halving or quartering the VL at runtime,
so I don't think capping the VL would be the right SVE technique for
early exits.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-24 13:23               ` Richard Biener
@ 2023-11-27 22:47                 ` Tamar Christina
  2023-11-29 13:28                   ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-27 22:47 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 26198 bytes --]

> >
> > This is a respun patch with a fix for VLA.
> >
> > This adds support to vectorizable_live_reduction to handle multiple
> > exits by doing a search for which exit the live value should be materialized in.
> >
> > Additionally, which value in the index we're after depends on whether
> > the exit it's materialized in is an early exit or whether the loop's
> > main exit is different from the loop's natural one (i.e. the one with
> > the same src block as the latch).
> >
> > In those two cases we want the first rather than the last value as
> > we're going to restart the iteration in the scalar loop.  For VLA this
> > means we need to reverse both the mask and vector since there's only a
> > way to get the last active element and not the first.
> >
> > For inductions and multiple exits:
> >   - we test if the target will support vectorizing the induction
> >   - mark all inductions in the loop as relevant
> >   - for codegen of non-live inductions during codegen
> >   - induction during an early exit gets the first element rather than last.
> >
> > For reductions and multiple exits:
> >   - Reductions for early exits reduces the reduction definition statement
> >     rather than the reduction step.  This allows us to get the value at the
> >     start of the iteration.
> >   - The peeling layout means that we just have to update one block, the
> merge
> >     block.  We expect all the reductions to be the same but we leave it up to
> >     the value numbering to clean up any duplicate code as we iterate over all
> >     edges.
> >
> > These two changes fix the reduction codegen given before which has
> > been added to the testsuite for early vect.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > 	(vect_analyze_loop_operations): Check if target supports vectorizing
> IV.
> > 	(vect_transform_loop): Call vectorizable_live_operation for non-live
> > 	inductions or reductions.
> > 	(find_connected_edge, vectorizable_live_operation_1): New.
> > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > 	relevant.
> > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > 	(vect_iv_increment_position): New.
> > 	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401aecd4e7eaaaa9e2b8a0 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type,
> gimple_seq *seq,
> >     INSERT_AFTER is set to true if the increment should be inserted after
> >     *BSI.  */
> >
> > -static void
> > +void
> >  vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> >  			    bool *insert_after)
> >  {
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e6983382190792602cb25af1 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
> >  	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
> >  					      -1, false, &cost_vec);
> >
> > +	  /* Check if we can perform the operation for early break if we force
> > +	     the live operation.  */
> > +	  if (ok
> > +	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +	      && !STMT_VINFO_LIVE_P (stmt_info)
> > +	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > +	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL, NULL,
> > +					      -1, false, &cost_vec);
> 
> can you add && !PURE_SLP_STMT?
> 

I've cleaned up the patch a bit more, so these hunks are now all gone.

> > @@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >           Store them in NEW_PHIS.  */
> >    if (double_reduc)
> >      loop = outer_loop;
> > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +  /* We need to reduce values in all exits.  */
> > +  exit_bb = loop_exit->dest;
> >    exit_gsi = gsi_after_labels (exit_bb);
> >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > +  vec <gimple *> vec_stmts;
> > +  if (main_exit_p)
> > +    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);
> > +  else
> > +    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (rdef_info));
> 
> both would be wrong for SLP, also I think you need to look at
> STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))?  For SLP the PHI SLP
> node is reached via slp_node_instance->reduc_phis.
> 
> I think an overall better structure would be to add a
> 
> vect_get_vect_def (stmt_vec_info, slp_tree, unsigned);
> 
> abstracting SLP and non-SLP and doing
> 
>   for (unsigned i = 0; i < vec_num * ncopies; ++i)
>     {
>       def = vect_get_vect_def (stmt_info, slp_node, i); ...
>     }
> 
> and then adjusting stmt_info/slp_node according to main_exit_p?

Done.

> (would be nice to transition stmt_info->vec_stmts to stmt_info->vec_defs)

True. I guess since the plan is to remove non-SLP next year this'll just go away anyway.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation,
	vectorizable_live_operation_1): Support early exits.
	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
	inductions or reductions.
	(find_connected_edge, vect_get_vect_def): New.
	(vect_create_epilog_for_reduction): Support reductions in early break.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	(vect_stmt_relevant_p): Mark all inductions when early break as being
	live.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
 	  bb_before_epilog = loop_preheader_edge (epilog)->src;
 	}
+
       /* If loop is peeled for non-zero constant times, now niters refers to
 	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 	 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..90041d1e138afb08c0116f48f517fe0fcc615557 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
   return new_temp;
 }
 
+/* Retrieves the defining statement to be used for a reduction.
+   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
+   the reduction definitions.  */
+
+tree
+vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
+		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
+		   vec <gimple *> &vec_stmts)
+{
+  tree def;
+
+  if (slp_node)
+    {
+      if (!main_exit_p)
+        slp_node = slp_node_instance->reduc_phis;
+      def = vect_get_slp_vect_def (slp_node, i);
+    }
+  else
+    {
+      if (!main_exit_p)
+	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
+      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
+      def = gimple_get_lhs (vec_stmts[0]);
+    }
+
+  return def;
+}
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5912,8 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
      loop-closed PHI of the inner loop which we remember as
      def for the reduction PHI generation.  */
   bool double_reduc = false;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
     {
@@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
-      if (slp_node)
-	def = vect_get_slp_vect_def (slp_node, i);
-      else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
+			       main_exit_p, i, vec_stmts);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6882,10 +6914,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	    }
 
           scalar_result = scalar_results[k];
+	  edge merge_e = loop_exit;
+	  if (!main_exit_p)
+	    merge_e = single_succ_edge (loop_exit->dest);
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else
+		    {
+		      /* With multiple exits the same SSA name can appear in
+			 both the main and the early exits.  The meaning of the
+			 reduction however is not the same.  In the main exit
+			 case the meaning is "get the last value" and in the
+			 early exit case it means "get the first value".  As
+			 such we should only update the value for the exit
+			 attached to loop_exit.  To make this easier we always
+			 call vect_create_epilog_for_reduction on the early
+			 exit main block first.  As such for the main exit we
+			 no longer have to perform the BB check.  */
+		      gphi *stmt = as_a <gphi *> (USE_STMT (use_p));
+		      int idx = phi_arg_index_from_use (use_p);
+		      if (gimple_phi_arg_edge (stmt, idx) == merge_e)
+			SET_USE (use_p, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
@@ -10481,15 +10536,17 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
-
 /* Function vectorizable_live_operation_1.
+
    helper function for vectorizable_live_operation.  */
+
 tree
 vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 			       stmt_vec_info stmt_info, edge exit_e,
 			       tree vectype, int ncopies, slp_tree slp_node,
 			       tree bitsize, tree bitstart, tree vec_lhs,
-			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
 {
   basic_block exit_bb = exit_e->dest;
   gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
@@ -10504,7 +10561,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (ncopies == 1 && !slp_node);
@@ -10513,15 +10572,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree len = vect_get_loop_len (loop_vinfo, &gsi,
 				    &LOOP_VINFO_LENS (loop_vinfo),
 				    1, vectype, 0, 0);
+
       /* BIAS - 1.  */
       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
       tree bias_minus_one
 	= int_const_binop (MINUS_EXPR,
 			   build_int_cst (TREE_TYPE (len), biasval),
 			   build_one_cst (TREE_TYPE (len)));
+
       /* LAST_INDEX = LEN + (BIAS - 1).  */
       tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 				     len, bias_minus_one);
+
       /* This needs to implement extraction of the first index, but not sure
 	 how the LEN stuff works.  At the moment we shouldn't get here since
 	 there's no LEN support for early breaks.  But guard this so there's
@@ -10532,13 +10594,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree scalar_res
 	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
 			vec_lhs_phi, last_index);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
   else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (!slp_node);
@@ -10548,10 +10613,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
 				      &LOOP_VINFO_MASKS (loop_vinfo),
 				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
 
       gimple_seq_add_seq (&stmts, tem);
-       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-				       mask, vec_lhs_phi);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
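The EXTRACT_FIRST emulation above is easiest to see outside of GIMPLE.  The
following standalone C++ sketch (function names are illustrative only, not
GCC APIs) shows why reversing both the data vector and the mask turns an
extract-last into an extract-first:

```cpp
#include <algorithm>
#include <optional>
#include <vector>

/* EXTRACT_LAST semantics: the element in the last lane where the mask
   is active, or nothing if no lane is active.  */
static std::optional<int>
extract_last (std::vector<int> v, std::vector<bool> mask)
{
  for (int i = (int) v.size () - 1; i >= 0; --i)
    if (mask[i])
      return v[i];
  return std::nullopt;
}

/* EXTRACT_FIRST emulated by reversing both the vector and the mask and
   then doing EXTRACT_LAST, mirroring the two VEC_PERM_EXPRs emitted in
   the patch above.  */
static std::optional<int>
extract_first_emulated (std::vector<int> v, std::vector<bool> mask)
{
  std::reverse (v.begin (), v.end ());
  std::reverse (mask.begin (), mask.end ());
  return extract_last (v, mask);
}
```

The last active lane of the reversed pair is exactly the first active lane
of the originals, which is the value an early exit needs.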
@@ -10564,12 +10657,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
 				       &stmts, true, NULL_TREE);
     }
+
   *exit_gsi = gsi_after_labels (exit_bb);
   if (stmts)
     gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
   return new_tree;
 }
 
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  There may be at most one forwarder block between them.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  return find_edge (src->dest, dest);
+}
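As a plain-data sketch of the lookup above (block numbers and the successor
map are invented for illustration): the exit edge either reaches DEST
directly, or through exactly one forwarder block.

```cpp
#include <map>
#include <utility>

typedef std::pair<int, int> edge_t;	/* (source block, dest block) */

/* Return the final edge on the path from E to DEST, allowing at most one
   forwarder block in between; (-1, -1) if there is none.  SUCC maps a
   forwarder block to its single successor.  */
static edge_t
find_connected_edge (edge_t e, int dest, const std::map<int, int> &succ)
{
  if (e.second == dest)
    return e;
  std::map<int, int>::const_iterator it = succ.find (e.second);
  if (it != succ.end () && it->second == dest)
    return edge_t (e.second, dest);
  return edge_t (-1, -1);
}
```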
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10594,7 +10701,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10619,8 +10727,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* With an early break we only have to materialize the reduction on the
+	 merge block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      {
+		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						  slp_node, slp_node_instance,
+						  exit);
+		break;
+	      }
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10772,37 +10897,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_stmt_iterator exit_gsi;
-      tree new_tree
-	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
-					 LOOP_VINFO_IV_EXIT (loop_vinfo),
-					 vectype, ncopies, slp_node, bitsize,
-					 bitstart, vec_lhs, lhs_type,
-					 &exit_gsi);
-
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases for an early break we restart the iteration the vector
+	 code did.  For the live values we want the value at the start of the
+	 iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
+
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For an early exit where the exit is not in the BB that leads
+		   to the latch we're restarting the iteration in the scalar
+		   loop, so get the first live value.  */
+		restart_loop = restart_loop || exit_e != main_e;
+		if (restart_loop)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
+
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new stmts
+		   so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There are no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..27221c6e8e86034050b562ee5c15992827a8d2cb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and we have multiple exits.  In this case
+     there will be a use after peeling which is needed for the alternate
+     exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vec_stmt_relevant_p: induction forced for "
+			 "early break.\n");
+      *live_p = true;
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
@@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
 			  bool vec_stmt_p,
 			  stmt_vector_for_cost *cost_vec)
 {
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   if (slp_node)
     {
       stmt_vec_info slp_stmt_info;
       unsigned int i;
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
 	{
-	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
+	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
+	       || (loop_vinfo
+		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+			== vect_induction_def))
 	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
 					       slp_node_instance, i,
 					       vec_stmt_p, cost_vec))
 	    return false;
 	}
     }
-  else if (STMT_VINFO_LIVE_P (stmt_info)
+  else if ((STMT_VINFO_LIVE_P (stmt_info)
+	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
 	   && !vectorizable_live_operation (vinfo, stmt_info,
 					    slp_node, slp_node_instance, -1,
 					    vec_stmt_p, cost_vec))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index de60da31e2a3030a7fbc302d3f676af9683fd019..fd4b0a787e6128b43c5ca2b0612f55845e6b3cef 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

[-- Attachment #2: rb17968 (1).patch --]
[-- Type: application/octet-stream, Size: 19351 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
 	  bb_before_epilog = loop_preheader_edge (epilog)->src;
 	}
+
       /* If loop is peeled for non-zero constant times, now niters refers to
 	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 	 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..90041d1e138afb08c0116f48f517fe0fcc615557 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
   return new_temp;
 }
 
+/* Retrieve the defining statement to be used for a reduction.
+   For MAIN_EXIT_P we use the current VEC_STMTs, otherwise we look at
+   the reduction definitions.  */
+
+tree
+vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
+		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
+		   vec <gimple *> &vec_stmts)
+{
+  tree def;
+
+  if (slp_node)
+    {
+      if (!main_exit_p)
+        slp_node = slp_node_instance->reduc_phis;
+      def = vect_get_slp_vect_def (slp_node, i);
+    }
+  else
+    {
+      if (!main_exit_p)
+	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
+      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
+      def = gimple_get_lhs (vec_stmts[0]);
+    }
+
+  return def;
+}
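Stripped of the SLP plumbing, the new helper makes one decision.  The
struct below is an illustrative stand-in, not a GCC type: on the main exit
the reduction value comes from the current vector statements, while an
alternate (early) exit reads the reduction PHIs, i.e. the value at the
start of the iteration.

```cpp
#include <string>

/* Illustrative stand-ins for the vector-statement / reduction-PHI
   lookups done by vect_get_vect_def.  */
struct reduc_defs
{
  std::string vec_stmt_lhs;	/* value after the iteration */
  std::string reduc_phi_lhs;	/* value at the start of the iteration */
};

/* Pick the definition the epilog should reduce from, depending on
   which exit we are materializing the reduction for.  */
static std::string
get_vect_def (const reduc_defs &defs, bool main_exit_p)
{
  return main_exit_p ? defs.vec_stmt_lhs : defs.reduc_phi_lhs;
}
```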
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5912,8 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
      loop-closed PHI of the inner loop which we remember as
      def for the reduction PHI generation.  */
   bool double_reduc = false;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
     {
@@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
-      if (slp_node)
-	def = vect_get_slp_vect_def (slp_node, i);
-      else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
+			       main_exit_p, i, vec_stmts);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -6882,10 +6914,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 	    }
 
           scalar_result = scalar_results[k];
+	  edge merge_e = loop_exit;
+	  if (!main_exit_p)
+	    merge_e = single_succ_edge (loop_exit->dest);
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
 	    {
 	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-		SET_USE (use_p, scalar_result);
+		{
+		  if (main_exit_p)
+		    SET_USE (use_p, scalar_result);
+		  else
+		    {
+		      /* With multiple exits the same SSA name can appear in
+			 both the main and the early exits.  The meaning of the
+			 reduction however is not the same.  In the main exit
+			 case the meaning is "get the last value" and in the
+			 early exit case it means "get the first value".  As
+			 such we should only update the value for the exit
+			 attached to loop_exit.  To make this easier we always
+			 call vect_create_epilog_for_reduction on the early
+			 exit main block first.  As such for the main exit we
+			 no longer have to perform the BB check.  */
+		      gphi *stmt = as_a <gphi *> (USE_STMT (use_p));
+		      int idx = phi_arg_index_from_use (use_p);
+		      if (gimple_phi_arg_edge (stmt, idx) == merge_e)
+			SET_USE (use_p, scalar_result);
+		    }
+		}
 	      update_stmt (use_stmt);
 	    }
         }
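The guarded SET_USE above boils down to: only the PHI argument on the merge
edge belonging to this exit receives the new scalar result.  A minimal
sketch with string-keyed edges (all names invented for illustration):

```cpp
#include <map>
#include <string>

/* A merge-block PHI modelled as edge-name -> argument.  With multiple
   exits the same SSA name can be the argument on both the main and the
   early exit edge, but only the argument on MERGE_E may be replaced,
   because the reduction means "last value" on one edge and "first
   value" on the other.  */
static void
update_phi_arg (std::map<std::string, std::string> &phi_args,
		const std::string &merge_e, const std::string &scalar_result)
{
  std::map<std::string, std::string>::iterator it = phi_args.find (merge_e);
  if (it != phi_args.end ())
    it->second = scalar_result;	/* only the matching edge is rewritten */
}
```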
@@ -10481,15 +10536,17 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
-
 /* Function vectorizable_live_operation_1.
+
    helper function for vectorizable_live_operation.  */
+
 tree
 vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 			       stmt_vec_info stmt_info, edge exit_e,
 			       tree vectype, int ncopies, slp_tree slp_node,
 			       tree bitsize, tree bitstart, tree vec_lhs,
-			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
 {
   basic_block exit_bb = exit_e->dest;
   gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
@@ -10504,7 +10561,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (ncopies == 1 && !slp_node);
@@ -10513,15 +10572,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree len = vect_get_loop_len (loop_vinfo, &gsi,
 				    &LOOP_VINFO_LENS (loop_vinfo),
 				    1, vectype, 0, 0);
+
       /* BIAS - 1.  */
       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
       tree bias_minus_one
 	= int_const_binop (MINUS_EXPR,
 			   build_int_cst (TREE_TYPE (len), biasval),
 			   build_one_cst (TREE_TYPE (len)));
+
       /* LAST_INDEX = LEN + (BIAS - 1).  */
       tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 				     len, bias_minus_one);
+
       /* This needs to implement extraction of the first index, but not sure
 	 how the LEN stuff works.  At the moment we shouldn't get here since
 	 there's no LEN support for early breaks.  But guard this so there's
@@ -10532,13 +10594,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree scalar_res
 	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
 			vec_lhs_phi, last_index);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
   else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (!slp_node);
@@ -10548,10 +10613,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
 				      &LOOP_VINFO_MASKS (loop_vinfo),
 				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
 
       gimple_seq_add_seq (&stmts, tem);
-       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-				       mask, vec_lhs_phi);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
@@ -10564,12 +10657,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
 				       &stmts, true, NULL_TREE);
     }
+
   *exit_gsi = gsi_after_labels (exit_bb);
   if (stmts)
     gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
   return new_tree;
 }
 
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  There may be at most one forwarder block between them.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  return find_edge (src->dest, dest);
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10594,7 +10701,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10619,8 +10727,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
+      /* With an early break we only have to materialize the reduction on the
+	 merge block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      {
+		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						  slp_node, slp_node_instance,
+						  exit);
+		break;
+	      }
+	}
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
       return true;
     }
 
@@ -10772,37 +10897,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_stmt_iterator exit_gsi;
-      tree new_tree
-	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
-					 LOOP_VINFO_IV_EXIT (loop_vinfo),
-					 vectype, ncopies, slp_node, bitsize,
-					 bitstart, vec_lhs, lhs_type,
-					 &exit_gsi);
-
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
-	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases for an early break we restart the iteration the vector
+	 code did.  For the live values we want the value at the start of the
+	 iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  {
+	    basic_block use_bb = gimple_bb (use_stmt);
+	    if (!is_a <gphi *> (use_stmt))
+	      continue;
+	    for (auto exit_e : get_loop_exit_edges (loop))
+	      {
+		/* See if this exit leads to the value.  */
+		edge dest_e = find_connected_edge (exit_e, use_bb);
+		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
+		  continue;
+
+		gimple *tmp_vec_stmt = vec_stmt;
+		tree tmp_vec_lhs = vec_lhs;
+		tree tmp_bitstart = bitstart;
+		/* For an early exit where the exit is not in the BB that leads
+		   to the latch we're restarting the iteration in the scalar
+		   loop, so get the first live value.  */
+		restart_loop = restart_loop || exit_e != main_e;
+		if (restart_loop)
+		  {
+		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		  }
+
+		gimple_stmt_iterator exit_gsi;
+		tree new_tree
+		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						   exit_e, vectype, ncopies,
+						   slp_node, bitsize,
+						   tmp_bitstart, tmp_vec_lhs,
+						   lhs_type, restart_loop,
+						   &exit_gsi);
+
+		/* Use the empty block on the exit to materialize the new stmts
+		   so we can update the PHI here.  */
+		if (gimple_phi_num_args (use_stmt) == 1)
+		  {
+		    auto gsi = gsi_for_stmt (use_stmt);
+		    remove_phi_node (&gsi, false);
+		    tree lhs_phi = gimple_phi_result (use_stmt);
+		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		  }
+		else
+		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
+	      }
+	  }
 
       /* There are no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fe38beb4fa1d9f8593445354f56ba52e10a040cd..27221c6e8e86034050b562ee5c15992827a8d2cb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and we have multiple exits.  In this case
+     there will be a use after peeling which is needed for the alternate
+     exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "vec_stmt_relevant_p: induction forced for "
+			 "early break.\n");
+      *live_p = true;
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
@@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
 			  bool vec_stmt_p,
 			  stmt_vector_for_cost *cost_vec)
 {
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   if (slp_node)
     {
       stmt_vec_info slp_stmt_info;
       unsigned int i;
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
 	{
-	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
+	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
+	       || (loop_vinfo
+		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+			== vect_induction_def))
 	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
 					       slp_node_instance, i,
 					       vec_stmt_p, cost_vec))
 	    return false;
 	}
     }
-  else if (STMT_VINFO_LIVE_P (stmt_info)
+  else if ((STMT_VINFO_LIVE_P (stmt_info)
+	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
 	   && !vectorizable_live_operation (vinfo, stmt_info,
 					    slp_node, slp_node_instance, -1,
 					    vec_stmt_p, cost_vec))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index de60da31e2a3030a7fbc302d3f676af9683fd019..fd4b0a787e6128b43c5ca2b0612f55845e6b3cef 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
@ 2023-11-27 22:48   ` Tamar Christina
  2023-12-06  4:00     ` Tamar Christina
  2023-12-06  8:18   ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-27 22:48 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Monday, November 6, 2023 7:41 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 13/21]middle-end: Update loop form analysis to support
> early break
> 
> Hi All,
> 
> This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the
> other patches are self contained.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> 	(vect_transform_loop): Use it.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
>    loop_vinfo->scalar_costs->finish_cost (nullptr);
>  }
> 
> -
>  /* Function vect_analyze_loop_form.
> 
>     Verify that certain CFG restrictions hold, including:
>     - the loop has a pre-header
> -   - the loop has a single entry and exit
> +   - the loop has a single entry
> +   - nested loops can have only a single exit.
>     - the loop exit condition is simple enough
>     - the number of iterations can be analyzed, i.e, a countable loop.  The
>       niter could be analyzed under some assumptions.  */
> @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized: latch block not empty.\n");
> 
>    /* Make sure the exit is not abnormal.  */
> -  if (exit_e->flags & EDGE_ABNORMAL)
> -    return opt_result::failure_at (vect_location,
> -				   "not vectorized:"
> -				   " abnormal loop exit edge.\n");
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  for (edge e : exits)
> +    {
> +      if (e->flags & EDGE_ABNORMAL)
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized:"
> +				       " abnormal loop exit edge.\n");
> +    }
> 
>    info->conds
>      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> 
>    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> 
> +  /* Check to see if we're vectorizing multiple exits.  */
> +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> +
>    if (info->inner_loop_cond)
>      {
>        stmt_vec_info inner_loop_cond_info
> @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* Make sure there exists a single-predecessor exit bb.  Do this before
>       versioning.   */
>    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> -  if (! single_pred_p (e->dest))
> +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        split_loop_exit_edge (e, true);
>        if (dump_enabled_p ())
> 
> 
> 
> 
> --

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks
  2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
@ 2023-11-27 22:48   ` Tamar Christina
  2023-12-06  8:31   ` Richard Biener
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-11-27 22:48 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Monday, November 6, 2023 7:41 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 12/21]middle-end: Add remaining changes to peeling and
> vectorizer to support early breaks
> 
> Hi All,
> 
> This finishes the wiring that didn't fit in any of the other patches.
> Essentially it just adds the related changes needed so that peeling for early
> breaks works.
> 
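As an illustration of why the patch below forces `bound_epilog` to VF, here is a hedged scalar model (not the GCC implementation; `find_index` and `VF` are names invented for this sketch): when the "vector" body takes an early break, it only knows the match lies somewhere in the current group of VF elements, so a scalar epilogue of at most VF iterations is needed to pinpoint it.

```cpp
#include <vector>

// Assumed vector factor, fixed for the sketch.
constexpr int VF = 4;

// Scalar model of an early-break vector loop: the "vector" body tests
// VF elements at a time; on a hit it breaks without knowing which lane
// matched.  The scalar epilogue then runs at most VF iterations.
int find_index (const std::vector<int> &a, int key)
{
  int n = (int) a.size ();
  int i = 0;
  for (; i + VF <= n; i += VF)
    {
      bool any = false;
      for (int lane = 0; lane < VF; lane++)
	any |= (a[i + lane] == key);
      if (any)
	break;	// early break: exact index still unknown.
    }
  // Scalar epilogue: at most VF iterations, also covering the remainder.
  for (int j = i; j < n && j < i + VF; j++)
    if (a[j] == key)
      return j;
  return -1;
}
```

The same epilogue also handles the loop remainder when no break fires, which is why the patch forces a scalar epilogue rather than a vector one.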
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> 	vect_do_peeling): Support early breaks.
> 	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
> Likewise.
> 	* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
> 
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>  				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
> @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> 
>    /* Record the number of latch iterations.  */
> -  if (limit == niters)
> +  if (limit == niters
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      /* Case A: the loop iterates NITERS times.  Subtract one to get the
>         latch count.  */
>      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>      bound_epilog += vf - 1;
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>      bound_epilog += 1;
> +
> +  /* For early breaks the scalar loop needs to execute at most VF times
> +     to find the element that caused the break.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      bound_epilog = vf;
> +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> +      vect_epilogues = false;
> +    }
> +
>    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
>    poly_uint64 bound_scalar = bound_epilog;
> 
> @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  				  bound_prolog + bound_epilog)
>  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>  			 || vect_epilogues));
> +
> +  /* We only support early break vectorization on known bounds at this time.
> +     This means that if the vector loop can't be entered then we won't generate
> +     it at all.  So for now force skip_vector off because the additional control
> +     flow messes with the BB exits and we've already analyzed them.  */
> +  skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> +
>    /* Epilog loop must be executed if the number of iterations for epilog
>       loop is known at compile time, otherwise we need to add a check at
>       the end of vector loop and skip to the end of epilog loop.  */
>    bool skip_epilog = (prolog_peeling < 0
>  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>  		      || !vf.is_constant ());
> -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +  /* PEELING_FOR_GAPS and peeling for early breaks are special because
> epilog
> +     loop must be executed.  */
> +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      skip_epilog = false;
> 
>    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
>      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
>  					  (loop_vinfo));
> 
> +  /* When we have multiple exits and VF is unknown, we must require partial
> +     vectors because the loop bound is not a minimum but a maximum.  That is
> +     to say we cannot unpredicate the main loop unless we peel or use partial
> +     vectors in the epilogue.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> +    return true;
> +
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
>      {
> @@ -3149,7 +3157,8 @@ start_over:
> 
>    /* If an epilogue loop is required make sure we can create one.  */
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c77837233c660e032a 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
>  	 predicates that need to be shared for optimal predicate usage.
>  	 However reassoc will re-order them and prevent CSE from working
>  	 as it should.  CSE only the loop body, not the entry.  */
> -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> +      for (edge exit : exits)
> +	bitmap_set_bit (exit_bbs, exit->dest->index);
> 
>        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
>        do_rpo_vn (fun, entry, exit_bbs);
> 
> 
> 
> 
> --


* RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow
  2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
@ 2023-11-27 22:49   ` Tamar Christina
  2023-11-29 14:47     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-27 22:49 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Monday, November 6, 2023 7:40 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 10/21]middle-end: implement relevancy analysis support for
> control flow
> 
> Hi All,
> 
> This updates relevancy analysis to support marking gcond's belonging to early
> breaks as relevant for vectorization.
> 
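The idea of "marking as relevant" can be pictured with a toy worklist model (illustrative only; `uses` and `mark_relevant` are invented names, not GCC code): the early-break gcond is seeded as relevant, and relevance then propagates to the statements defining its operands, mirroring how vect_mark_stmts_to_be_vectorized calls process_use on the gcond's lhs and rhs.

```cpp
#include <map>
#include <set>
#include <vector>

// Toy model: statements are ids; `uses` maps a statement to the ids of
// the statements defining its operands.  Seeding the early-break gcond
// and draining a worklist marks everything it transitively uses.
std::set<int> mark_relevant (const std::map<int, std::vector<int>> &uses,
			     int early_break_cond)
{
  std::set<int> relevant;
  std::vector<int> worklist = { early_break_cond };
  while (!worklist.empty ())
    {
      int stmt = worklist.back ();
      worklist.pop_back ();
      if (!relevant.insert (stmt).second)
	continue;			// already processed
      auto it = uses.find (stmt);
      if (it != uses.end ())
	for (int def : it->second)
	  worklist.push_back (def);	// operand defs become relevant too
    }
  return relevant;
}
```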
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-stmts.cc (vect_stmt_relevant_p,
> 	vect_mark_stmts_to_be_vectorized, vect_analyze_stmt,
> vect_is_simple_use,
> 	vect_get_vector_types_for_stmt): Support early breaks.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 4809b822632279493a843d402a833c9267bb315e..31474e923cc3feb2604ca2882ecfb300cd211679 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -359,9 +359,14 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
> 
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> -    *relevant = vect_used_in_scope;
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (is_ctrl_stmt (stmt) && is_a <gcond *> (stmt))
> +    {
> +      gcond *cond = as_a <gcond *> (stmt);
> +      if (LOOP_VINFO_LOOP_CONDS (loop_vinfo).contains (cond)
> +	  && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> +	*relevant = vect_used_in_scope;
> +    }
> 
>    /* changing memory.  */
>    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> @@ -374,6 +379,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	*relevant = vect_used_in_scope;
>        }
> 
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  auto_bitmap exit_bbs;
> +  for (edge exit : exits)
> +    bitmap_set_bit (exit_bbs, exit->dest->index);
> +
>    /* uses outside the loop.  */
>    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> SSA_OP_DEF)
>      {
> @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	      /* We expect all such uses to be in the loop exit phis
>  		 (because of loop closed form)   */
>  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> -	      gcc_assert (bb == single_exit (loop)->dest);
> 
>                *live_p = true;
>  	    }
> @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
>  			return res;
>  		    }
>                   }
> +	    }
> +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> +	    {
> +	      enum tree_code rhs_code = gimple_cond_code (cond);
> +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> +	      opt_result res
> +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> +			       loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
> +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> +				loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
>              }
>  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
>  	    {
> @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
>  			     node_instance, cost_vec);
>        if (!res)
>  	return res;
> -   }
> +    }
> +
> +  if (is_ctrl_stmt (stmt_info->stmt))
> +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> 
>    switch (STMT_VINFO_DEF_TYPE (stmt_info))
>      {
>        case vect_internal_def:
> +      case vect_early_exit_def:
>          break;
> 
>        case vect_reduction_def:
> @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
>      {
>        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
>        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
>  		  || (call && gimple_call_lhs (call) == NULL_TREE));
>        *need_to_vectorize = true;
>      }
> @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
>  	  else
>  	    *op = gimple_op (ass, operand + 1);
>  	}
> +      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> +	{
> +	  gimple_match_op m_op;
> +	  if (!gimple_extract_op (cond, &m_op))
> +	    return false;
> +	  gcc_assert (m_op.code.is_tree_code ());
> +	  *op = m_op.ops[operand];
> +	}
>        else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
>  	*op = gimple_call_arg (call, operand);
>        else
> @@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>    *nunits_vectype_out = NULL_TREE;
> 
>    if (gimple_get_lhs (stmt) == NULL_TREE
> +      /* Allow vector conditionals through here.  */
> +      && !is_ctrl_stmt (stmt)
>        /* MASK_STORE has no lhs, but is ok.  */
>        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
>      {
> @@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  	}
> 
>        return opt_result::failure_at (stmt,
> -				     "not vectorized: irregular stmt.%G", stmt);
> +				     "not vectorized: irregular stmt: %G", stmt);
>      }
> 
>    tree vectype;
> @@ -14490,6 +14528,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if (is_ctrl_stmt (stmt))
> +	{
> +	  gcond *cond = dyn_cast <gcond *> (stmt);
> +	  if (!cond)
> +	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
> +					   " control flow statement.\n");
> +	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> 
> 
> 
> 
> 
> --


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
@ 2023-11-27 22:49   ` Tamar Christina
  2023-11-29 13:50     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-27 22:49 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches; +Cc: nd, rguenther, jlaw

Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Monday, November 6, 2023 7:40 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> Hi All,
> 
> This implements vectorizable_early_exit, which is used as the codegen part
> of vectorizing a gcond.
> 
> It shares the majority of its code with
> vectorizable_comparison, with the addition that it needs to be able to reduce
> multiple resulting statements into a single one for use in the gcond, and also
> needs to be able to perform masking on the comparisons.
> 
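The "reduce multiple resulting statements into a single one" step can be pictured with a small scalar sketch (illustrative only; `reduce_masks` is an invented name and `unsigned` stands in for a vector mask): masks are OR-ed together pairwise, and each new result is queued behind the remaining inputs so the reduction tree stays shallow and the ORs can execute in parallel, much like the workset loop in the patch below.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Pairwise OR-reduction of several boolean masks into one, keeping the
// reduction tree shallow: pop two operands, OR them, and queue the
// result at the front so it is combined last.
unsigned reduce_masks (const std::vector<unsigned> &masks)
{
  assert (!masks.empty ());
  std::deque<unsigned> workset (masks.begin (), masks.end ());
  while (workset.size () > 1)
    {
      unsigned a = workset.back (); workset.pop_back ();
      unsigned b = workset.back (); workset.pop_back ();
      workset.push_front (a | b);	// result feeds a later round
    }
  return workset.front ();
}
```

For four masks this builds (m3|m2) and (m1|m0) before combining them, rather than a serial chain, which is the parallelism the patch text refers to.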
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts
> without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support
> gcond.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a843d402a833c9267bb315e 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
> 
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
> 
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
> 
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
> 
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12709,6 +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
> 
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  gimple_match_op op;
> +  if (!gimple_extract_op (stmt_info->stmt, &op))
> +    gcc_unreachable ();
> +  gcc_assert (op.code.is_tree_code ());
> +  auto code = tree_code (op.code);
> +
> +  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype_out);
> +
> +  tree var_op = op.ops[0];
> +
> +  /* When vectorizing things like pointer comparisons we will assume that
> +     the VF of both operands are the same. e.g. a pointer must be compared
> +     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
> +     check further.  */
> +  tree vectype_op = vectype_out;
> +  if (SSA_VAR_P (var_op))
> +    {
> +      stmt_vec_info operand0_info
> +	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));
> +      if (!operand0_info)
> +	return false;
> +
> +      /* If we're in a pattern get the type of the original statement.  */
> +      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
> +	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
> +      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
> +    }
> +
> +  tree truth_type = truth_type_for (vectype_op);
> +  machine_mode mode = TYPE_MODE (truth_type);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector "
> +			       "comparisons for type %T.\n", truth_type);
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", truth_type);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> +
> +  vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts = SLP_TREE_VEC_DEFS (slp_node);
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.create (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +      workset.splice (stmts);
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  if (slp_node)
> +	    slp_node->push_vec_def (new_stmt);
> +	  else
> +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> +				      truth_type, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  if (is_pattern_stmt_p (stmt_info))
> +    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));
> +
> +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> +			build_zero_cst (truth_type));
> +  t = canonicalize_cond_expr_cond (t);
> +  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    slp_node->push_vec_def (stmt);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
> 
>    if (node)
> @@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
> 
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> -       case vect_early_exit_def:
> +	case vect_early_exit_def:
>  	  dump_printf (MSG_NOTE, "early exit\n");
>  	  break;
>  	case vect_unknown_def_type:
> 
> 
> 
> 
> --


* Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
  2023-11-27 18:30           ` Richard Sandiford
@ 2023-11-28  8:11             ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-11-28  8:11 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tamar Christina, gcc-patches, nd

On Mon, 27 Nov 2023, Richard Sandiford wrote:

> Catching up on backlog, so this might already be resolved, but:
> 
> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 7 Nov 2023, Tamar Christina wrote:
> >
> >> > -----Original Message-----
> >> > From: Richard Biener <rguenther@suse.de>
> >> > Sent: Tuesday, November 7, 2023 9:43 AM
> >> > To: Tamar Christina <Tamar.Christina@arm.com>
> >> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> >> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> >> > vectorization
> >> > 
> >> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> > 
> >> > > > -----Original Message-----
> >> > > > From: Richard Biener <rguenther@suse.de>
> >> > > > Sent: Monday, November 6, 2023 2:25 PM
> >> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> >> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> >> > > > auto- vectorization
> >> > > >
> >> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> > > >
> >> > > > > Hi All,
> >> > > > >
> >> > > > > This patch adds initial support for early break vectorization in GCC.
> >> > > > > The support is added for any target that implements a vector
> >> > > > > cbranch optab, this includes both fully masked and non-masked targets.
> >> > > > >
> >> > > > > Depending on the operation, the vectorizer may also require
> >> > > > > support for boolean mask reductions using Inclusive OR.  This is
> >> > > > > however only checked when the comparison would produce multiple
> >> > statements.
> >> > > > >
> >> > > > > Note: I am currently struggling to get patch 7 correct in all
> >> > > > > cases and could
> >> > > > use
> >> > > > >       some feedback there.
> >> > > > >
> >> > > > > Concretely the kind of loops supported are of the forms:
> >> > > > >
> >> > > > >  for (int i = 0; i < N; i++)
> >> > > > >  {
> >> > > > >    <statements1>
> >> > > > >    if (<condition>)
> >> > > > >      {
> >> > > > >        ...
> >> > > > >        <action>;
> >> > > > >      }
> >> > > > >    <statements2>
> >> > > > >  }
> >> > > > >
> >> > > > > where <action> can be:
> >> > > > >  - break
> >> > > > >  - return
> >> > > > >  - goto
> >> > > > >
> >> > > > > Any number of statements can be used before the <action> occurs.
> >> > > > >
> >> > > > > Since this is an initial version for GCC 14 it has the following
> >> > > > > limitations and
> >> > > > > features:
> >> > > > >
> >> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> >> > > > >   vectors loaded or stored must be to statically allocated arrays with
> >> > known
> >> > > > >   sizes. N must also be known.  This limitation is because our primary
> >> > target
> >> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> >> > > > >   iteration checks. The result is likely to also not be beneficial. For that
> >> > > > >   reason we punt support for variable buffers till we have First-Faulting
> >> > > > >   support in GCC.
> >> > 
> >> > Btw, for this I wonder if you thought about marking memory accesses required
> >> > for the early break condition as required to be vector-size aligned, thus peeling
> >> > or versioning them for alignment?  That should ensure they do not fault.
> >> > 
> >> > OTOH I somehow remember prologue peeling isn't supported for early break
> >> > vectorization?  ..
> >> > 
> >> > > > > - any stores in <statements1> should not be to the same objects as in
> >> > > > >   <condition>.  Loads are fine as long as they don't have the possibility to
> >> > > > >   alias.  More concretely, we block RAW dependencies when the
> >> > > > > intermediate
> >> > > > value
> >> > > > >   can't be separated from the store, or the store itself can't be moved.
> >> > > > > - Prologue peeling, alignment peelinig and loop versioning are supported.
> >> > 
> >> > .. but here you say it is.  Not sure if peeling for alignment works for VLA vectors
> >> > though.  Just to say x86 doesn't support first-faulting loads.
> >> 
> >> For VLA we support it through masking.  i.e. if you need to peel N iterations, we
> >> generate a masked copy of the loop vectorized which masks off the first N bits.
> >> 
> >> This is not typically needed, but we do support it.  But the problem with this
> >> scheme and early break is obviously that the peeled loop needs to be vectorized
> >> so you kinda end up with the same issue again.  So Atm it rejects it for VLA.
> >
> > Hmm, I see.  I thought peeling by masking is an optimization.
> 
> Yeah, it's an opt-in optimisation.  No current Arm cores opt in though.
> 
> > Anyhow, I think it should still work here - since all accesses are aligned
> > and we know that there's at least one original scalar iteration in the
> > first masked and the following "unmasked" vector iterations there
> > should never be faults for any of the aligned accesses.
> 
> Peeling via masking works by using the main loop for the "peeled"
> iteration (so it's a bit of a misnomer).  The vector pointers start
> out lower than the original scalar pointers, with some leading
> inactive elements.
> 
> The awkwardness would be in skipping those leading inactive elements
> in the epilogue, if an early break occurs in the first vector iteration.
> Definitely doable, but I imagine not trivial.
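The "leading inactive elements" mechanism described above can be modelled in a few lines (a scalar sketch under assumed names, not SVE or GCC code; `first_iteration_mask` is invented for illustration): the vector pointer starts at the aligned address at or below the scalar start, and the first iteration's mask disables the lanes that precede the true start.

```cpp
#include <cstddef>
#include <vector>

// Model of peeling via masking: the first vector iteration begins at
// the aligned base below `start`, with the leading lanes masked off.
// Subsequent iterations run with an all-active mask.
std::vector<bool> first_iteration_mask (std::size_t start, std::size_t vf)
{
  std::size_t aligned = start - (start % vf);	// vector pointer starts lower
  std::vector<bool> mask (vf, true);
  for (std::size_t lane = 0; lane < start - aligned; ++lane)
    mask[lane] = false;				// leading inactive elements
  return mask;
}
```

The awkward case Richard mentions is then visible directly: if an early break fires in this first iteration, the epilogue must know to skip the `false` lanes when locating the breaking element.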
> 
> > I think going via alignment is a way easier method to guarantee this
> > than handwaving about "declared" arrays and niter.  One can try that
> > in addition of course - it's not always possible to align all
> > vector loads we are going to speculate (for VLA one could also
> > find common runtime (mis-)alignment and restrict the vector length based
> > on that, for RISC-V it seems to be efficient, not sure whether altering
> > that for SVE is though).
> 
> I think both techniques (alignment and reasoning about accessibility)
> are useful.  And they each help with different cases.  Like you say,
> if there are two vector loads that need to be aligned, we'd need to
> version for alignment on fixed-length architectures, with a scalar
> fallback when the alignment requirement isn't met.  In contrast,
> static reasoning about accessibility allows the vector loop to be
> used for all relative misalignments.
> 
> So I think the aim should be to support both techniques.  But IMO it's
> reasonable to start with either one.  It sounds from Tamar's results
> like starting with static reasoning does fire quite often, and it
> should have less runtime overhead than the alignment approach.

Fair enough, we need to fix the correctness issues then though
(as said, correctness is way easier to assert for alignment).

> Plus, when the loop operates on chars, it's hard to predict whether
> peeling for alignment pays for itself, or whether the scalar prologue
> will end up handling the majority of cases.  If we have the option
> of not peeling for alignment, then it's probably worth taking it
> for chars.

That's true.

> Capping the VL at runtime is possible on SVE.  It's on the backlog
> for handling runtime aliases, where we can vectorise with a lower VF
> rather than falling back to scalar code.  But first-faulting loads
> are likely to be better than halving or quartering the VL at runtime,
> so I don't think capping the VL would be the right SVE technique for
> early exits.

For targets with no first-faulting loads, alignment is then the only
additional possibility.  I can look at this for next stage1.

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
@ 2023-11-28 16:37   ` Richard Sandiford
  2023-11-28 17:55     ` Richard Sandiford
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-11-28 16:37 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> This adds an implementation for conditional branch optab for AArch64.
>
> For e.g.
>
> void f1 ()
> {
>   for (int i = 0; i < N; i++)
>     {
>       b[i] += a[i];
>       if (a[i] > 0)
> 	break;
>     }
> }
>
> For 128-bit vectors we generate:
>
>         cmgt    v1.4s, v1.4s, #0
>         umaxp   v1.4s, v1.4s, v1.4s
>         fmov    x3, d1
>         cbnz    x3, .L8
>
> and for 64-bit vectors we can omit the compression:
>
>         cmgt    v1.2s, v1.2s, #0
>         fmov    x2, d1
>         cbz     x2, .L13
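For reference, the effect of the sequence above can be sanity-checked with a scalar model (illustrative only, not part of the patch; it assumes a V4SI input and little-endian lane layout):

```c
#include <stdint.h>

/* Scalar model of the Advanced SIMD early-break test: CMGT sets each
   lane to all-ones where the condition holds, UMAXP (with both source
   operands equal) pairwise-maxes the lanes so that the low 64 bits are
   nonzero iff any lane was set, FMOV moves those 64 bits to a general
   register, and CBNZ branches on them.  */
static int
any_lane_gt_zero (const int32_t v[4])
{
  uint32_t mask[4], pair[4];
  for (int i = 0; i < 4; i++)			/* cmgt v.4s, v.4s, #0 */
    mask[i] = v[i] > 0 ? 0xffffffffu : 0u;
  pair[0] = mask[0] > mask[1] ? mask[0] : mask[1];	/* umaxp */
  pair[1] = mask[2] > mask[3] ? mask[2] : mask[3];
  pair[2] = pair[0];
  pair[3] = pair[1];
  uint64_t low64 = (uint64_t) pair[0]			/* fmov x, d */
		   | ((uint64_t) pair[1] << 32);
  return low64 != 0;					/* cbnz */
}
```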
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>    DONE;
>  })
>  
> +;; Patterns comparing two vectors and conditionally jump
> +
> +(define_expand "cbranch<mode>4"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operator 0 "aarch64_equality_operator"
> +            [(match_operand:VDQ_I 1 "register_operand")
> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> +          (label_ref (match_operand 3 ""))
> +          (pc)))]
> +  "TARGET_SIMD"
> +{
> +  auto code = GET_CODE (operands[0]);
> +  rtx tmp = operands[1];
> +
> +  /* If comparing against a non-zero vector we have to do a comparison first
> +     so we can have a != 0 comparison with the result.  */
> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
> +					operands[2]));
> +
> +  /* For 64-bit vectors we need no reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> +    {
> +      /* Always reduce using a V4SI.  */
> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> +      rtx res = gen_reg_rtx (V4SImode);
> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> +    }
> +
> +  rtx val = gen_reg_rtx (DImode);
> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> +
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> +  DONE;

Are you sure this is correct for the operands[2] != const0_rtx case?
It looks like it uses the same comparison code for the vector comparison
and the scalar comparison.

E.g. if the pattern is passed a comparison:

  (eq (reg:V2SI x) (reg:V2SI y))

it looks like we'd generate a CMEQ for the x and y, then branch
when the DImode bitcast of the CMEQ result equals zero.  This means
that we branch when no elements of x and y are equal, rather than
when all elements of x and y are equal.

E.g. for:

   { 1, 2 } == { 1, 2 }

CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
and the branch won't be taken.

ISTM it would be easier for the operands[2] != const0_rtx case to use
EOR instead of a comparison.  That gives a zero result if the input
vectors are equal and a nonzero result if the input vectors are
different.  We can then branch on the result using CODE and const0_rtx.

(Hope I've got that right.)

Maybe that also removes the need for patch 18.
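The XOR reasoning can be checked with a small scalar model (illustrative only; the V2SI lane width and count are assumptions, not taken from the patch):

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the suggested EOR scheme for a V2SI equality
   cbranch: XOR the two vectors lane-wise, bitcast the result to a
   64-bit scalar, and branch on that scalar with the original EQ/NE
   code against zero.  The XOR result is all-zero iff every lane of
   the inputs is equal.  */
static int
cbranch_eq_taken (const int32_t x[2], const int32_t y[2])
{
  int32_t eor[2] = { x[0] ^ y[0], x[1] ^ y[1] };  /* EOR */
  uint64_t scalar;
  memcpy (&scalar, eor, sizeof scalar);		  /* DImode bitcast */
  return scalar == 0;		/* EQ taken iff all lanes are equal */
}
```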

Thanks,
Richard

> +})
> +
>  ;; Patterns comparing two vectors to produce a mask.
>  
>  (define_expand "vec_cmp<mode><mode>"
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**	...
> +**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-11-28 16:37   ` Richard Sandiford
@ 2023-11-28 17:55     ` Richard Sandiford
  2023-12-06 16:25       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-11-28 17:55 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Richard Sandiford <richard.sandiford@arm.com> writes:
> Tamar Christina <tamar.christina@arm.com> writes:
>> Hi All,
>>
>> This adds an implementation for conditional branch optab for AArch64.
>>
>> For e.g.
>>
>> void f1 ()
>> {
>>   for (int i = 0; i < N; i++)
>>     {
>>       b[i] += a[i];
>>       if (a[i] > 0)
>> 	break;
>>     }
>> }
>>
>> For 128-bit vectors we generate:
>>
>>         cmgt    v1.4s, v1.4s, #0
>>         umaxp   v1.4s, v1.4s, v1.4s
>>         fmov    x3, d1
>>         cbnz    x3, .L8
>>
>> and for 64-bit vectors we can omit the compression:
>>
>>         cmgt    v1.2s, v1.2s, #0
>>         fmov    x2, d1
>>         cbz     x2, .L13
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>>
>> --- inline copy of patch -- 
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>>    DONE;
>>  })
>>  
>> +;; Patterns comparing two vectors and conditionally jump
>> +
>> +(define_expand "cbranch<mode>4"
>> +  [(set (pc)
>> +        (if_then_else
>> +          (match_operator 0 "aarch64_equality_operator"
>> +            [(match_operand:VDQ_I 1 "register_operand")
>> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
>> +          (label_ref (match_operand 3 ""))
>> +          (pc)))]
>> +  "TARGET_SIMD"
>> +{
>> +  auto code = GET_CODE (operands[0]);
>> +  rtx tmp = operands[1];
>> +
>> +  /* If comparing against a non-zero vector we have to do a comparison first
>> +     so we can have a != 0 comparison with the result.  */
>> +  if (operands[2] != CONST0_RTX (<MODE>mode))
>> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
>> +					operands[2]));
>> +
>> +  /* For 64-bit vectors we need no reductions.  */
>> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
>> +    {
>> +      /* Always reduce using a V4SI.  */
>> +      rtx reduc = gen_lowpart (V4SImode, tmp);
>> +      rtx res = gen_reg_rtx (V4SImode);
>> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
>> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
>> +    }
>> +
>> +  rtx val = gen_reg_rtx (DImode);
>> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
>> +
>> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>> +  DONE;
>
> Are you sure this is correct for the operands[2] != const0_rtx case?
> It looks like it uses the same comparison code for the vector comparison
> and the scalar comparison.
>
> E.g. if the pattern is passed a comparison:
>
>   (eq (reg:V2SI x) (reg:V2SI y))
>
> it looks like we'd generate a CMEQ for the x and y, then branch
> when the DImode bitcast of the CMEQ result equals zero.  This means
> that we branch when no elements of x and y are equal, rather than
> when all elements of x and y are equal.
>
> E.g. for:
>
>    { 1, 2 } == { 1, 2 }
>
> CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
> and the branch won't be taken.
>
> ISTM it would be easier for the operands[2] != const0_rtx case to use
> EOR instead of a comparison.  That gives a zero result if the input
> vectors are equal and a nonzero result if the input vectors are
> different.  We can then branch on the result using CODE and const0_rtx.
>
> (Hope I've got that right.)
>
> Maybe that also removes the need for patch 18.

Sorry, I forgot to say: we can't use operands[1] as a temporary,
since it's only an input to the pattern.  The EOR destination would
need to be a fresh register.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-27 22:47                 ` Tamar Christina
@ 2023-11-29 13:28                   ` Richard Biener
  2023-11-29 21:22                     ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-29 13:28 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 27 Nov 2023, Tamar Christina wrote:

>  >
> > > This is a respun patch with a fix for VLA.
> > >
> > > This adds support to vectorizable_live_reduction to handle multiple
> > > exits by doing a search for which exit the live value should be materialized in.
> > >
> > > Additionally which value in the index we're after depends on whether
> > > the exit it's materialized in is an early exit or whether the loop's
> > > main exit is different from the loop's natural one (i.e. the one with
> > > the same src block as the latch).
> > >
> > > In those two cases we want the first rather than the last value as
> > > we're going to restart the iteration in the scalar loop.  For VLA this
> > > means we need to reverse both the mask and vector since there's only a
> > > way to get the last active element and not the first.
> > >
> > > For inductions and multiple exits:
> > >   - we test if the target will support vectorizing the induction
> > >   - mark all inductions in the loop as relevant
> > >   - for codegen of non-live inductions during codegen
> > >   - induction during an early exit gets the first element rather than last.
> > >
> > > For reductions and multiple exits:
> > >   - Reductions for early exits reduces the reduction definition statement
> > >     rather than the reduction step.  This allows us to get the value at the
> > >     start of the iteration.
> > >   - The peeling layout means that we just have to update one block, the
> > merge
> > >     block.  We expect all the reductions to be the same but we leave it up to
> > >     the value numbering to clean up any duplicate code as we iterate over all
> > >     edges.
> > >
> > > These two changes fix the reduction codegen given before which has
> > > been added to the testsuite for early vect.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > > 	(vect_analyze_loop_operations): Check if target supports vectorizing
> > IV.
> > > 	(vect_transform_loop): Call vectorizable_live_operation for non-live
> > > 	inductions or reductions.
> > > 	(find_connected_edge, vectorizable_live_operation_1): New.
> > > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > > 	relevant.
> > > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > > 	(vect_iv_increment_position): New.
> > > 	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401
> > aecd4e7
> > > eaaaa9e2b8a0 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type,
> > gimple_seq *seq,
> > >     INSERT_AFTER is set to true if the increment should be inserted after
> > >     *BSI.  */
> > >
> > > -static void
> > > +void
> > >  vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> > >  			    bool *insert_after)
> > >  {
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e698
> > 33821907
> > > 92602cb25af1 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -2163,6 +2163,15 @@ vect_analyze_loop_operations (loop_vec_info
> > loop_vinfo)
> > >  	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
> > NULL,
> > >  					      -1, false, &cost_vec);
> > >
> > > +	  /* Check if we can perform the operation for early break if we force
> > > +	     the live operation.  */
> > > +	  if (ok
> > > +	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +	      && !STMT_VINFO_LIVE_P (stmt_info)
> > > +	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > > +	    ok = vectorizable_live_operation (loop_vinfo, stmt_info, NULL,
> > NULL,
> > > +					      -1, false, &cost_vec);
> > 
> > can you add && !PURE_SLP_STMT?
> > 
> 
> I've cleaned up the patch a bit more, so these hunks are now all gone.
> 
> > > @@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction
> > (loop_vec_info loop_vinfo,
> > >           Store them in NEW_PHIS.  */
> > >    if (double_reduc)
> > >      loop = outer_loop;
> > > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +  /* We need to reduce values in all exits.  */  exit_bb =
> > > + loop_exit->dest;
> > >    exit_gsi = gsi_after_labels (exit_bb);
> > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > > +  vec <gimple *> vec_stmts;
> > > +  if (main_exit_p)
> > > +    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);  else
> > > +    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF
> > > + (rdef_info));
> > 
> > both would be wrong for SLP, also I think you need to look at
> > STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))?  For SLP the PHI SLP
> > node is reached via slp_node_instance->reduc_phis.
> > 
> > I think an overall better structure would be to add a
> > 
> > vect_get_vect_def (stmt_vec_info, slp_tree, unsigned);
> > 
> > abstracting SLP and non-SLP and doing
> > 
> >   for (unsigned i = 0; i < vec_num * ncopies; ++i)
> >     {
> >       def = vect_get_vect_def (stmt_info, slp_node, i); ...
> >     }
> > 
> > and then adjusting stmt_info/slp_node according to main_exit_p?
> 
> Done.
> 
> > (would be nice to transition stmt_info->vec_stmts to stmt_info->vec_defs)
> 
> True. I guess since the plan is to remove non-SLP next year this'll just go away anyway.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vectorizable_live_operation,
> 	vectorizable_live_operation_1): Support early exits.
> 	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
> 	inductions or reductions.
> 	(find_connected_edge, vect_get_vect_def): New.
> 	(vect_create_epilog_for_reduction): Support reductions in early break.
> 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> 	live.
> 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
>  	  bb_before_epilog = loop_preheader_edge (epilog)->src;
>  	}
> +
>        /* If loop is peeled for non-zero constant times, now niters refers to
>  	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
>  	 overflows.  */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..90041d1e138afb08c0116f48f517fe0fcc615557 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>    return new_temp;
>  }
>  
> +/* Retrieves the defining statement to be used for a reduction.
> +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> +   the reduction definitions.  */
> +
> +tree
> +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> +		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> +		   vec <gimple *> &vec_stmts)
> +{
> +  tree def;
> +
> +  if (slp_node)
> +    {
> +      if (!main_exit_p)
> +        slp_node = slp_node_instance->reduc_phis;
> +      def = vect_get_slp_vect_def (slp_node, i);
> +    }
> +  else
> +    {
> +      if (!main_exit_p)
> +	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> +      def = gimple_get_lhs (vec_stmts[0]);
> +    }
> +
> +  return def;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>  
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
>     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
>       (counting from 0)
> +   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
> +     exit this edge is always the main loop exit.
>  
>     This function:
>     1. Completes the reduction def-use cycles.
> @@ -5882,7 +5912,8 @@ static void
>  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  				  stmt_vec_info stmt_info,
>  				  slp_tree slp_node,
> -				  slp_instance slp_node_instance)
> +				  slp_instance slp_node_instance,
> +				  edge loop_exit)
>  {
>    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>    gcc_assert (reduc_info->is_reduc_info);
> @@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>       loop-closed PHI of the inner loop which we remember as
>       def for the reduction PHI generation.  */
>    bool double_reduc = false;
> +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
>    stmt_vec_info rdef_info = stmt_info;
>    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>      {
> @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        /* Create an induction variable.  */
>        gimple_stmt_iterator incr_gsi;
>        bool insert_after;
> -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
>        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
>  		 insert_after, &indx_before_incr, &indx_after_incr);
>  
> @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* We need to reduce values in all exits.  */
> +  exit_bb = loop_exit->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
> +  vec <gimple *> vec_stmts;
>    for (unsigned i = 0; i < vec_num; i++)
>      {
>        gimple_seq stmts = NULL;
> -      if (slp_node)
> -	def = vect_get_slp_vect_def (slp_node, i);
> -      else
> -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> +			       main_exit_p, i, vec_stmts);
>        for (j = 0; j < ncopies; j++)
>  	{
>  	  tree new_def = copy_ssa_name (def);
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
> -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> +	    def = gimple_get_lhs (vec_stmts[j]);
> +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -6882,10 +6914,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  	    }
>  
>            scalar_result = scalar_results[k];
> +	  edge merge_e = loop_exit;
> +	  if (!main_exit_p)
> +	    merge_e = single_succ_edge (loop_exit->dest);
>            FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
>  	    {
>  	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> -		SET_USE (use_p, scalar_result);
> +		{
> +		  if (main_exit_p)
> +		    SET_USE (use_p, scalar_result);
> +		  else
> +		    {
> +		      /* When multiple exits the same SSA name can appear in
> +			 both the main and the early exits.  The meaning of the
> +			 reduction however is not the same.  In the main exit
> +			 case the meaning is "get the last value" and in the
> +			 early exit case it means "get the first value".  As
> +			 such we should only update the value for the exit
> +			 attached to loop_exit.  To make this easier we always
> +			 call vect_create_epilog_for_reduction on the early
> +			 exit main block first.  As such for the main exit we
> +			 no longer have to perform the BB check.  */
> +		      gphi *stmt = as_a <gphi *> (USE_STMT (use_p));
> +		      int idx = phi_arg_index_from_use (use_p);
> +		      if (gimple_phi_arg_edge (stmt, idx) == merge_e)
> +			SET_USE (use_p, scalar_result);

Hmm, I guess I still don't understand.  This code tries, in the
reduction epilog

  # scalar_result_1 = PHI <..>
  # vector_result_2 = PHI <..>
  _3 = ... reduce vector_result_2 ...;

replace uses of scalar_result_1 with _3 of which there could be many,
including in debug stmts (there doesn't have to be an epilog loop
after all).

Now, for an early exit we know there _is_ an epilog loop and we
have a merge block merging early exits before merging with the
main exit.  We have forced(?) PHI nodes to merge the individual
early exit reduction results?

Either I can't see how we can end up with multiple uses or I can't
see how the main_exit_p case cannot also stomp on those?

Maybe it's related to the other question whether we are emitting
a reduction epilogue for each of the early exits or just once.

> +		    }
> +		}
>  	      update_stmt (use_stmt);
>  	    }
>          }
> @@ -10481,15 +10536,17 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    return true;
>  }
>  
> -
>  /* Function vectorizable_live_operation_1.
> +
>     helper function for vectorizable_live_operation.  */
> +
>  tree
>  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>  			       stmt_vec_info stmt_info, edge exit_e,
>  			       tree vectype, int ncopies, slp_tree slp_node,
>  			       tree bitsize, tree bitstart, tree vec_lhs,
> -			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
> +			       tree lhs_type, bool restart_loop,
> +			       gimple_stmt_iterator *exit_gsi)
>  {
>    basic_block exit_bb = exit_e->dest;
>    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> @@ -10504,7 +10561,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (ncopies == 1 && !slp_node);
> @@ -10513,15 +10572,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree len = vect_get_loop_len (loop_vinfo, &gsi,
>  				    &LOOP_VINFO_LENS (loop_vinfo),
>  				    1, vectype, 0, 0);
> +
>        /* BIAS - 1.  */
>        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>        tree bias_minus_one
>  	= int_const_binop (MINUS_EXPR,
>  			   build_int_cst (TREE_TYPE (len), biasval),
>  			   build_one_cst (TREE_TYPE (len)));
> +
>        /* LAST_INDEX = LEN + (BIAS - 1).  */
>        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
>  				     len, bias_minus_one);
> +
>        /* This needs to implement extraction of the first index, but not sure
>  	 how the LEN stuff works.  At the moment we shouldn't get here since
>  	 there's no LEN support for early breaks.  But guard this so there's
> @@ -10532,13 +10594,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree scalar_res
>  	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>  			vec_lhs_phi, last_index);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
>    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (!slp_node);
> @@ -10548,10 +10613,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
>  				      &LOOP_VINFO_MASKS (loop_vinfo),
>  				      1, vectype, 0);
> +      tree scalar_res;
> +
> +      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  /* First create the permuted mask.  */
> +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +	  tree perm_dest = copy_ssa_name (mask);
> +	  gimple *perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +				       mask, perm_mask);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  mask = perm_dest;
> +
> +	  /* Then permute the vector contents.  */
> +	  tree perm_elem = perm_mask_for_reverse (vectype);
> +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> +	  perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +				       vec_lhs_phi, perm_elem);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  vec_lhs_phi = perm_dest;
> +	}
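As a side note, the reverse-then-EXTRACT_LAST trick in this hunk can be modelled in scalar C (illustrative only, not using the GCC internals; the fixed length of 4 is an assumption):

```c
#include <stdint.h>

/* Scalar model of emulating EXTRACT_FIRST via EXTRACT_LAST on the
   reversed vector and reversed mask, as the hunk above does with two
   VEC_PERM_EXPRs.  */
static int32_t
extract_last (const int32_t vec[4], const int mask[4])
{
  int32_t res = 0;
  for (int i = 0; i < 4; i++)
    if (mask[i])
      res = vec[i];		/* value of the last active lane */
  return res;
}

static int32_t
extract_first (const int32_t vec[4], const int mask[4])
{
  int32_t rvec[4];
  int rmask[4];
  for (int i = 0; i < 4; i++)	/* VEC_PERM_EXPR with a reversing mask */
    {
      rvec[i] = vec[3 - i];
      rmask[i] = mask[3 - i];
    }
  /* The last active lane of the reversed inputs is the first active
     lane of the originals.  */
  return extract_last (rvec, rmask);
}
```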
>  
>        gimple_seq_add_seq (&stmts, tem);
> -       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -				       mask, vec_lhs_phi);
> +
> +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +				 mask, vec_lhs_phi);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
> @@ -10564,12 +10657,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
>  				       &stmts, true, NULL_TREE);
>      }
> +
>    *exit_gsi = gsi_after_labels (exit_bb);
>    if (stmts)
>      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
>    return new_tree;
>  }
>  
> +/* Find the edge that's the final one in the path from SRC to DEST and
> +   return it.  There must be at most one forwarder block between them.  */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> +   if (src->dest == dest)
> +     return src;
> +
> +  return find_edge (src->dest, dest);
> +}
> +
>  /* Function vectorizable_live_operation.
>  
>     STMT_INFO computes a value that is used outside the loop.  Check if
> @@ -10594,7 +10701,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    int vec_entry = 0;
>    poly_uint64 vec_index = 0;
>  
> -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    /* If a stmt of a reduction is live, vectorize it via
>       vect_create_epilog_for_reduction.  vectorizable_reduction assessed
> @@ -10619,8 +10727,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
>  	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
>  	return true;
> +
> +      /* For an early break we only have to materialize the reduction on the
> +	 block, but we have to find an alternate exit first.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> +	      {
> +		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> +						  slp_node, slp_node_instance,
> +						  exit);
> +		break;
> +	      }
> +	}
> +
>        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> -					slp_node_instance);
> +					slp_node_instance,
> +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
>        return true;
>      }
>  
> @@ -10772,37 +10897,63 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -      gcc_assert (single_pred_p (exit_bb));
> -
> -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> -      gimple_stmt_iterator exit_gsi;
> -      tree new_tree
> -	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> -					 LOOP_VINFO_IV_EXIT (loop_vinfo),
> -					 vectype, ncopies, slp_node, bitsize,
> -					 bitstart, vec_lhs, lhs_type,
> -					 &exit_gsi);
> -
> -      /* Remove existing phis that copy from lhs and create copies
> -	 from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> -	{
> -	  gimple *phi = gsi_stmt (gsi);
> -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> -	    {
> -	      remove_phi_node (&gsi, false);
> -	      tree lhs_phi = gimple_phi_result (phi);
> -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> -	    }
> -	  else
> -	    gsi_next (&gsi);
> -	}
> +      /* Check if we have a loop where the chosen exit is not the main exit.
> +	 In these cases, for an early break, the scalar code restarts the
> +	 iteration the vector code was executing.  For the live values we
> +	 therefore want the value at the start of that iteration rather than
> +	 at the end.  */
> +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +	if (!is_gimple_debug (use_stmt)
> +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	  {
> +	    basic_block use_bb = gimple_bb (use_stmt);
> +	    if (!is_a <gphi *> (use_stmt))

should always be a PHI

> +	      continue;
> +	    for (auto exit_e : get_loop_exit_edges (loop))
> +	      {
> +		/* See if this exit leads to the value.  */
> +		edge dest_e = find_connected_edge (exit_e, use_bb);

When is this not exit_e->dest == use_bb?

> +		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e) != lhs)
> +		  continue;

I'd change the above to

       FOR_EACH_IMM_USE_FAST (...)

then

   gimple_phi_arg_edge (USE_STMT (use_p), phi_arg_index_from_use (use_p))

is the exit edge you are looking for without iterating over all loop
exits.

> +		gimple *tmp_vec_stmt = vec_stmt;
> +		tree tmp_vec_lhs = vec_lhs;
> +		tree tmp_bitstart = bitstart;
> +		/* For early exit where the exit is not in the BB that leads
> +		   to the latch then we're restarting the iteration in the
> +		   scalar loop.  So get the first live value.  */
> +		restart_loop = restart_loop || exit_e != main_e;
> +		if (restart_loop)
> +		  {
> +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));

Hmm, that gets you the value after the first iteration, not the one
before, which would be the last value of the preceding vector iteration?
(but we don't keep those, we'd need a PHI)

Why do we need (non-induction) live values from the vector loop
in the epilogue loop again?

If we are dealing with an induction (or a reduction, you can check
the def type), there should be an associated PHI node to get that
vector.

That said, are you sure there's testsuite coverage for the induction
case?

> +		  }
> +
> +		gimple_stmt_iterator exit_gsi;
> +		tree new_tree
> +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> +						   exit_e, vectype, ncopies,
> +						   slp_node, bitsize,
> +						   tmp_bitstart, tmp_vec_lhs,
> +						   lhs_type, restart_loop,
> +						   &exit_gsi);
> +
> +		/* Use the empty block on the exit to materialize the new stmts
> +		   so we can update the PHI here.  */
> +		if (gimple_phi_num_args (use_stmt) == 1)
> +		  {
> +		    auto gsi = gsi_for_stmt (use_stmt);
> +		    remove_phi_node (&gsi, false);
> +		    tree lhs_phi = gimple_phi_result (use_stmt);
> +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> +		  }
> +		else
> +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);

if the else case works, why not use it always?


The rest looks OK.

Richard.

> +	      }
> +	  }
>  
>        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
>        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fe38beb4fa1d9f8593445354f56ba52e10a040cd..27221c6e8e86034050b562ee5c15992827a8d2cb 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
>     - it has uses outside the loop.
>     - it has vdefs (it alters memory).
>     - control stmts in the loop (except for the exit condition).
> +   - it is an induction and we have multiple exits.
>  
>     CHECKME: what other side effects would the vectorizer allow?  */
>  
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	}
>      }
>  
> +  /* Check if it's an induction and we have multiple exits.  In this case
> +     there will be a use later on, after peeling, which is needed for the
> +     alternate exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "vec_stmt_relevant_p: induction forced for "
> +			   "early break.\n");
> +      *live_p = true;
> +
> +    }
> +
>    if (*live_p && *relevant == vect_unused_in_scope
>        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
>      {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>  
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
>  			  bool vec_stmt_p,
>  			  stmt_vector_for_cost *cost_vec)
>  {
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>    if (slp_node)
>      {
>        stmt_vec_info slp_stmt_info;
>        unsigned int i;
>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
>  	{
> -	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
> +	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> +	       || (loop_vinfo
> +		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> +			== vect_induction_def))
>  	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
>  					       slp_node_instance, i,
>  					       vec_stmt_p, cost_vec))
>  	    return false;
>  	}
>      }
> -  else if (STMT_VINFO_LIVE_P (stmt_info)
> +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> +	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
>  	   && !vectorizable_live_operation (vinfo, stmt_info,
>  					    slp_node, slp_node_instance, -1,
>  					    vec_stmt_p, cost_vec))
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index de60da31e2a3030a7fbc302d3f676af9683fd019..fd4b0a787e6128b43c5ca2b0612f55845e6b3cef 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>  				enum vect_def_type *,
>  				tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>  					    stmt_vec_info, tree, tree,
>  					    code_helper*, code_helper*,
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-27 22:49   ` Tamar Christina
@ 2023-11-29 13:50     ` Richard Biener
  2023-12-06  4:37       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-29 13:50 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 27 Nov 2023, Tamar Christina wrote:

> Ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Monday, November 6, 2023 7:40 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > codegen of exit code
> > 
> > Hi All,
> > 
> > This implements vectorable_early_exit which is used as the codegen part of
> > vectorizing a gcond.
> > 
> > For the most part it shares the majority of the code with
> > vectorizable_comparison with addition that it needs to be able to reduce
> > multiple resulting statements into a single one for use in the gcond, and also
> > needs to be able to perform masking on the comparisons.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts
> > without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support
> > gcond.
> > 
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > 36aeca60a22cfaea8d3b43348000d75de1d525c7..4809b822632279493a84
> > 3d402a833c9267bb315e 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12475,7 +12475,7 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> > 
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12615,8 +12615,9 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >    /* Transform.  */
> > 
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);

wrecked line-break / white-space

> > 
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12630,7 +12631,10 @@ vectorizable_comparison_1 (vec_info *vinfo,
> > tree vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> > 
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code, @@ -12709,6
> > +12713,196 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> > 
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec) {

{ goes to the next line

> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  gimple_match_op op;
> > +  if (!gimple_extract_op (stmt_info->stmt, &op))
> > +    gcc_unreachable ();
> > +  gcc_assert (op.code.is_tree_code ());  auto code = tree_code
> > + (op.code);

missed line break

> > +
> > +  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);  gcc_assert
> > + (vectype_out);

likewise.

> > +  tree var_op = op.ops[0];
> > +
> > +  /* When vectorizing things like pointer comparisons we will assume that
> > +     the VF of both operands is the same, e.g. a pointer must be compared
> > +     to a pointer.  We'll leave this up to vectorizable_comparison_1 to
> > +     check further.  */
> > +  tree vectype_op = vectype_out;
> > +  if (SSA_VAR_P (var_op))

TREE_CODE (var_op) == SSA_NAME

> > +    {
> > +      stmt_vec_info operand0_info
> > +	= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (var_op));

lookup_def (var_op)

> > +      if (!operand0_info)
> > +	return false;
> > +
> > +      /* If we're in a pattern get the type of the original statement.  */
> > +      if (STMT_VINFO_IN_PATTERN_P (operand0_info))
> > +	operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
> > +      vectype_op = STMT_VINFO_VECTYPE (operand0_info);
> > +    }

I think you want to use vect_is_simple_use on var_op instead, that's
the canonical way for querying operands.

> > +
> > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > + TYPE_MODE (truth_type);  int ncopies;
> > +

more line break issues ... (also below, check yourself)

shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
it looks to be set wrongly (or shouldn't be set at all)

> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +

what about with_len?

> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))

Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
emitting

 mask = op0 CMP op1;
 if (mask != 0)

I think you need to check for CMP, not NE_EXPR.
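
The shape being tested here — an element-wise comparison whose mask feeds a branch on "any lane set" — can be modeled in scalar code as follows (a sketch assuming an EQ comparison; the names are illustrative, not the optab interface):

```cpp
#include <cstddef>
#include <vector>

// Model of "mask = op0 CMP op1; if (mask != 0) goto exit": the target
// needs the element-wise comparison (what expand_vec_cmp_expr_p checks)
// plus a way to branch if any mask lane is set (the cbranch optab).
static bool
any_lane_matches (const std::vector<int> &op0, const std::vector<int> &op1)
{
  std::vector<bool> mask (op0.size ());
  for (std::size_t i = 0; i < op0.size (); i++)
    mask[i] = op0[i] == op1[i];	   // per-lane CMP (here EQ)

  for (bool lane : mask)	   // "mask != 0": is any lane set?
    if (lane)
      return true;
  return false;
}
```
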

> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector "
> > +			       "comparisons for type %T.\n", truth_type);
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", truth_type);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;

I suppose vectorizable_comparison_1 will check this again, so the above
is redundant?

> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> > NULL);

LENs missing (or disabling partial vectors).

> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform
> > + early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);  basic_block cond_bb =
> > + gimple_bb (stmt);  gimple_stmt_iterator  cond_gsi = gsi_last_bb
> > + (cond_bb);
> > +
> > +  vec<tree> stmts;
> > +
> > +  if (slp_node)
> > +    stmts = SLP_TREE_VEC_DEFS (slp_node);
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.create (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > 
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +      workset.splice (stmts);
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  if (slp_node)
> > +	    slp_node->push_vec_def (new_stmt);
> > +	  else
> > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > +	  workset.quick_insert (0, new_temp);

Reduction epilogue handling has similar code to reduce a set of vectors
to a single one with an operation.  I think we want to share that code.
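
For reference, the workset loop above builds a pairwise tree of BIT_IOR_EXPRs instead of a linear chain, so ORs of independent pairs stay independent. A scalar sketch of the same scheme (hypothetical names; masks modeled as integers):

```cpp
#include <cstdint>
#include <deque>

// Reduce a workset of masks with pairwise IOR: pop the last two
// entries, OR them, and push the result back at the front, mirroring
// workset.pop () / workset.quick_insert (0, new_temp) above.
static uint64_t
reduce_masks (std::deque<uint64_t> workset)
{
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```
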

> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  if (masked_loop_p)
> > +    {
> > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > truth_type, 0);
> > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			       &cond_gsi);

I don't think this is correct when 'stmts' had more than one vector?

> > +    }
> > +
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped
> > during
> > +     codegen so we must replace the original insn.  */  if
> > + (is_pattern_stmt_p (stmt_info))
> > +    stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info));

vect_original_stmt?

> > +
> > +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> > +			build_zero_cst (truth_type));
> > +  t = canonicalize_cond_expr_cond (t);
> > +  gimple_cond_set_condition_from_tree ((gcond*)stmt, t);

Please use gimple_cond_set_{lhs,rhs,code} instead of going through
GENERIC.

> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    slp_node->push_vec_def (stmt);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt);

I don't think we need those, in fact we still have the original defs
from vectorizable_comparison_1 here?  I'd just truncate both vectors.

> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;

I think you leak 'stmts' for !slp

Otherwise looks good.

Richard.


> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12928,7 +13122,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12951,7 +13147,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> > 
> >    if (node)
> > @@ -13110,6 +13309,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> > 
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -13511,7 +13716,7 @@ vect_is_simple_use (tree operand, vec_info
> > *vinfo, enum vect_def_type *dt,
> >  	case vect_first_order_recurrence:
> >  	  dump_printf (MSG_NOTE, "first order recurrence\n");
> >  	  break;
> > -       case vect_early_exit_def:
> > +	case vect_early_exit_def:
> >  	  dump_printf (MSG_NOTE, "early exit\n");
> >  	  break;
> >  	case vect_unknown_def_type:
> > 
> > 
> > 
> > 
> > --
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow
  2023-11-27 22:49   ` Tamar Christina
@ 2023-11-29 14:47     ` Richard Biener
  2023-12-06  4:10       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-29 14:47 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 27 Nov 2023, Tamar Christina wrote:

> Ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Monday, November 6, 2023 7:40 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: [PATCH 10/21]middle-end: implement relevancy analysis support for
> > control flow
> > 
> > Hi All,
> > 
> > This updates relevancy analysis to support marking gcond's belonging to early
> > breaks as relevant for vectorization.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> > 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > 	* tree-vect-stmts.cc (vect_stmt_relevant_p,
> > 	vect_mark_stmts_to_be_vectorized, vect_analyze_stmt,
> > vect_is_simple_use,
> > 	vect_get_vector_types_for_stmt): Support early breaks.
> > 
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> > 4809b822632279493a843d402a833c9267bb315e..31474e923cc3feb2604
> > ca2882ecfb300cd211679 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -359,9 +359,14 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > loop_vec_info loop_vinfo,
> >    *live_p = false;
> > 
> >    /* cond stmt other than loop exit cond.  */
> > -  if (is_ctrl_stmt (stmt_info->stmt)
> > -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> > -    *relevant = vect_used_in_scope;
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  if (is_ctrl_stmt (stmt) && is_a <gcond *> (stmt))

is_ctrl_stmt (stmt) is redundant

> > +    {
> > +      gcond *cond = as_a <gcond *> (stmt);

in total better written as

       if (gcond *cond = dyn_cast <gcond *> (stmt))
         {

> > +      if (LOOP_VINFO_LOOP_CONDS (loop_vinfo).contains (cond)

linear search ...

> > +	  && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> > +	*relevant = vect_used_in_scope;

but why not simply mark all gconds as vect_used_in_scope?

> > +    }
> > 
> >    /* changing memory.  */
> >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -374,6 +379,11 @@
> > vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> >  	*relevant = vect_used_in_scope;
> >        }
> > 
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);  auto_bitmap
> > + exit_bbs;  for (edge exit : exits)

is it your mail client messing patches up?  missing line-break
again.

> > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > +

you don't seem to use the bitmap?

> >    /* uses outside the loop.  */
> >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > SSA_OP_DEF)
> >      {
> > @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > loop_vec_info loop_vinfo,
> >  	      /* We expect all such uses to be in the loop exit phis
> >  		 (because of loop closed form)   */
> >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > -	      gcc_assert (bb == single_exit (loop)->dest);
> > 
> >                *live_p = true;
> >  	    }
> > @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
> > loop_vinfo, bool *fatal)
> >  			return res;
> >  		    }
> >                   }
> > +	    }
> > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > +	    {
> > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > +	      opt_result res
> > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > +			       loop_vinfo, relevant, &worklist, false);
> > +	      if (!res)
> > +		return res;
> > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > +				loop_vinfo, relevant, &worklist, false);
> > +	      if (!res)
> > +		return res;
> >              }

I guess we're missing an

  else
    gcc_unreachable ();

to catch not handled stmt kinds (do we have gcond patterns yet?)

> >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> >  	    {
> > @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
> >  			     node_instance, cost_vec);
> >        if (!res)
> >  	return res;
> > -   }
> > +    }
> > +
> > +  if (is_ctrl_stmt (stmt_info->stmt))
> > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;

I think it should rather be vect_condition_def.  It's also not
this functions business to set STMT_VINFO_DEF_TYPE.  If we ever
get to handle not if-converted code (or BB vectorization of that)
then a gcond would define the mask stmts are under.

> >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> >      {
> >        case vect_internal_def:
> > +      case vect_early_exit_def:
> >          break;
> > 
> >        case vect_reduction_def:
> > @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
> >      {
> >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> >        *need_to_vectorize = true;
> >      }
> > @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo,
> > stmt_vec_info stmt, slp_tree slp_node,
> >  	  else
> >  	    *op = gimple_op (ass, operand + 1);
> >  	}
> > +      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> > +	{
> > +	  gimple_match_op m_op;
> > +	  if (!gimple_extract_op (cond, &m_op))
> > +	    return false;
> > +	  gcc_assert (m_op.code.is_tree_code ());
> > +	  *op = m_op.ops[operand];
> > +	}

Please do not use gimple_extract_op, use

  *op = gimple_op (cond, operand);

> >        else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
> >  	*op = gimple_call_arg (call, operand);
> >        else
> > @@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> >    *nunits_vectype_out = NULL_TREE;
> > 
> >    if (gimple_get_lhs (stmt) == NULL_TREE
> > +      /* Allow vector conditionals through here.  */
> > +      && !is_ctrl_stmt (stmt)

!is_a <gcond *> (stmt)

> >        /* MASK_STORE has no lhs, but is ok.  */
> >        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >      {
> > @@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> >  	}
> > 
> >        return opt_result::failure_at (stmt,
> > -				     "not vectorized: irregular stmt.%G", stmt);
> > +				     "not vectorized: irregular stmt: %G", stmt);
> >      }
> > 
> >    tree vectype;
> > @@ -14490,6 +14528,14 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> >  	scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if (is_ctrl_stmt (stmt))

else if (gcond *cond = dyn_cast <...>)

> > +	{
> > +	  gcond *cond = dyn_cast <gcond *> (stmt);
> > +	  if (!cond)
> > +	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > +					   " control flow statement.\n");
> > +	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));

As said in the other patch STMT_VINFO_VECTYPE of the gcond should
be the _mask_ type the compare produces, not the vector type of
the inputs (the nunits_vectype might be that one though).
You possibly need to adjust vect_get_smallest_scalar_type for this.

Richard.

> > +	}
> >        else
> >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > 
> > 
> > 
> > 
> > 
> > --
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-29 13:28                   ` Richard Biener
@ 2023-11-29 21:22                     ` Tamar Christina
  2023-11-30 13:23                       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-11-29 21:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, November 29, 2023 2:29 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> with support for multiple exits and different exits
> 
> On Mon, 27 Nov 2023, Tamar Christina wrote:
> 
> >  >
> > > > This is a respun patch with a fix for VLA.
> > > >
> > > > This adds support to vectorizable_live_reduction to handle
> > > > multiple exits by doing a search for which exit the live value should be
> materialized in.
> > > >
> > > > Additionally which value in the index we're after depends on
> > > > whether the exit it's materialized in is an early exit or whether
> > > > the loop's main exit is different from the loop's natural one
> > > > (i.e. the one with the same src block as the latch).
> > > >
> > > > In those two cases we want the first rather than the last value as
> > > > we're going to restart the iteration in the scalar loop.  For VLA
> > > > this means we need to reverse both the mask and vector since
> > > > there's only a way to get the last active element and not the first.
> > > >
> > > > For inductions and multiple exits:
> > > >   - we test if the target will support vectorizing the induction
> > > >   - mark all inductions in the loop as relevant
> > > >   - for codegen of non-live inductions during codegen
> > > >   - induction during an early exit gets the first element rather than last.
> > > >
> > > > For reductions and multiple exits:
> > > >   - Reductions for early exits reduces the reduction definition statement
> > > >     rather than the reduction step.  This allows us to get the value at the
> > > >     start of the iteration.
> > > >   - The peeling layout means that we just have to update one
> > > > block, the
> > > merge
> > > >     block.  We expect all the reductions to be the same but we leave it up to
> > > >     the value numbering to clean up any duplicate code as we iterate over
> all
> > > >     edges.
> > > >
> > > > These two changes fix the reduction codegen given before which has
> > > > been added to the testsuite for early vect.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > > > 	(vect_analyze_loop_operations): Check if target supports
> > > > vectorizing
> > > IV.
> > > > 	(vect_transform_loop): Call vectorizable_live_operation for non-live
> > > > 	inductions or reductions.
> > > > 	(find_connected_edge, vectorizable_live_operation_1): New.
> > > > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > > > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > > > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > > > 	relevant.
> > > > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > > > 	(vect_iv_increment_position): New.
> > > > 	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > b/gcc/tree-vect-loop-manip.cc index
> > > >
> > >
> 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401
> > > aecd4e7
> > > > eaaaa9e2b8a0 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type,
> > > gimple_seq *seq,
> > > >     INSERT_AFTER is set to true if the increment should be inserted after
> > > >     *BSI.  */
> > > >
> > > > -static void
> > > > +void
> > > >  vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> > > >  			    bool *insert_after)
> > > >  {
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > > >
> > >
> 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e698
> > > 33821907
> > > > 92602cb25af1 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -2163,6 +2163,15 @@ vect_analyze_loop_operations
> (loop_vec_info
> > > loop_vinfo)
> > > >  	    ok = vectorizable_live_operation (loop_vinfo, stmt_info,
> > > > NULL,
> > > NULL,
> > > >  					      -1, false, &cost_vec);
> > > >
> > > > +	  /* Check if we can perform the operation for early break if we force
> > > > +	     the live operation.  */
> > > > +	  if (ok
> > > > +	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +	      && !STMT_VINFO_LIVE_P (stmt_info)
> > > > +	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > > > +	    ok = vectorizable_live_operation (loop_vinfo, stmt_info,
> > > > +NULL,
> > > NULL,
> > > > +					      -1, false, &cost_vec);
> > >
> > > can you add && !PURE_SLP_STMT?
> > >
> >
> > I've cleaned up the patch a bit more, so these hunks are now all gone.
> >
> > > > @@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction
> > > (loop_vec_info loop_vinfo,
> > > >           Store them in NEW_PHIS.  */
> > > >    if (double_reduc)
> > > >      loop = outer_loop;
> > > > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > +  /* We need to reduce values in all exits.  */  exit_bb =
> > > > + loop_exit->dest;
> > > >    exit_gsi = gsi_after_labels (exit_bb);
> > > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > > > +  vec <gimple *> vec_stmts;
> > > > +  if (main_exit_p)
> > > > +    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);  else
> > > > +    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF
> > > > + (rdef_info));
> > >
> > > both would be wrong for SLP, also I think you need to look at
> > > STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))?  For SLP the PHI
> > > SLP node is reached via slp_node_instance->reduc_phis.
> > >
> > > I think an overall better structure would be to add a
> > >
> > > vect_get_vect_def (stmt_vec_info, slp_tree, unsigned);
> > >
> > > abstracting SLP and non-SLP and doing
> > >
> > >   for (unsigned i = 0; i < vec_num * ncopies; ++i)
> > >     {
> > >       def = vect_get_vect_def (stmt_info, slp_node, i); ...
> > >     }
> > >
> > > and then adjusting stmt_info/slp_node according to main_exit_p?
> >
> > Done.
> >
> > > (would be nice to transition stmt_info->vec_stmts to
> > > stmt_info->vec_defs)
> >
> > True. I guess since the plan is to remove non-SLP next year this'll just go
> away anyway.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vectorizable_live_operation,
> > 	vectorizable_live_operation_1): Support early exits.
> > 	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-
> live
> > 	inductions or reductions.
> > 	(find_connected_edge, vect_get_vect_def): New.
> > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > 	live.
> > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8
> e5ab63b
> > 06ba19840cbd 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)-
> >count ();
> >  	  bb_before_epilog = loop_preheader_edge (epilog)->src;
> >  	}
> > +
> >        /* If loop is peeled for non-zero constant times, now niters refers to
> >  	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
> >  	 overflows.  */
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> df5e1d28fac2ce35e71decdec0d8e31fb75557f5..90041d1e138afb08c0116f
> 48f517
> > fe0fcc615557 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree
> vectype, code_helper code,
> >    return new_temp;
> >  }
> >
> > +/* Retrieves the definining statement to be used for a reduction.
> > +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look
> at
> > +   the reduction definitions.  */
> > +
> > +tree
> > +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> > +		   slp_instance slp_node_instance, bool main_exit_p, unsigned
> i,
> > +		   vec <gimple *> &vec_stmts)
> > +{
> > +  tree def;
> > +
> > +  if (slp_node)
> > +    {
> > +      if (!main_exit_p)
> > +        slp_node = slp_node_instance->reduc_phis;
> > +      def = vect_get_slp_vect_def (slp_node, i);
> > +    }
> > +  else
> > +    {
> > +      if (!main_exit_p)
> > +	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> > +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> > +      def = gimple_get_lhs (vec_stmts[0]);
> > +    }
> > +
> > +  return def;
> > +}
> > +
> >  /* Function vect_create_epilog_for_reduction
> >
> >     Create code at the loop-epilog to finalize the result of a
> > reduction @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree
> vec_def, tree vectype, code_helper code,
> >     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
> >     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction
> phi
> >       (counting from 0)
> > +   LOOP_EXIT is the edge to update in the merge block.  In the case of a
> single
> > +     exit this edge is always the main loop exit.
> >
> >     This function:
> >     1. Completes the reduction def-use cycles.
> > @@ -5882,7 +5912,8 @@ static void
> >  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >  				  stmt_vec_info stmt_info,
> >  				  slp_tree slp_node,
> > -				  slp_instance slp_node_instance)
> > +				  slp_instance slp_node_instance,
> > +				  edge loop_exit)
> >  {
> >    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
> >    gcc_assert (reduc_info->is_reduc_info); @@ -5891,6 +5922,7 @@
> > vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> >       loop-closed PHI of the inner loop which we remember as
> >       def for the reduction PHI generation.  */
> >    bool double_reduc = false;
> > +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
> >    stmt_vec_info rdef_info = stmt_info;
> >    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
> >      {
> > @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
> >        /* Create an induction variable.  */
> >        gimple_stmt_iterator incr_gsi;
> >        bool insert_after;
> > -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > +      vect_iv_increment_position (loop_exit, &incr_gsi,
> > + &insert_after);
> >        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
> >  		 insert_after, &indx_before_incr, &indx_after_incr);
> >
> > @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction
> (loop_vec_info loop_vinfo,
> >           Store them in NEW_PHIS.  */
> >    if (double_reduc)
> >      loop = outer_loop;
> > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +  /* We need to reduce values in all exits.  */  exit_bb =
> > + loop_exit->dest;
> >    exit_gsi = gsi_after_labels (exit_bb);
> >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > +  vec <gimple *> vec_stmts;
> >    for (unsigned i = 0; i < vec_num; i++)
> >      {
> >        gimple_seq stmts = NULL;
> > -      if (slp_node)
> > -	def = vect_get_slp_vect_def (slp_node, i);
> > -      else
> > -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> > +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> > +			       main_exit_p, i, vec_stmts);
> >        for (j = 0; j < ncopies; j++)
> >  	{
> >  	  tree new_def = copy_ssa_name (def);
> >  	  phi = create_phi_node (new_def, exit_bb);
> >  	  if (j)
> > -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)-
> >dest_idx, def);
> > +	    def = gimple_get_lhs (vec_stmts[j]);
> > +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
> >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> >  	  reduc_inputs.quick_push (new_def);
> >  	}
> > @@ -6882,10 +6914,33 @@ vect_create_epilog_for_reduction
> (loop_vec_info loop_vinfo,
> >  	    }
> >
> >            scalar_result = scalar_results[k];
> > +	  edge merge_e = loop_exit;
> > +	  if (!main_exit_p)
> > +	    merge_e = single_succ_edge (loop_exit->dest);
> >            FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
> >  	    {
> >  	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> > -		SET_USE (use_p, scalar_result);
> > +		{
> > +		  if (main_exit_p)
> > +		    SET_USE (use_p, scalar_result);
> > +		  else
> > +		    {
> > +		      /* When multiple exits the same SSA name can appear in
> > +			 both the main and the early exits.  The meaning of the
> > +			 reduction however is not the same.  In the main exit
> > +			 case the meaning is "get the last value" and in the
> > +			 early exit case it means "get the first value".  As
> > +			 such we should only update the value for the exit
> > +			 attached to loop_exit.  To make this easier we always
> > +			 call vect_create_epilog_for_reduction on the early
> > +			 exit main block first.  As such for the main exit we
> > +			 no longer have to perform the BB check.  */
> > +		      gphi *stmt = as_a <gphi *> (USE_STMT (use_p));
> > +		      int idx = phi_arg_index_from_use (use_p);
> > +		      if (gimple_phi_arg_edge (stmt, idx) == merge_e)
> > +			SET_USE (use_p, scalar_result);
> 
> Hmm, I guess I still don't understand.  This code tries, in the reduction epilog
> 
>   # scalar_result_1 = PHI <..>
>   # vector_result_2 = PHI <..>
>   _3 = ... reduce vector_result_2 ...;
> 
> replace uses of scalar_result_1 with _3 of which there could be many,
> including in debug stmts (there doesn't have to be an epilog loop after all).
> 
> Now, for an early exit we know there _is_ an epilog loop and we have a merge
> block merging early exits before merging with the main exit.  We have forced(?)
> PHI nodes to merge the individual early exit reduction results?

Sure, but we only unconditionally go to it for the early exits, not for the main one.
That one is still decided by the exit guard.

> 
> Either I can't see how we can end up with multiple uses or I can't see how the
> main_exit_p case cannot also stomp on those?
> 
> Maybe it's related to the other question whether we are emitting a reduction
> epilogue for each of the early exits or just once.

We aren't; we only do so once.  We loop over the exits to find the alternative
exit and break out after the first one is found.  This is because we have no easy
way to identify the merge block other than by iterating.

To explain the above, let's look at an example with a reduction (testcase vect-early-break_16.c):

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     return vect_a[i];
   vect_a[i] = x;
   ret += vect_a[i] + vect_b[i];
 }
 return ret;
}

This will give you this graph after peeling and updating of IVs
https://gist.github.com/Mistuke/c2d632498ceeb10e24a9057bafd87412

This function does not need the epilogue, so when the guard is added
the condition is always false, but it hasn't been folded away yet.

Because of this you have the same PHI on both the edge to the
function exit and the edge to the merge block, at least until CFG cleanup.

However, thinking about it some more, the possibility is that we remove
the main edge from the merge block, so if I just handle the main edge
first then the SSA chain can never be broken and the check isn't needed.

Fixed, will be in next update.
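As a scalar sketch of the two reduction meanings described above (a hypothetical model, not vectorizer code): on the main exit the live value is the final reduction result, while on an early break the scalar epilogue restarts the interrupted iteration, so the live value must be the reduction definition (the value at the start of that iteration) rather than the reduction step.

```python
def reduce_with_early_break(xs, pred):
    """Model of the two reduction meanings with multiple exits: the main
    exit wants "the last value", an early exit wants "the first value"
    (the value entering the breaking iteration)."""
    ret = 0
    for x in xs:
        ret_phi = ret        # reduction definition: value entering the iteration
        if pred(x):
            return ret_phi   # early exit: scalar loop re-runs this iteration
        ret = ret_phi + x    # reduction step: value leaving the iteration
    return ret               # main exit: fully-updated last value
```

For example, `reduce_with_early_break([1, 2, 3], lambda v: v == 3)` returns 3 (the sum before the breaking iteration), while a predicate that never fires yields the full sum, 6.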

> 
> > +		    }
> > +		}
> >  	      update_stmt (use_stmt);
> >  	    }
> >          }
> > @@ -10481,15 +10536,17 @@ vectorizable_induction (loop_vec_info
> loop_vinfo,
> >    return true;
> >  }
> >
> > -
> >  /* Function vectorizable_live_operation_1.
> > +
> >     helper function for vectorizable_live_operation.  */
> > +
> >  tree
> >  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> >  			       stmt_vec_info stmt_info, edge exit_e,
> >  			       tree vectype, int ncopies, slp_tree slp_node,
> >  			       tree bitsize, tree bitstart, tree vec_lhs,
> > -			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
> > +			       tree lhs_type, bool restart_loop,
> > +			       gimple_stmt_iterator *exit_gsi)
> >  {
> >    basic_block exit_bb = exit_e->dest;
> >    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS
> > (loop_vinfo)); @@ -10504,7 +10561,9 @@ vectorizable_live_operation_1
> (loop_vec_info loop_vinfo,
> >    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> >      {
> >        /* Emit:
> > +
> >  	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> > +
> >  	 where VEC_LHS is the vectorized live-out result and MASK is
> >  	 the loop mask for the final iteration.  */
> >        gcc_assert (ncopies == 1 && !slp_node); @@ -10513,15 +10572,18
> > @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> >        tree len = vect_get_loop_len (loop_vinfo, &gsi,
> >  				    &LOOP_VINFO_LENS (loop_vinfo),
> >  				    1, vectype, 0, 0);
> > +
> >        /* BIAS - 1.  */
> >        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> (loop_vinfo);
> >        tree bias_minus_one
> >  	= int_const_binop (MINUS_EXPR,
> >  			   build_int_cst (TREE_TYPE (len), biasval),
> >  			   build_one_cst (TREE_TYPE (len)));
> > +
> >        /* LAST_INDEX = LEN + (BIAS - 1).  */
> >        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> >  				     len, bias_minus_one);
> > +
> >        /* This needs to implement extraction of the first index, but not sure
> >  	 how the LEN stuff works.  At the moment we shouldn't get here since
> >  	 there's no LEN support for early breaks.  But guard this so there's
> > @@ -10532,13 +10594,16 @@ vectorizable_live_operation_1
> (loop_vec_info loop_vinfo,
> >        tree scalar_res
> >  	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> >  			vec_lhs_phi, last_index);
> > +
> >        /* Convert the extracted vector element to the scalar type.  */
> >        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> >      }
> >    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >      {
> >        /* Emit:
> > +
> >  	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> > +
> >  	 where VEC_LHS is the vectorized live-out result and MASK is
> >  	 the loop mask for the final iteration.  */
> >        gcc_assert (!slp_node);
> > @@ -10548,10 +10613,38 @@ vectorizable_live_operation_1
> (loop_vec_info loop_vinfo,
> >        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> >  				      &LOOP_VINFO_MASKS (loop_vinfo),
> >  				      1, vectype, 0);
> > +      tree scalar_res;
> > +
> > +      /* For an inverted control flow with early breaks we want
> EXTRACT_FIRST
> > +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.
> */
> > +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	{
> > +	  /* First create the permuted mask.  */
> > +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> > +	  tree perm_dest = copy_ssa_name (mask);
> > +	  gimple *perm_stmt
> > +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> > +				       mask, perm_mask);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> > +				       &gsi);
> > +	  mask = perm_dest;
> > +
> > +	  /* Then permute the vector contents.  */
> > +	  tree perm_elem = perm_mask_for_reverse (vectype);
> > +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> > +	  perm_stmt
> > +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> vec_lhs_phi,
> > +				       vec_lhs_phi, perm_elem);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> > +				       &gsi);
> > +	  vec_lhs_phi = perm_dest;
> > +	}
> >
> >        gimple_seq_add_seq (&stmts, tem);
> > -       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST,
> scalar_type,
> > -				       mask, vec_lhs_phi);
> > +
> > +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > +				 mask, vec_lhs_phi);
> > +
> >        /* Convert the extracted vector element to the scalar type.  */
> >        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> >      }
> > @@ -10564,12 +10657,26 @@ vectorizable_live_operation_1
> (loop_vec_info loop_vinfo,
> >        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> >  				       &stmts, true, NULL_TREE);
> >      }
> > +
> >    *exit_gsi = gsi_after_labels (exit_bb);
> >    if (stmts)
> >      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> > +
> >    return new_tree;
> >  }
> >
> > +/* Find the edge that's the final one in the path from SRC to DEST and
> > +   return it.  This edge must exist in at most one forwarder edge
> > +between.  */
> > +
> > +static edge
> > +find_connected_edge (edge src, basic_block dest) {
> > +   if (src->dest == dest)
> > +     return src;
> > +
> > +  return find_edge (src->dest, dest); }
> > +
> >  /* Function vectorizable_live_operation.
> >
> >     STMT_INFO computes a value that is used outside the loop.  Check
> > if @@ -10594,7 +10701,8 @@ vectorizable_live_operation (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >    int vec_entry = 0;
> >    poly_uint64 vec_index = 0;
> >
> > -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> > +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> > +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> >
> >    /* If a stmt of a reduction is live, vectorize it via
> >       vect_create_epilog_for_reduction.  vectorizable_reduction
> > assessed @@ -10619,8 +10727,25 @@ vectorizable_live_operation
> (vec_info *vinfo, stmt_vec_info stmt_info,
> >        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
> >  	  || STMT_VINFO_REDUC_TYPE (reduc_info) ==
> EXTRACT_LAST_REDUCTION)
> >  	return true;
> > +
> > +      /* If early break we only have to materialize the reduction on the merge
> > +	 block, but we have to find an alternate exit first.  */
> > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +	{
> > +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP
> (loop_vinfo)))
> > +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> > +	      {
> > +		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> > +						  slp_node,
> slp_node_instance,
> > +						  exit);
> > +		break;
> > +	      }
> > +	}
> > +
> >        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> > -					slp_node_instance);
> > +					slp_node_instance,
> > +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> > +
> >        return true;
> >      }
> >
> > @@ -10772,37 +10897,63 @@ vectorizable_live_operation (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  	   lhs' = new_tree;  */
> >
> >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > -      gcc_assert (single_pred_p (exit_bb));
> > -
> > -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx,
> vec_lhs);
> > -
> > -      gimple_stmt_iterator exit_gsi;
> > -      tree new_tree
> > -	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > -					 LOOP_VINFO_IV_EXIT (loop_vinfo),
> > -					 vectype, ncopies, slp_node, bitsize,
> > -					 bitstart, vec_lhs, lhs_type,
> > -					 &exit_gsi);
> > -
> > -      /* Remove existing phis that copy from lhs and create copies
> > -	 from new_tree.  */
> > -      gimple_stmt_iterator gsi;
> > -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> > -	{
> > -	  gimple *phi = gsi_stmt (gsi);
> > -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> > -	    {
> > -	      remove_phi_node (&gsi, false);
> > -	      tree lhs_phi = gimple_phi_result (phi);
> > -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > -	    }
> > -	  else
> > -	    gsi_next (&gsi);
> > -	}
> > +      /* Check if we have a loop where the chosen exit is not the main exit,
> > +	 in these cases for an early break we restart the iteration the vector
> code
> > +	 did.  For the live values we want the value at the start of the iteration
> > +	 rather than at the end.  */
> > +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED
> (loop_vinfo);
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > +	if (!is_gimple_debug (use_stmt)
> > +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > +	  {
> > +	    basic_block use_bb = gimple_bb (use_stmt);
> > +	    if (!is_a <gphi *> (use_stmt))
> 
> should always be a PHI
> 
> > +	      continue;
> > +	    for (auto exit_e : get_loop_exit_edges (loop))
> > +	      {
> > +		/* See if this exit leads to the value.  */
> > +		edge dest_e = find_connected_edge (exit_e, use_bb);
> 
> When is this not exit_e->dest == use_bb?

The main exit block has an intermediate block between it and the merge block
that contains only the same PHI nodes as the original exit, in an order that allows
it to be easily linked to the epilogue's main exit.

That block won't contain induction values as they're only needed in the merge block.
With that said, I've simplified the code.

> 
> > +		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e)
> != lhs)
> > +		  continue;
> 
> I'd change the above to
> 
>        FOR_EACH_IMM_USE_FAST (...)
> 
> then
> 
>    gimple_phi_arg_edge (USE_STMT (use_p), phi_arg_index_from_use (use_p))
> 
> is the exit edge you are looking for without iterating over all loop exits.
> 
> > +		gimple *tmp_vec_stmt = vec_stmt;
> > +		tree tmp_vec_lhs = vec_lhs;
> > +		tree tmp_bitstart = bitstart;
> > +		/* For early exit where the exit is not in the BB that leads
> > +		   to the latch then we're restarting the iteration in the
> > +		   scalar loop.  So get the first live value.  */
> > +		restart_loop = restart_loop || exit_e != main_e;
> > +		if (restart_loop)
> > +		  {
> > +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> 
> Hmm, that gets you the value after the first iteration, not the one before which
> would be the last value of the preceeding vector iteration?
> (but we don't keep those, we'd need a PHI)

I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
that lhs should be pointing to a PHI.

When I inspect the statement I see

i_14 = PHI <i_11(6), 0(14)>

so i_14 is the value at the start of the current iteration: 0 if we're coming from the
header, otherwise i_11, which is the value of the previous iteration?

The peeling code explicitly leaves i_14 in the merge block and not i_11 for this exact reason.
So I'm confused, my understanding is that we're already *at* the right PHI.

Is it perhaps that you thought we put i_11 here for the early exits?  In that case,
yes, I'd agree that that would be wrong, and there we would have had to look at
the defs, but i_11 is the def.

I already kept this in mind and leveraged peeling to make this part easier.
i_11 is used in the main exit and i_14 in the early one.
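To make the PHI semantics concrete (a hypothetical scalar model, not GCC code): i_14 plays the role of the loop-header PHI and i_11 the incremented latch value, so an early exit hands out i_14 while the main exit hands out i_11.

```python
def run_iv(n, break_at):
    """Scalar model of i_14 = PHI <i_11(latch), 0(preheader)>: i_14 is
    the IV value at the start of the current iteration, i_11 the value
    after the increment."""
    i_14 = 0                  # PHI: 0 from the preheader on entry
    while True:
        if i_14 == break_at:
            return i_14       # early exit: iteration restarts, use i_14
        i_11 = i_14 + 1       # increment in the loop body
        if i_11 == n:
            return i_11       # main exit: iteration completed, use i_11
        i_14 = i_11           # PHI: i_11 from the latch on the back edge
```

With `n = 5` and no break, this returns 5 from the main exit; breaking at 3 returns 3, the value at the start of the interrupted iteration.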

> 
> Why again do we need (non-induction) live values from the vector loop to the
> epilogue loop again?

They can appear as the result value of the main exit.

e.g. in testcase (vect-early-break_17.c)

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     return vect_a[i];
   vect_a[i] = x;
   ret = vect_a[i] + vect_b[i];
 }
 return ret;
}

The only situation where they can appear as an early break is when
the main exit != the latch-connected exit.

However, in these cases they are unused and only there because
normally you would have exited (i.e. there was a return), but the
vector loop needs to start over, so we ignore them.

These happen in testcase vect-early-break_74.c and
vect-early-break_78.c
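For reference, the EXTRACT_FIRST emulation the patch uses for these early-exit live values (reverse both the mask and the vector, then take the last active lane, since there is only a way to get the last active element and not the first) can be modeled as follows (hypothetical helper names):

```python
def extract_last(mask, vec):
    """Return the element of VEC in the last lane where MASK is set."""
    for m, v in zip(reversed(mask), reversed(vec)):
        if m:
            return v
    return None

def extract_first(mask, vec):
    """EXTRACT_FIRST emulated by reversing mask and vector, then
    applying EXTRACT_LAST, as in vectorizable_live_operation_1."""
    return extract_last(list(reversed(mask)), list(reversed(vec)))
```

With mask `[0, 1, 1, 1]` and vector `[10, 20, 30, 40]`, `extract_last` yields 40 (last active lane) while `extract_first` yields 20 (first active lane).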

> 
> If we are dealing with an induction (or a reduction, you can check the def
> type), there should be an associated PHI node to get that vector.
> 
> That said, are you sure there's testsuite coverage for the induction case?

Well, we now require it for every IV-related variable between the two loops,
so there's not a single testcase that doesn't use it.

> 
> > +		  }
> > +
> > +		gimple_stmt_iterator exit_gsi;
> > +		tree new_tree
> > +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > +						   exit_e, vectype, ncopies,
> > +						   slp_node, bitsize,
> > +						   tmp_bitstart, tmp_vec_lhs,
> > +						   lhs_type, restart_loop,
> > +						   &exit_gsi);
> > +
> > +		/* Use the empty block on the exit to materialize the new
> stmts
> > +		   so we can use update the PHI here.  */
> > +		if (gimple_phi_num_args (use_stmt) == 1)
> > +		  {
> > +		    auto gsi = gsi_for_stmt (use_stmt);
> > +		    remove_phi_node (&gsi, false);
> > +		    tree lhs_phi = gimple_phi_result (use_stmt);
> > +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > +		  }
> > +		else
> > +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> 
> if the else case works, why not use it always?

Because it doesn't work for the main exit.  The early exits have an intermediate block
on which the statements are generated, so for them we are fine updating the
use in place.

The main exit doesn't, so the existing trick the vectorizer uses is to materialize
the statements in the same block and then dissolve the PHI node.  However you
can't do that for the early exits because the PHI node isn't singular.

Now I know what your next question is: why don't we just use the same method
for both?  When I create an extra intermediate block for this, reduction vectorization
for VLA seems to go off the rails.  The code is still correct, just highly inefficient.

That is because, without code to prevent it, peeling will create LCSSA PHI blocks
with singular entries.  These should be harmless, but VLA reductions generate some
unpacking and packing to deal with them.  I tried to figure out why, but this is a large bit
of code.  So for now I went with the simpler approach of replacing the use only for the
early exits, where we never have the intermediate PHIs.

Thanks,
Tamar

> 
> 
> The rest looks OK.
> 
> Richard.
> 
> > +	      }
> > +	  }
> >
> >        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> >        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs) diff --git
> > a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> >
> fe38beb4fa1d9f8593445354f56ba52e10a040cd..27221c6e8e86034050b56
> 2ee5c15
> > 992827a8d2cb 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info
> stmt_info,
> >     - it has uses outside the loop.
> >     - it has vdefs (it alters memory).
> >     - control stmts in the loop (except for the exit condition).
> > +   - it is an induction and we have multiple exits.
> >
> >     CHECKME: what other side effects would the vectorizer allow?  */
> >
> > @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> loop_vec_info loop_vinfo,
> >  	}
> >      }
> >
> > +  /* Check if it's an induction and multiple exits.  In this case there will be
> > +     a usage later on after peeling which is needed for the alternate
> > +exit.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > +			   "vec_stmt_relevant_p: induction forced for "
> > +			   "early break.\n");
> > +      *live_p = true;
> > +
> > +    }
> > +
> >    if (*live_p && *relevant == vect_unused_in_scope
> >        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
> >      {
> > @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo,
> > stmt_vec_info stmt_info)
> >  /* If the target supports a permute mask that reverses the elements in
> >     a vector of type VECTYPE, return that mask, otherwise return null.
> > */
> >
> > -static tree
> > +tree
> >  perm_mask_for_reverse (tree vectype)
> >  {
> >    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); @@ -12720,20
> > +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info
> stmt_info,
> >  			  bool vec_stmt_p,
> >  			  stmt_vector_for_cost *cost_vec)
> >  {
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> >    if (slp_node)
> >      {
> >        stmt_vec_info slp_stmt_info;
> >        unsigned int i;
> >        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i,
> slp_stmt_info)
> >  	{
> > -	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
> > +	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> > +	       || (loop_vinfo
> > +		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> > +			== vect_induction_def))
> >  	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
> >  					       slp_node_instance, i,
> >  					       vec_stmt_p, cost_vec))
> >  	    return false;
> >  	}
> >      }
> > -  else if (STMT_VINFO_LIVE_P (stmt_info)
> > +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> > +	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +		&& STMT_VINFO_DEF_TYPE (stmt_info) ==
> vect_induction_def))
> >  	   && !vectorizable_live_operation (vinfo, stmt_info,
> >  					    slp_node, slp_node_instance, -1,
> >  					    vec_stmt_p, cost_vec))
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index
> >
> de60da31e2a3030a7fbc302d3f676af9683fd019..fd4b0a787e6128b43c5ca2
> b0612f
> > 55845e6b3cef 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *,
> stmt_vec_info, slp_tree,
> >  				enum vect_def_type *,
> >  				tree *, stmt_vec_info * = NULL);
> >  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> > +extern tree perm_mask_for_reverse (tree);
> >  extern bool supportable_widening_operation (vec_info*, code_helper,
> >  					    stmt_vec_info, tree, tree,
> >  					    code_helper*, code_helper*,
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-29 21:22                     ` Tamar Christina
@ 2023-11-30 13:23                       ` Richard Biener
  2023-12-06  4:21                         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-11-30 13:23 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 29 Nov 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, November 29, 2023 2:29 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> > with support for multiple exits and different exits
> > 
> > On Mon, 27 Nov 2023, Tamar Christina wrote:
> > 
> > >  >
> > > > > This is a respun patch with a fix for VLA.
> > > > >
> > > > > This adds support to vectorizable_live_reduction to handle
> > > > > multiple exits by doing a search for which exit the live value should be
> > materialized in.
> > > > >
> > > > > Additionally which value in the index we're after depends on
> > > > > whether the exit it's materialized in is an early exit or whether
> > > > > the loop's main exit is different from the loop's natural one
> > > > > (i.e. the one with the same src block as the latch).
> > > > >
> > > > > In those two cases we want the first rather than the last value as
> > > > > we're going to restart the iteration in the scalar loop.  For VLA
> > > > > this means we need to reverse both the mask and vector since
> > > > > there's only a way to get the last active element and not the first.
> > > > >
> > > > > For inductions and multiple exits:
> > > > >   - we test if the target will support vectorizing the induction
> > > > >   - mark all inductions in the loop as relevant
> > > > >   - for codegen of non-live inductions during codegen
> > > > >   - induction during an early exit gets the first element rather than last.
> > > > >
> > > > > For reductions and multiple exits:
> > > > >   - Reductions for early exits reduces the reduction definition statement
> > > > >     rather than the reduction step.  This allows us to get the value at the
> > > > >     start of the iteration.
> > > > >   - The peeling layout means that we just have to update one
> > > > > block, the
> > > > merge
> > > > >     block.  We expect all the reductions to be the same but we leave it up to
> > > > >     the value numbering to clean up any duplicate code as we iterate over
> > all
> > > > >     edges.
> > > > >
> > > > > These two changes fix the reduction codegen given before which has
> > > > > been added to the testsuite for early vect.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > > > > 	(vect_analyze_loop_operations): Check if target supports
> > > > > vectorizing
> > > > IV.
> > > > > 	(vect_transform_loop): Call vectorizable_live_operation for non-live
> > > > > 	inductions or reductions.
> > > > > 	(find_connected_edge, vectorizable_live_operation_1): New.
> > > > > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > > > > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > > > > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > > > > 	relevant.
> > > > > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > > > > 	(vect_iv_increment_position): New.
> > > > > 	* tree-vect-loop-manip.cc (vect_iv_increment_position): Expose.
> > > > >
> > > > > --- inline copy of patch ---
> > > > >
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > > b/gcc/tree-vect-loop-manip.cc index
> > > > >
> > > >
> > 476be8a0bb6da2d06c4ca7052cb07bacecca60b1..1a4ba349fb6ae39c79401
> > > > aecd4e7
> > > > > eaaaa9e2b8a0 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -453,7 +453,7 @@ vect_adjust_loop_lens_control (tree iv_type,
> > > > gimple_seq *seq,
> > > > >     INSERT_AFTER is set to true if the increment should be inserted after
> > > > >     *BSI.  */
> > > > >
> > > > > -static void
> > > > > +void
> > > > >  vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> > > > >  			    bool *insert_after)
> > > > >  {
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > > > >
> > > >
> > 8a50380de49bc12105be47ea1d8ee3cf1f2bdab4..b42318b2999e6a27e698
> > > > 33821907
> > > > > 92602cb25af1 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -2163,6 +2163,15 @@ vect_analyze_loop_operations
> > (loop_vec_info
> > > > loop_vinfo)
> > > > >  	    ok = vectorizable_live_operation (loop_vinfo, stmt_info,
> > > > > NULL,
> > > > NULL,
> > > > >  					      -1, false, &cost_vec);
> > > > >
> > > > > +	  /* Check if we can perform the operation for early break if we force
> > > > > +	     the live operation.  */
> > > > > +	  if (ok
> > > > > +	      && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > > +	      && !STMT_VINFO_LIVE_P (stmt_info)
> > > > > +	      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > > > > +	    ok = vectorizable_live_operation (loop_vinfo, stmt_info,
> > > > > +NULL,
> > > > NULL,
> > > > > +					      -1, false, &cost_vec);
> > > >
> > > > can you add && !PURE_SLP_STMT?
> > > >
> > >
> > > I've cleaned up the patch a bit more, so these hunks are now all gone.
> > >
> > > > > @@ -6132,23 +6147,30 @@ vect_create_epilog_for_reduction
> > > > (loop_vec_info loop_vinfo,
> > > > >           Store them in NEW_PHIS.  */
> > > > >    if (double_reduc)
> > > > >      loop = outer_loop;
> > > > > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > > > +  /* We need to reduce values in all exits.  */  exit_bb =
> > > > > + loop_exit->dest;
> > > > >    exit_gsi = gsi_after_labels (exit_bb);
> > > > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > > > > +  vec <gimple *> vec_stmts;
> > > > > +  if (main_exit_p)
> > > > > +    vec_stmts = STMT_VINFO_VEC_STMTS (rdef_info);  else
> > > > > +    vec_stmts = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF
> > > > > + (rdef_info));
> > > >
> > > > both would be wrong for SLP, also I think you need to look at
> > > > STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))?  For SLP the PHI
> > > > SLP node is reached via slp_node_instance->reduc_phis.
> > > >
> > > > I think an overall better structure would be to add a
> > > >
> > > > vect_get_vect_def (stmt_vec_info, slp_tree, unsigned);
> > > >
> > > > abstracting SLP and non-SLP and doing
> > > >
> > > >   for (unsigned i = 0; i < vec_num * ncopies; ++i)
> > > >     {
> > > >       def = vect_get_vect_def (stmt_info, slp_node, i); ...
> > > >     }
> > > >
> > > > and then adjusting stmt_info/slp_node according to main_exit_p?
> > >
> > > Done.
> > >
> > > > (would be nice to transition stmt_info->vec_stmts to
> > > > stmt_info->vec_defs)
> > >
> > > True. I guess since the plan is to remove non-SLP next year this'll just go
> > away anyway.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop.cc (vectorizable_live_operation,
> > > 	vectorizable_live_operation_1): Support early exits.
> > > 	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-
> > live
> > > 	inductions or reductions.
> > > 	(find_connected_edge, vect_get_vect_def): New.
> > > 	(vect_create_epilog_for_reduction): Support reductions in early break.
> > > 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > > 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> > > 	live.
> > > 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8
> > e5ab63b
> > > 06ba19840cbd 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)-
> > >count ();
> > >  	  bb_before_epilog = loop_preheader_edge (epilog)->src;
> > >  	}
> > > +
> > >        /* If loop is peeled for non-zero constant times, now niters refers to
> > >  	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
> > >  	 overflows.  */
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> > >
> > df5e1d28fac2ce35e71decdec0d8e31fb75557f5..90041d1e138afb08c0116f
> > 48f517
> > > fe0fcc615557 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree
> > vectype, code_helper code,
> > >    return new_temp;
> > >  }
> > >
> > > +/* Retrieves the defining statement to be used for a reduction.
> > > +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look
> > at
> > > +   the reduction definitions.  */
> > > +
> > > +tree
> > > +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> > > +		   slp_instance slp_node_instance, bool main_exit_p, unsigned
> > i,
> > > +		   vec <gimple *> &vec_stmts)
> > > +{
> > > +  tree def;
> > > +
> > > +  if (slp_node)
> > > +    {
> > > +      if (!main_exit_p)
> > > +        slp_node = slp_node_instance->reduc_phis;
> > > +      def = vect_get_slp_vect_def (slp_node, i);
> > > +    }
> > > +  else
> > > +    {
> > > +      if (!main_exit_p)
> > > +	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> > > +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> > > +      def = gimple_get_lhs (vec_stmts[0]);
> > > +    }
> > > +
> > > +  return def;
> > > +}
> > > +
> > >  /* Function vect_create_epilog_for_reduction
> > >
> > >     Create code at the loop-epilog to finalize the result of a
> > > reduction @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree
> > vec_def, tree vectype, code_helper code,
> > >     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
> > >     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction
> > phi
> > >       (counting from 0)
> > > +   LOOP_EXIT is the edge to update in the merge block.  In the case of a
> > single
> > > +     exit this edge is always the main loop exit.
> > >
> > >     This function:
> > >     1. Completes the reduction def-use cycles.
> > > @@ -5882,7 +5912,8 @@ static void
> > >  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >  				  stmt_vec_info stmt_info,
> > >  				  slp_tree slp_node,
> > > -				  slp_instance slp_node_instance)
> > > +				  slp_instance slp_node_instance,
> > > +				  edge loop_exit)
> > >  {
> > >    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
> > >    gcc_assert (reduc_info->is_reduc_info); @@ -5891,6 +5922,7 @@
> > > vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
> > >       loop-closed PHI of the inner loop which we remember as
> > >       def for the reduction PHI generation.  */
> > >    bool double_reduc = false;
> > > +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
> > >    stmt_vec_info rdef_info = stmt_info;
> > >    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
> > >      {
> > > @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> > loop_vinfo,
> > >        /* Create an induction variable.  */
> > >        gimple_stmt_iterator incr_gsi;
> > >        bool insert_after;
> > > -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> > > +      vect_iv_increment_position (loop_exit, &incr_gsi,
> > > + &insert_after);
> > >        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
> > >  		 insert_after, &indx_before_incr, &indx_after_incr);
> > >
> > > @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction
> > (loop_vec_info loop_vinfo,
> > >           Store them in NEW_PHIS.  */
> > >    if (double_reduc)
> > >      loop = outer_loop;
> > > -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > +  /* We need to reduce values in all exits.  */  exit_bb =
> > > + loop_exit->dest;
> > >    exit_gsi = gsi_after_labels (exit_bb);
> > >    reduc_inputs.create (slp_node ? vec_num : ncopies);
> > > +  vec <gimple *> vec_stmts;
> > >    for (unsigned i = 0; i < vec_num; i++)
> > >      {
> > >        gimple_seq stmts = NULL;
> > > -      if (slp_node)
> > > -	def = vect_get_slp_vect_def (slp_node, i);
> > > -      else
> > > -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> > > +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> > > +			       main_exit_p, i, vec_stmts);
> > >        for (j = 0; j < ncopies; j++)
> > >  	{
> > >  	  tree new_def = copy_ssa_name (def);
> > >  	  phi = create_phi_node (new_def, exit_bb);
> > >  	  if (j)
> > > -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> > > -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)-
> > >dest_idx, def);
> > > +	    def = gimple_get_lhs (vec_stmts[j]);
> > > +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
> > >  	  new_def = gimple_convert (&stmts, vectype, new_def);
> > >  	  reduc_inputs.quick_push (new_def);
> > >  	}
> > > @@ -6882,10 +6914,33 @@ vect_create_epilog_for_reduction
> > (loop_vec_info loop_vinfo,
> > >  	    }
> > >
> > >            scalar_result = scalar_results[k];
> > > +	  edge merge_e = loop_exit;
> > > +	  if (!main_exit_p)
> > > +	    merge_e = single_succ_edge (loop_exit->dest);
> > >            FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
> > >  	    {
> > >  	      FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
> > > -		SET_USE (use_p, scalar_result);
> > > +		{
> > > +		  if (main_exit_p)
> > > +		    SET_USE (use_p, scalar_result);
> > > +		  else
> > > +		    {
> > > +		      /* When multiple exits the same SSA name can appear in
> > > +			 both the main and the early exits.  The meaning of the
> > > +			 reduction however is not the same.  In the main exit
> > > +			 case the meaning is "get the last value" and in the
> > > +			 early exit case it means "get the first value".  As
> > > +			 such we should only update the value for the exit
> > > +			 attached to loop_exit.  To make this easier we always
> > > +			 call vect_create_epilog_for_reduction on the early
> > > +			 exit main block first.  As such for the main exit we
> > > +			 no longer have to perform the BB check.  */
> > > +		      gphi *stmt = as_a <gphi *> (USE_STMT (use_p));
> > > +		      int idx = phi_arg_index_from_use (use_p);
> > > +		      if (gimple_phi_arg_edge (stmt, idx) == merge_e)
> > > +			SET_USE (use_p, scalar_result);
> > 
> > Hmm, I guess I still don't understand.  This code tries, in the reduction epilog
> > 
> >   # scalar_result_1 = PHI <..>
> >   # vector_result_2 = PHI <..>
> >   _3 = ... reduce vector_result_2 ...;
> > 
> > replace uses of scalar_result_1 with _3 of which there could be many,
> > including in debug stmts (there doesn't have to be an epilog loop after all).
> > 
> > Now, for an early exit we know there _is_ an epilog loop and we have a merge
> > block merging early exits before merging with the main exit.  We have forced(?)
> > PHI nodes to merge the individual early exit reduction results?
> 
> Sure, but you only ever go to it for the early exits, not for the main one.
> That one is still decided by the exit guard.
> 
> > 
> > Either I can't see how we can end up with multiple uses or I can't see how the
> > main_exit_p case cannot also stomp on those?
> > 
> > Maybe it's related to the other question whether we are emitting a reduction
> > epilogue for each of the early exits or just once.
> 
> We aren't; we only do so once.  We loop over the exits to find the
> alternative exit, and after the first one is found we break out.  This is because we have
> no easy way to identify the merge block other than to iterate.
> 
> To explain the above lets look at an example with a reduction (testcase vect-early-break_16.c)
> 
> #define N 1024
> unsigned vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i] > x)
>      return vect_a[i];
>    vect_a[i] = x;
>    ret += vect_a[i] + vect_b[i];
>  }
>  return ret;
> }
> 
> This will give you this graph after peeling and updating of IVs
> https://gist.github.com/Mistuke/c2d632498ceeb10e24a9057bafd87412
> 
> This function does not need the epilogue.  So when the guard is added
> the condition is always false, but it hasn't been folded away yet.
> 
> Because of this you have the same PHI on both the edge to the
> function exit and the edge to the merge block, at least until CFG cleanup.
> 
> However, thinking about it some more, the possibility is that we remove
> the main edge from the merge block, so if I just handle the main edge
> first then the SSA chain can never be broken and the check isn't needed.
> 
> Fixed, will be in next update.

Thanks.

> > 
> > > +		    }
> > > +		}
> > >  	      update_stmt (use_stmt);
> > >  	    }
> > >          }
> > > @@ -10481,15 +10536,17 @@ vectorizable_induction (loop_vec_info
> > loop_vinfo,
> > >    return true;
> > >  }
> > >
> > > -
> > >  /* Function vectorizable_live_operation_1.
> > > +
> > >     helper function for vectorizable_live_operation.  */
> > > +
> > >  tree
> > >  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> > >  			       stmt_vec_info stmt_info, edge exit_e,
> > >  			       tree vectype, int ncopies, slp_tree slp_node,
> > >  			       tree bitsize, tree bitstart, tree vec_lhs,
> > > -			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
> > > +			       tree lhs_type, bool restart_loop,
> > > +			       gimple_stmt_iterator *exit_gsi)
> > >  {
> > >    basic_block exit_bb = exit_e->dest;
> > >    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS
> > > (loop_vinfo)); @@ -10504,7 +10561,9 @@ vectorizable_live_operation_1
> > (loop_vec_info loop_vinfo,
> > >    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> > >      {
> > >        /* Emit:
> > > +
> > >  	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> > > +
> > >  	 where VEC_LHS is the vectorized live-out result and MASK is
> > >  	 the loop mask for the final iteration.  */
> > >        gcc_assert (ncopies == 1 && !slp_node); @@ -10513,15 +10572,18
> > > @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> > >        tree len = vect_get_loop_len (loop_vinfo, &gsi,
> > >  				    &LOOP_VINFO_LENS (loop_vinfo),
> > >  				    1, vectype, 0, 0);
> > > +
> > >        /* BIAS - 1.  */
> > >        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> > (loop_vinfo);
> > >        tree bias_minus_one
> > >  	= int_const_binop (MINUS_EXPR,
> > >  			   build_int_cst (TREE_TYPE (len), biasval),
> > >  			   build_one_cst (TREE_TYPE (len)));
> > > +
> > >        /* LAST_INDEX = LEN + (BIAS - 1).  */
> > >        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
> > >  				     len, bias_minus_one);
> > > +
> > >        /* This needs to implement extraction of the first index, but not sure
> > >  	 how the LEN stuff works.  At the moment we shouldn't get here since
> > >  	 there's no LEN support for early breaks.  But guard this so there's
> > > @@ -10532,13 +10594,16 @@ vectorizable_live_operation_1
> > (loop_vec_info loop_vinfo,
> > >        tree scalar_res
> > >  	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
> > >  			vec_lhs_phi, last_index);
> > > +
> > >        /* Convert the extracted vector element to the scalar type.  */
> > >        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> > >      }
> > >    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > >      {
> > >        /* Emit:
> > > +
> > >  	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> > > +
> > >  	 where VEC_LHS is the vectorized live-out result and MASK is
> > >  	 the loop mask for the final iteration.  */
> > >        gcc_assert (!slp_node);
> > > @@ -10548,10 +10613,38 @@ vectorizable_live_operation_1
> > (loop_vec_info loop_vinfo,
> > >        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
> > >  				      &LOOP_VINFO_MASKS (loop_vinfo),
> > >  				      1, vectype, 0);
> > > +      tree scalar_res;
> > > +
> > > +      /* For an inverted control flow with early breaks we want
> > EXTRACT_FIRST
> > > +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask.
> > */
> > > +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	{
> > > +	  /* First create the permuted mask.  */
> > > +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> > > +	  tree perm_dest = copy_ssa_name (mask);
> > > +	  gimple *perm_stmt
> > > +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> > > +				       mask, perm_mask);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> > > +				       &gsi);
> > > +	  mask = perm_dest;
> > > +
> > > +	  /* Then permute the vector contents.  */
> > > +	  tree perm_elem = perm_mask_for_reverse (vectype);
> > > +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> > > +	  perm_stmt
> > > +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR,
> > vec_lhs_phi,
> > > +				       vec_lhs_phi, perm_elem);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> > > +				       &gsi);
> > > +	  vec_lhs_phi = perm_dest;
> > > +	}
> > >
> > >        gimple_seq_add_seq (&stmts, tem);
> > > -       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST,
> > scalar_type,
> > > -				       mask, vec_lhs_phi);
> > > +
> > > +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> > > +				 mask, vec_lhs_phi);
> > > +
> > >        /* Convert the extracted vector element to the scalar type.  */
> > >        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
> > >      }
> > > @@ -10564,12 +10657,26 @@ vectorizable_live_operation_1
> > (loop_vec_info loop_vinfo,
> > >        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
> > >  				       &stmts, true, NULL_TREE);
> > >      }
> > > +
> > >    *exit_gsi = gsi_after_labels (exit_bb);
> > >    if (stmts)
> > >      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> > > +
> > >    return new_tree;
> > >  }
> > >
> > > +/* Find the edge that's the final one in the path from SRC to DEST and
> > > +   return it.  This edge must exist in at most one forwarder edge
> > > +between.  */
> > > +
> > > +static edge
> > > +find_connected_edge (edge src, basic_block dest) {
> > > +   if (src->dest == dest)
> > > +     return src;
> > > +
> > > +  return find_edge (src->dest, dest); }
> > > +
> > >  /* Function vectorizable_live_operation.
> > >
> > >     STMT_INFO computes a value that is used outside the loop.  Check
> > > if @@ -10594,7 +10701,8 @@ vectorizable_live_operation (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> > >    int vec_entry = 0;
> > >    poly_uint64 vec_index = 0;
> > >
> > > -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> > > +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> > > +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> > >
> > >    /* If a stmt of a reduction is live, vectorize it via
> > >       vect_create_epilog_for_reduction.  vectorizable_reduction
> > > assessed @@ -10619,8 +10727,25 @@ vectorizable_live_operation
> > (vec_info *vinfo, stmt_vec_info stmt_info,
> > >        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
> > >  	  || STMT_VINFO_REDUC_TYPE (reduc_info) ==
> > EXTRACT_LAST_REDUCTION)
> > >  	return true;
> > > +
> > > +      /* If early break we only have to materialize the reduction on the merge
> > > +	 block, but we have to find an alternate exit first.  */
> > > +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +	{
> > > +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP
> > (loop_vinfo)))
> > > +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> > > +	      {
> > > +		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> > > +						  slp_node,
> > slp_node_instance,
> > > +						  exit);
> > > +		break;
> > > +	      }
> > > +	}
> > > +
> > >        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> > > -					slp_node_instance);
> > > +					slp_node_instance,
> > > +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> > > +
> > >        return true;
> > >      }
> > >
> > > @@ -10772,37 +10897,63 @@ vectorizable_live_operation (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > >  	   lhs' = new_tree;  */
> > >
> > >        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > > -      gcc_assert (single_pred_p (exit_bb));
> > > -
> > > -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> > > -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> > > -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx,
> > vec_lhs);
> > > -
> > > -      gimple_stmt_iterator exit_gsi;
> > > -      tree new_tree
> > > -	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > -					 LOOP_VINFO_IV_EXIT (loop_vinfo),
> > > -					 vectype, ncopies, slp_node, bitsize,
> > > -					 bitstart, vec_lhs, lhs_type,
> > > -					 &exit_gsi);
> > > -
> > > -      /* Remove existing phis that copy from lhs and create copies
> > > -	 from new_tree.  */
> > > -      gimple_stmt_iterator gsi;
> > > -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> > > -	{
> > > -	  gimple *phi = gsi_stmt (gsi);
> > > -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> > > -	    {
> > > -	      remove_phi_node (&gsi, false);
> > > -	      tree lhs_phi = gimple_phi_result (phi);
> > > -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > -	    }
> > > -	  else
> > > -	    gsi_next (&gsi);
> > > -	}
> > > +      /* Check if we have a loop where the chosen exit is not the main exit,
> > > +	 in these cases for an early break we restart the iteration the vector
> > code
> > > +	 did.  For the live values we want the value at the start of the iteration
> > > +	 rather than at the end.  */
> > > +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED
> > (loop_vinfo);
> > > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > > +	if (!is_gimple_debug (use_stmt)
> > > +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > > +	  {
> > > +	    basic_block use_bb = gimple_bb (use_stmt);
> > > +	    if (!is_a <gphi *> (use_stmt))
> > 
> > should always be a PHI
> > 
> > > +	      continue;
> > > +	    for (auto exit_e : get_loop_exit_edges (loop))
> > > +	      {
> > > +		/* See if this exit leads to the value.  */
> > > +		edge dest_e = find_connected_edge (exit_e, use_bb);
> > 
> > When is this not exit_e->dest == use_bb?
> 
> The main exit block has an intermediate block between it and the merge block
> that contains only the same PHI nodes as the original exit, in an order that allows
> it to be easily linked to the epilogue's main exit.
> 
> That block won't contain induction values, as they're only needed in the merge block.
> With that said, I've simplified the code.
> 
> > 
> > > +		if (!dest_e || PHI_ARG_DEF_FROM_EDGE (use_stmt, dest_e)
> > != lhs)
> > > +		  continue;
> > 
> > I'd change the above to
> > 
> >        FOR_EACH_IMM_USE_FAST (...)
> > 
> > then
> > 
> >    gimple_phi_arg_edge (USE_STMT (use_p), phi_arg_index_from_use (use_p))
> > 
> > is the exit edge you are looking for without iterating over all loop exits.
> > 
> > > +		gimple *tmp_vec_stmt = vec_stmt;
> > > +		tree tmp_vec_lhs = vec_lhs;
> > > +		tree tmp_bitstart = bitstart;
> > > +		/* For early exit where the exit is not in the BB that leads
> > > +		   to the latch then we're restarting the iteration in the
> > > +		   scalar loop.  So get the first live value.  */
> > > +		restart_loop = restart_loop || exit_e != main_e;
> > > +		if (restart_loop)
> > > +		  {
> > > +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > 
> > Hmm, that gets you the value after the first iteration, not the one before which
> > would be the last value of the preceeding vector iteration?
> > (but we don't keep those, we'd need a PHI)
> 
> I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> that lhs should be pointing to a PHI.
> 
> When I inspect the statement I see
> 
> i_14 = PHI <i_11(6), 0(14)>
> 
> so i_14 is the value at the start of the current iteration.  If we're coming from
> the header it's 0, otherwise it's i_11, which is the value of the previous iteration?
> 
> The peeling code explicitly leaves i_14 in the merge block and not i_11 for this exact reason.
> So I'm confused, my understanding is that we're already *at* the right PHI.
> 
> Is it perhaps that you thought we put i_11 here for the early exits? In which case
> yes, I'd agree that that would be wrong, and there we would have had to look at
> the defs, but i_11 is the def.
> 
> I already kept this in mind and leveraged peeling to make this part easier.
> i_11 is used in the main exit and i_14 in the early one.

I think the important detail is that this code is only executed for
vect_induction_defs, which are indeed PHIs, and so we're sure the live
value is the one before any modification, thus fine to feed as the
initial value for the PHI in the epilog.

Maybe we can assert the def type here?

> > 
> > Why again do we need (non-induction) live values from the vector loop to the
> > epilogue loop again?
> 
> They can appear as the result value of the main exit.
> 
> e.g. in testcase (vect-early-break_17.c)
> 
> #define N 1024
> unsigned vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i] > x)
>      return vect_a[i];
>    vect_a[i] = x;
>    ret = vect_a[i] + vect_b[i];
>  }
>  return ret;
> }
> 
> The only situation they can appear in as an early break is when
> we have a case where the main exit != the latch-connected exit.
> 
> However in these cases they are unused, and only there because
> normally you would have exited (i.e. there was a return), but the
> vector loop needs to start over so we ignore them.
> 
> These happen in testcase vect-early-break_74.c and
> vect-early-break_78.c

Hmm, so in that case their value is incorrect (but doesn't matter,
we ignore it)?

> > 
> > If we are dealing with an induction (or a reduction, you can check the def
> > type), there should be an associated PHI node to get that vector.
> > 
> > That said, are you sure there's testsuite coverage for the induction case?
> 
> Well, we now require it for every IV-related variable between the two loops.
> So there's not a single testcase that doesn't use it.

Ah, good.

As said, if this is only for induction_defs then that clears up stuff,
probably worth a comment/assert.

> > 
> > > +		  }
> > > +
> > > +		gimple_stmt_iterator exit_gsi;
> > > +		tree new_tree
> > > +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > +						   exit_e, vectype, ncopies,
> > > +						   slp_node, bitsize,
> > > +						   tmp_bitstart, tmp_vec_lhs,
> > > +						   lhs_type, restart_loop,
> > > +						   &exit_gsi);
> > > +
> > > +		/* Use the empty block on the exit to materialize the new
> > > +		   stmts so we can then update the PHI here.  */
> > > +		if (gimple_phi_num_args (use_stmt) == 1)
> > > +		  {
> > > +		    auto gsi = gsi_for_stmt (use_stmt);
> > > +		    remove_phi_node (&gsi, false);
> > > +		    tree lhs_phi = gimple_phi_result (use_stmt);
> > > +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > +		  }
> > > +		else
> > > +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > 
> > if the else case works, why not use it always?
> 
> Because it doesn't work for the main exit.  The early exits have an intermediate
> block that is used to generate the statements in, so for them we are fine
> updating the use in place.
>
> The main exits don't, and so the existing trick the vectorizer uses is to
> materialize the statements in the same block and then dissolve the phi node.
> However you can't do that for the early exit because the phi node isn't singular.

But if the PHI has a single arg you can replace that?  By making a
copy stmt from it don't you break LC SSA?

> Now I know what your next question is: well, why don't we just use the same
> method for both?  When I create an extra intermediate block for this, reduction
> vectorization for VLA seems to go off the rails.  The code is still correct,
> just highly inefficient.
> 
> That is because without code to prevent it, peeling will create LCSSA PHI
> blocks with singular entries.  These should be harmless, but VLA reductions
> generate some unpacking and packing to deal with them.  I tried to figure out
> why but this is a large bit of code.  So for now I went with the simpler
> approach of replacing the use only for the early exit, where we never have
> the intermediate PHIs.

I don't quite understand, but I'll take your word for it.

Richard.

> 
> Thanks,
> Tamar
> 
> > 
> > 
> > The rest looks OK.
> > 
> > Richard.
> > 
> > > +	      }
> > > +	  }
> > >
> > >        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
> > >        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index fe38beb4fa1d9f8593445354f56ba52e10a040cd..27221c6e8e86034050b562ee5c15992827a8d2cb 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
> > >     - it has uses outside the loop.
> > >     - it has vdefs (it alters memory).
> > >     - control stmts in the loop (except for the exit condition).
> > > +   - it is an induction and we have multiple exits.
> > >
> > >     CHECKME: what other side effects would the vectorizer allow?  */
> > >
> > > @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >  	}
> > >      }
> > >
> > > +  /* Check if it's an induction and multiple exits.  In this case there
> > > +     will be a usage later on after peeling which is needed for the
> > > +     alternate exit.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_NOTE, vect_location,
> > > +			   "vec_stmt_relevant_p: induction forced for "
> > > +			   "early break.\n");
> > > +      *live_p = true;
> > > +
> > > +    }
> > > +
> > >    if (*live_p && *relevant == vect_unused_in_scope
> > >        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
> > >      {
> > > @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
> > >  /* If the target supports a permute mask that reverses the elements in
> > >     a vector of type VECTYPE, return that mask, otherwise return null.  */
> > >
> > > -static tree
> > > +tree
> > >  perm_mask_for_reverse (tree vectype)
> > >  {
> > >    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> > > @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
> > >  			  bool vec_stmt_p,
> > >  			  stmt_vector_for_cost *cost_vec)
> > >  {
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > >    if (slp_node)
> > >      {
> > >        stmt_vec_info slp_stmt_info;
> > >        unsigned int i;
> > >        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
> > >  	{
> > > -	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
> > > +	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> > > +	       || (loop_vinfo
> > > +		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> > > +			== vect_induction_def))
> > >  	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
> > >  					       slp_node_instance, i,
> > >  					       vec_stmt_p, cost_vec))
> > >  	    return false;
> > >  	}
> > >      }
> > > -  else if (STMT_VINFO_LIVE_P (stmt_info)
> > > +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> > > +	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
> > >  	   && !vectorizable_live_operation (vinfo, stmt_info,
> > >  					    slp_node, slp_node_instance, -1,
> > >  					    vec_stmt_p, cost_vec))
> > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > index de60da31e2a3030a7fbc302d3f676af9683fd019..fd4b0a787e6128b43c5ca2b0612f55845e6b3cef 100644
> > > --- a/gcc/tree-vectorizer.h
> > > +++ b/gcc/tree-vectorizer.h
> > > @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
> > >  				enum vect_def_type *,
> > >  				tree *, stmt_vec_info * = NULL);
> > >  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> > > +extern tree perm_mask_for_reverse (tree);
> > >  extern bool supportable_widening_operation (vec_info*, code_helper,
> > >  					    stmt_vec_info, tree, tree,
> > >  					    code_helper*, code_helper*,
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-11-27 22:48   ` Tamar Christina
@ 2023-12-06  4:00     ` Tamar Christina
  0 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  4:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

ping

> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, November 27, 2023 10:48 PM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: RE: [PATCH 13/21]middle-end: Update loop form analysis to support early
> break
> 
> Ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christina@arm.com>
> > Sent: Monday, November 6, 2023 7:41 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> > Subject: [PATCH 13/21]middle-end: Update loop form analysis to support
> > early break
> >
> > Hi All,
> >
> > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the
> > other patches are self contained.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > 	(vect_transform_loop): Use it.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
> >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > }
> >
> > -
> >  /* Function vect_analyze_loop_form.
> >
> >     Verify that certain CFG restrictions hold, including:
> >     - the loop has a pre-header
> > -   - the loop has a single entry and exit
> > +   - the loop has a single entry
> > +   - nested loops can have only a single exit.
> >     - the loop exit condition is simple enough
> >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> >       niter could be analyzed under some assumptions.  */
> > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
> >  				   "not vectorized: latch block not empty.\n");
> >
> >    /* Make sure the exit is not abnormal.  */
> > -  if (exit_e->flags & EDGE_ABNORMAL)
> > -    return opt_result::failure_at (vect_location,
> > -				   "not vectorized:"
> > -				   " abnormal loop exit edge.\n");
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  for (edge e : exits)
> > +    {
> > +      if (e->flags & EDGE_ABNORMAL)
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized:"
> > +				       " abnormal loop exit edge.\n");
> > +    }
> >
> >    info->conds
> >      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
> >
> >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> > +  /* Check to see if we're vectorizing multiple exits.  */
> > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +
> >    if (info->inner_loop_cond)
> >      {
> >        stmt_vec_info inner_loop_cond_info
> > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
> >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> >       versioning.   */
> >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -  if (! single_pred_p (e->dest))
> > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        split_loop_exit_edge (e, true);
> >        if (dump_enabled_p ())
> >
> >
> >
> >
> > --


* RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow
  2023-11-29 14:47     ` Richard Biener
@ 2023-12-06  4:10       ` Tamar Christina
  2023-12-06  9:44         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  4:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 11391 bytes --]

> > > +	  && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> > > +	*relevant = vect_used_in_scope;
> 
> but why not simply mark all gconds as vect_used_in_scope?
> 

We break outer-loop vectorization since doing so would pull the inner loop's
exit into scope for the outer loop.  Also we can't force the loop's main IV exit
to be in scope, since it will be replaced by the vectorizer.

I've updated the code to remove the quadratic lookup.

> > > +    }
> > >
> > >    /* changing memory.  */
> > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI)
> > > @@ -374,6 +379,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >  	*relevant = vect_used_in_scope;
> > >        }
> > >
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);  auto_bitmap
> > > + exit_bbs;  for (edge exit : exits)
> 
> is it your mail client messing patches up?  missing line-break
> again.
> 

Yeah, seems it was, hopefully fixed now.

> > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > +
> 
> you don't seem to use the bitmap?
> 
> > >    /* uses outside the loop.  */
> > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > SSA_OP_DEF)
> > >      {
> > > @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > >  	      /* We expect all such uses to be in the loop exit phis
> > >  		 (because of loop closed form)   */
> > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > >
> > >                *live_p = true;
> > >  	    }
> > > @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> > >  			return res;
> > >  		    }
> > >                   }
> > > +	    }
> > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > +	    {
> > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > +	      opt_result res
> > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > +			       loop_vinfo, relevant, &worklist, false);
> > > +	      if (!res)
> > > +		return res;
> > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > +				loop_vinfo, relevant, &worklist, false);
> > > +	      if (!res)
> > > +		return res;
> > >              }
> 
> I guess we're missing an
> 
>   else
>     gcc_unreachable ();
> 
> to catch not handled stmt kinds (do we have gcond patterns yet?)
> 
> > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > >  	    {
> > > @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  			     node_instance, cost_vec);
> > >        if (!res)
> > >  	return res;
> > > -   }
> > > +    }
> > > +
> > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> 
> I think it should rather be vect_condition_def.  It's also not
> this functions business to set STMT_VINFO_DEF_TYPE.  If we ever
> get to handle not if-converted code (or BB vectorization of that)
> then a gcond would define the mask stmts are under.
> 

Hmm sure, I've had to place it in multiple other places but moved it
away from here.  The main ones are set during dataflow analysis when
we determine which statements need to be moved.

> > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > >      {
> > >        case vect_internal_def:
> > > +      case vect_early_exit_def:
> > >          break;
> > >
> > >        case vect_reduction_def:
> > > @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > >      {
> > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > >        *need_to_vectorize = true;
> > >      }
> > > @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
> > >  	  else
> > >  	    *op = gimple_op (ass, operand + 1);
> > >  	}
> > > +      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> > > +	{
> > > +	  gimple_match_op m_op;
> > > +	  if (!gimple_extract_op (cond, &m_op))
> > > +	    return false;
> > > +	  gcc_assert (m_op.code.is_tree_code ());
> > > +	  *op = m_op.ops[operand];
> > > +	}
> 
> Please do not use gimple_extract_op, use
> 
>   *op = gimple_op (cond, operand);
> 
> > >        else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
> > >  	*op = gimple_call_arg (call, operand);
> > >        else
> > > @@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > >    *nunits_vectype_out = NULL_TREE;
> > >
> > >    if (gimple_get_lhs (stmt) == NULL_TREE
> > > +      /* Allow vector conditionals through here.  */
> > > +      && !is_ctrl_stmt (stmt)
> 
> !is_a <gcond *> (stmt)
> 
> > >        /* MASK_STORE has no lhs, but is ok.  */
> > >        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > >      {
> > > @@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > >  	}
> > >
> > >        return opt_result::failure_at (stmt,
> > > -				     "not vectorized: irregular stmt.%G", stmt);
> > > +				     "not vectorized: irregular stmt: %G", stmt);
> > >      }
> > >
> > >    tree vectype;
> > > @@ -14490,6 +14528,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > +      else if (is_ctrl_stmt (stmt))
> 
> else if (gcond *cond = dyn_cast <...>)
> 
> > > +	{
> > > +	  gcond *cond = dyn_cast <gcond *> (stmt);
> > > +	  if (!cond)
> > > +	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > +					   " control flow statement.\n");
> > > +	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
> 
> As said in the other patch STMT_VINFO_VECTYPE of the gcond should
> be the _mask_ type the compare produces, not the vector type of
> the inputs (the nunits_vectype might be that one though).
> You possibly need to adjust vect_get_smallest_scalar_type for this.
> 

Fixed, but is in other patch now.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_mark_pattern_stmts): Support gcond
	patterns.
	* tree-vect-stmts.cc (vect_stmt_relevant_p,
	vect_mark_stmts_to_be_vectorized, vect_analyze_stmt, vect_is_simple_use,
	vect_get_vector_types_for_stmt): Support early breaks.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd..ea59ad337f14d802607850e8a7cf0125777ce2bc 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6987,6 +6987,10 @@ vect_mark_pattern_stmts (vec_info *vinfo,
     vect_set_pattern_stmt (vinfo,
 			   pattern_stmt, orig_stmt_info, pattern_vectype);
 
+  /* For any conditionals mark them as vect_condition_def.  */
+  if (is_a <gcond *> (pattern_stmt))
+    STMT_VINFO_DEF_TYPE (STMT_VINFO_RELATED_STMT (orig_stmt_info)) = vect_condition_def;
+
   /* Transfer reduction path info to the pattern.  */
   if (STMT_VINFO_REDUC_IDX (orig_stmt_info_saved) != -1)
     {
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d801b72a149ebe6aa4d1f2942324b042d07be530..1e2698fcb7e95ae7f0009d10a79ba8c891a8227d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -361,7 +361,9 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 
   /* cond stmt other than loop exit cond.  */
   gimple *stmt = STMT_VINFO_STMT (stmt_info);
-  if (dyn_cast <gcond *> (stmt))
+  if (is_a <gcond *> (stmt)
+      && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
+      && (!loop->inner || gimple_bb (stmt)->loop_father == loop))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -393,7 +395,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	      /* We expect all such uses to be in the loop exit phis
 		 (because of loop closed form)   */
 	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
-	      gcc_assert (bb == single_exit (loop)->dest);
 
               *live_p = true;
 	    }
@@ -807,6 +808,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 			return res;
 		    }
                  }
+	    }
+	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+	    {
+	      enum tree_code rhs_code = gimple_cond_code (cond);
+	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+	      opt_result res
+		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
+			       loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
+	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+				loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
             }
 	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
 	    {
@@ -820,6 +835,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 		    return res;
 		}
 	    }
+	  else
+	    gcc_unreachable ();
         }
       else
 	FOR_EACH_PHI_OR_STMT_USE (use_p, stmt_vinfo->stmt, iter, SSA_OP_USE)
@@ -13044,11 +13061,12 @@ vect_analyze_stmt (vec_info *vinfo,
 			     node_instance, cost_vec);
       if (!res)
 	return res;
-   }
+    }
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
     {
       case vect_internal_def:
+      case vect_condition_def:
         break;
 
       case vect_reduction_def:
@@ -13081,6 +13099,7 @@ vect_analyze_stmt (vec_info *vinfo,
     {
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
       *need_to_vectorize = true;
     }
@@ -13855,6 +13874,8 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
 	  else
 	    *op = gimple_op (ass, operand + 1);
 	}
+      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+	*op = gimple_op (cond, operand);
       else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
 	*op = gimple_call_arg (call, operand);
       else
@@ -14465,6 +14486,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+      /* Allow vector conditionals through here.  */
+      && !is_a <gcond *> (stmt)
       /* MASK_STORE has no lhs, but is ok.  */
       && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
     {
@@ -14481,7 +14504,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return opt_result::failure_at (stmt,
-				     "not vectorized: irregular stmt.%G", stmt);
+				     "not vectorized: irregular stmt: %G", stmt);
     }
 
   tree vectype;

[-- Attachment #2: rb17970.patch --]
[-- Type: application/octet-stream, Size: 4366 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd..ea59ad337f14d802607850e8a7cf0125777ce2bc 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6987,6 +6987,10 @@ vect_mark_pattern_stmts (vec_info *vinfo,
     vect_set_pattern_stmt (vinfo,
 			   pattern_stmt, orig_stmt_info, pattern_vectype);
 
+  /* For any conditionals mark them as vect_condition_def.  */
+  if (is_a <gcond *> (pattern_stmt))
+    STMT_VINFO_DEF_TYPE (STMT_VINFO_RELATED_STMT (orig_stmt_info)) = vect_condition_def;
+
   /* Transfer reduction path info to the pattern.  */
   if (STMT_VINFO_REDUC_IDX (orig_stmt_info_saved) != -1)
     {
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d801b72a149ebe6aa4d1f2942324b042d07be530..1e2698fcb7e95ae7f0009d10a79ba8c891a8227d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -361,7 +361,9 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 
   /* cond stmt other than loop exit cond.  */
   gimple *stmt = STMT_VINFO_STMT (stmt_info);
-  if (dyn_cast <gcond *> (stmt))
+  if (is_a <gcond *> (stmt)
+      && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
+      && (!loop->inner || gimple_bb (stmt)->loop_father == loop))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -393,7 +395,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	      /* We expect all such uses to be in the loop exit phis
 		 (because of loop closed form)   */
 	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
-	      gcc_assert (bb == single_exit (loop)->dest);
 
               *live_p = true;
 	    }
@@ -807,6 +808,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 			return res;
 		    }
                  }
+	    }
+	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
+	    {
+	      enum tree_code rhs_code = gimple_cond_code (cond);
+	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
+	      opt_result res
+		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
+			       loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
+	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
+				loop_vinfo, relevant, &worklist, false);
+	      if (!res)
+		return res;
             }
 	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
 	    {
@@ -820,6 +835,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
 		    return res;
 		}
 	    }
+	  else
+	    gcc_unreachable ();
         }
       else
 	FOR_EACH_PHI_OR_STMT_USE (use_p, stmt_vinfo->stmt, iter, SSA_OP_USE)
@@ -13044,11 +13061,12 @@ vect_analyze_stmt (vec_info *vinfo,
 			     node_instance, cost_vec);
       if (!res)
 	return res;
-   }
+    }
 
   switch (STMT_VINFO_DEF_TYPE (stmt_info))
     {
       case vect_internal_def:
+      case vect_condition_def:
         break;
 
       case vect_reduction_def:
@@ -13081,6 +13099,7 @@ vect_analyze_stmt (vec_info *vinfo,
     {
       gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
       gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
+		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
 		  || (call && gimple_call_lhs (call) == NULL_TREE));
       *need_to_vectorize = true;
     }
@@ -13855,6 +13874,8 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
 	  else
 	    *op = gimple_op (ass, operand + 1);
 	}
+      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
+	*op = gimple_op (cond, operand);
       else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
 	*op = gimple_call_arg (call, operand);
       else
@@ -14465,6 +14486,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   *nunits_vectype_out = NULL_TREE;
 
   if (gimple_get_lhs (stmt) == NULL_TREE
+      /* Allow vector conditionals through here.  */
+      && !is_a <gcond *> (stmt)
       /* MASK_STORE has no lhs, but is ok.  */
       && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
     {
@@ -14481,7 +14504,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 
       return opt_result::failure_at (stmt,
-				     "not vectorized: irregular stmt.%G", stmt);
+				     "not vectorized: irregular stmt: %G", stmt);
     }
 
   tree vectype;


* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-11-30 13:23                       ` Richard Biener
@ 2023-12-06  4:21                         ` Tamar Christina
  2023-12-06  9:33                           ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  4:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 25336 bytes --]

> > > is the exit edge you are looking for without iterating over all loop exits.
> > >
> > > > +		gimple *tmp_vec_stmt = vec_stmt;
> > > > +		tree tmp_vec_lhs = vec_lhs;
> > > > +		tree tmp_bitstart = bitstart;
> > > > +		/* For early exit where the exit is not in the BB that leads
> > > > +		   to the latch then we're restarting the iteration in the
> > > > +		   scalar loop.  So get the first live value.  */
> > > > +		restart_loop = restart_loop || exit_e != main_e;
> > > > +		if (restart_loop)
> > > > +		  {
> > > > +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > >
> > > Hmm, that gets you the value after the first iteration, not the one before which
> > > would be the last value of the preceeding vector iteration?
> > > (but we don't keep those, we'd need a PHI)
> >
> > I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> > that lhs should be pointing to a PHI.
> >
> > When I inspect the statement I see
> >
> > i_14 = PHI <i_11(6), 0(14)>
> >
> > so i_14 is the value at the start of the current iteration.  If we're coming from
> > the header it's 0, otherwise it's i_11, which is the value of the previous iteration?
> >
> > The peeling code explicitly leaves i_14 in the merge block and not i_11 for
> > this exact reason.
> > So I'm confused, my understanding is that we're already *at* the right PHI.
> >
> > Is it perhaps that you thought we put i_11 here for the early exits? In which case
> > yes, I'd agree that would be wrong, and there we would have had to look at
> > the defs, but i_11 is the def.
> >
> > I already kept this in mind and leveraged peeling to make this part easier.
> > i_11 is used in the main exit and i_14 in the early one.
> 
> I think the important detail is that this code is only executed for
> vect_induction_defs which are indeed PHIs and so we're sure the
> value live is before any modification so fine to feed as initial
> value for the PHI in the epilog.
> 
> Maybe we can assert the def type here?

We can't assert here because until CFG cleanup the dead value is still seen and
still vectorized.  That said, I've added a guard here: we vectorize the
non-induction value as normal now, and if it's ever used it'll fail.

> 
> > >
> > > Why again do we need (non-induction) live values from the vector loop to the
> > > epilogue loop again?
> >
> > They can appear as the result value of the main exit.
> >
> > e.g. in testcase (vect-early-break_17.c)
> >
> > #define N 1024
> > unsigned vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    vect_b[i] = x + i;
> >    if (vect_a[i] > x)
> >      return vect_a[i];
> >    vect_a[i] = x;
> >    ret = vect_a[i] + vect_b[i];
> >  }
> >  return ret;
> > }
> >
> > The only situation they can appear in as an early break is when
> > we have a case where the main exit != the latch-connected exit.
> >
> > However in these cases they are unused, and only there because
> > normally you would have exited (i.e. there was a return) but the
> > vector loop needs to start over so we ignore it.
> >
> > These happen in testcase vect-early-break_74.c and
> > vect-early-break_78.c
> 
> Hmm, so in that case their value is incorrect (but doesn't matter,
> we ignore it)?
> 

Correct, they're placed there due to exit redirection, but in these inverted
testcases where we've peeled the vector iteration, you can't ever skip the
epilogue.  So they are guaranteed not to be used.
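
To make that concrete, here is a minimal C sketch of such an inverted loop
(hypothetical, modeled on the vect-early-break_17.c testcase quoted above; not
the actual contents of vect-early-break_74.c or _78.c).  The exit test comes
before the stores, so once the vector code peels an iteration the scalar
epilogue always runs:

```c
#include <assert.h>

#define N 1024
unsigned vect_a[N];
unsigned vect_b[N];

/* Inverted control flow: the early break is tested before any store, so
   the block containing the stores (which leads to the latch) is not the
   block containing the exit test.  After peeling one iteration for such
   a loop, control always falls through to the scalar epilogue, so the
   extra live values on the exit edge are guaranteed unused.  */
unsigned
test_inverted (unsigned x)
{
  unsigned ret = 0;
  for (int i = 0; i < N; i++)
    {
      if (vect_a[i] > x)   /* exit test first ...  */
	break;
      vect_a[i] = x;       /* ... stores and the latch afterwards.  */
      ret = vect_a[i] + vect_b[i];
    }
  return ret;
}
```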

> > > > +		gimple_stmt_iterator exit_gsi;
> > > > +		tree new_tree
> > > > +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > +						   exit_e, vectype, ncopies,
> > > > +						   slp_node, bitsize,
> > > > +						   tmp_bitstart, tmp_vec_lhs,
> > > > +						   lhs_type, restart_loop,
> > > > +						   &exit_gsi);
> > > > +
> > > > +		/* Use the empty block on the exit to materialize the new stmts
> > > > +		   so we can update the PHI here.  */
> > > > +		if (gimple_phi_num_args (use_stmt) == 1)
> > > > +		  {
> > > > +		    auto gsi = gsi_for_stmt (use_stmt);
> > > > +		    remove_phi_node (&gsi, false);
> > > > +		    tree lhs_phi = gimple_phi_result (use_stmt);
> > > > +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > > +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > > +		  }
> > > > +		else
> > > > +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > >
> > > if the else case works, why not use it always?
> >
> > Because it doesn't work for the main exit.  The early exits have an intermediate
> > block that is used to generate the statements in, so for them we are fine updating
> > the use in place.
> >
> > The main exit doesn't, and so the existing trick the vectorizer uses is to materialize
> > the statements in the same block and then dissolve the phi node.  However you
> > can't do that for the early exits because the phi node isn't singular.
> 
> But if the PHI has a single arg you can replace that?  By making a
> copy stmt from it don't you break LC SSA?
> 

Yeah, what the existing code is sneakily doing is this:

It has to vectorize

x = PHI <y>

y gets vectorized as z, but

x = PHI <z>
z = ...

would be invalid.  So, since it doesn't have a predecessor node to place stuff in,
what it does is

z = ...
x = z

and removes the PHI.  The PHI was only placed there for vectorization so it's not
needed after this point.  It's also for this reason that the code passes around a
gimple_seq, since it needs to make sure it gets the order right when inserting
statements.
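
As a toy illustration (plain C for exposition, not GCC internals) of that
dissolve-the-PHI reordering: the single-argument PHI cannot stay above the
definition of its argument, so the definition is emitted first and the PHI
becomes a trailing copy.

```c
#include <assert.h>

/* Model of the trick above: in the exit block "x = PHI <z>" followed by
   "z = ..." is invalid, and there is no predecessor block to host z's
   definition, so z is materialized first and the PHI is dissolved into
   a plain copy "x = z" at the end.  */
enum stmt_kind { PHI_X, DEF_Z, COPY_X };

/* Rewrite the invalid order { PHI_X, DEF_Z } into { DEF_Z, COPY_X }.  */
static void
dissolve_phi (enum stmt_kind stmts[2])
{
  if (stmts[0] == PHI_X && stmts[1] == DEF_Z)
    {
      stmts[0] = DEF_Z;		/* materialize z first ...  */
      stmts[1] = COPY_X;	/* ... then x = z replaces the PHI.  */
    }
}
```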

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation,
	vectorizable_live_operation_1): Support early exits.
	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
	inductions or reductions.
	(find_connected_edge, vect_get_vect_def): New.
	(vect_create_epilog_for_reduction): Support reductions in early break.
	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
	(vect_stmt_relevant_p): Mark all inductions as being live when there
	is an early break.
	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
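
One detail of the patch below worth a sketch: on the fully-masked path, the
inverted case needs EXTRACT_FIRST, which is emulated by reversing both the mask
and the vector and then using EXTRACT_LAST.  A scalar C model of why that works
(hypothetical helper names, fixed VL; assumes at least one active lane):

```c
#include <assert.h>
#include <stdbool.h>

#define VL 8

/* Scalar model of CFN_EXTRACT_LAST: the element of V at the highest
   index whose mask bit is set.  */
static unsigned
extract_last (const unsigned *v, const bool *mask)
{
  unsigned res = 0;
  for (int i = 0; i < VL; i++)
    if (mask[i])
      res = v[i];
  return res;
}

/* EXTRACT_FIRST emulated as in the patch: reverse both the mask and
   the vector, then take EXTRACT_LAST.  The last active lane of the
   reversed pair is the first active lane of the originals.  */
static unsigned
extract_first_via_reverse (const unsigned *v, const bool *mask)
{
  unsigned rv[VL];
  bool rm[VL];
  for (int i = 0; i < VL; i++)
    {
      rv[i] = v[VL - 1 - i];
      rm[i] = mask[VL - 1 - i];
    }
  return extract_last (rv, rm);
}
```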

--- inline copy of patch ---

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
 	  bb_before_epilog = loop_preheader_edge (epilog)->src;
 	}
+
       /* If loop is peeled for non-zero constant times, now niters refers to
 	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 	 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
   return new_temp;
 }
 
+/* Retrieves the defining statement to be used for a reduction.
+   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
+   the reduction definitions.  */
+
+tree
+vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
+		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
+		   vec <gimple *> &vec_stmts)
+{
+  tree def;
+
+  if (slp_node)
+    {
+      if (!main_exit_p)
+        slp_node = slp_node_instance->reduc_phis;
+      def = vect_get_slp_vect_def (slp_node, i);
+    }
+  else
+    {
+      if (!main_exit_p)
+	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
+      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
+      def = gimple_get_lhs (vec_stmts[0]);
+    }
+
+  return def;
+}
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5912,8 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
      loop-closed PHI of the inner loop which we remember as
      def for the reduction PHI generation.  */
   bool double_reduc = false;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
     {
@@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
-      if (slp_node)
-	def = vect_get_slp_vect_def (slp_node, i);
-      else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
+			       main_exit_p, i, vec_stmts);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
-
 /* Function vectorizable_live_operation_1.
+
    helper function for vectorizable_live_operation.  */
+
 tree
 vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
-			       stmt_vec_info stmt_info, edge exit_e,
+			       stmt_vec_info stmt_info, basic_block exit_bb,
 			       tree vectype, int ncopies, slp_tree slp_node,
 			       tree bitsize, tree bitstart, tree vec_lhs,
-			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
 {
-  basic_block exit_bb = exit_e->dest;
   gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   tree vec_lhs_phi = copy_ssa_name (vec_lhs);
@@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (ncopies == 1 && !slp_node);
@@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree len = vect_get_loop_len (loop_vinfo, &gsi,
 				    &LOOP_VINFO_LENS (loop_vinfo),
 				    1, vectype, 0, 0);
+
       /* BIAS - 1.  */
       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
       tree bias_minus_one
 	= int_const_binop (MINUS_EXPR,
 			   build_int_cst (TREE_TYPE (len), biasval),
 			   build_one_cst (TREE_TYPE (len)));
+
       /* LAST_INDEX = LEN + (BIAS - 1).  */
       tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 				     len, bias_minus_one);
+
       /* This needs to implement extraction of the first index, but not sure
 	 how the LEN stuff works.  At the moment we shouldn't get here since
 	 there's no LEN support for early breaks.  But guard this so there's
@@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree scalar_res
 	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
 			vec_lhs_phi, last_index);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
   else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (!slp_node);
@@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
 				      &LOOP_VINFO_MASKS (loop_vinfo),
 				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
 
       gimple_seq_add_seq (&stmts, tem);
-       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-				       mask, vec_lhs_phi);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
@@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
 				       &stmts, true, NULL_TREE);
     }
+
   *exit_gsi = gsi_after_labels (exit_bb);
   if (stmts)
     gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
   return new_tree;
 }
 
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  There may be at most one forwarder block between SRC and DEST.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+   if (src->dest == dest)
+     return src;
+
+  return find_edge (src->dest, dest);
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   gimple *use_stmt;
+  use_operand_p use_p;
   auto_vec<tree> vec_oprnds;
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
+      /* If early break we only have to materialize the reduction on the merge
+	 block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      {
+		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						  slp_node, slp_node_instance,
+						  exit);
+		break;
+	      }
+	}
+
       return true;
     }
 
@@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_stmt_iterator exit_gsi;
-      tree new_tree
-	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
-					 LOOP_VINFO_IV_EXIT (loop_vinfo),
-					 vectype, ncopies, slp_node, bitsize,
-					 bitstart, vec_lhs, lhs_type,
-					 &exit_gsi);
-
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
+      /* Check if we have a loop where the chosen exit is not the main exit.
+	 In these cases, for an early break, we restart the iteration that the
+	 vector code was executing.  For the live values we want the value at
+	 the start of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
 	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+	      edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
+					   phi_arg_index_from_use (use_p));
+	      bool main_exit_edge = e == main_e
+				    || find_connected_edge (main_e, e->src);
+
+	      /* Early exits have a merge block; we want the merge block itself,
+		 so use ->src.  For the main exit the merge block is the
+		 destination.  */
+	      basic_block dest = main_exit_edge ? main_e->dest : e->src;
+	      gimple *tmp_vec_stmt = vec_stmt;
+	      tree tmp_vec_lhs = vec_lhs;
+	      tree tmp_bitstart = bitstart;
+
+	      /* For an early exit where the exit is not in the BB that leads
+		 to the latch, we're restarting the iteration in the
+		 scalar loop.  So get the first live value.  */
+	      restart_loop = restart_loop || !main_exit_edge;
+	      if (restart_loop
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		  tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		  tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		}
+
+	      gimple_stmt_iterator exit_gsi;
+	      tree new_tree
+		= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						 dest, vectype, ncopies,
+						 slp_node, bitsize,
+						 tmp_bitstart, tmp_vec_lhs,
+						 lhs_type, restart_loop,
+						 &exit_gsi);
+
+	      if (gimple_phi_num_args (use_stmt) == 1)
+		{
+		  auto gsi = gsi_for_stmt (use_stmt);
+		  remove_phi_node (&gsi, false);
+		  tree lhs_phi = gimple_phi_result (use_stmt);
+		  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		}
+	      else
+		SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
+	  }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction in a loop with multiple exits.  In this case
+     there will be a usage later on after peeling, needed for the alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vec_stmt_relevant_p: induction forced for "
+			   "early break.\n");
+      *live_p = true;
+
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
@@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
 			  bool vec_stmt_p,
 			  stmt_vector_for_cost *cost_vec)
 {
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   if (slp_node)
     {
       stmt_vec_info slp_stmt_info;
       unsigned int i;
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
 	{
-	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
+	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
+	       || (loop_vinfo
+		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+			== vect_induction_def))
 	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
 					       slp_node_instance, i,
 					       vec_stmt_p, cost_vec))
 	    return false;
 	}
     }
-  else if (STMT_VINFO_LIVE_P (stmt_info)
+  else if ((STMT_VINFO_LIVE_P (stmt_info)
+	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
 	   && !vectorizable_live_operation (vinfo, stmt_info,
 					    slp_node, slp_node_instance, -1,
 					    vec_stmt_p, cost_vec))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,

[-- Attachment #2: rb17968.patch --]
[-- Type: application/octet-stream, Size: 18329 bytes --]

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
 	  bb_before_epilog = loop_preheader_edge (epilog)->src;
 	}
+
       /* If loop is peeled for non-zero constant times, now niters refers to
 	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
 	 overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
   return new_temp;
 }
 
+/* Retrieves the definining statement to be used for a reduction.
+   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
+   the reduction definitions.  */
+
+tree
+vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
+		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
+		   vec <gimple *> &vec_stmts)
+{
+  tree def;
+
+  if (slp_node)
+    {
+      if (!main_exit_p)
+        slp_node = slp_node_instance->reduc_phis;
+      def = vect_get_slp_vect_def (slp_node, i);
+    }
+  else
+    {
+      if (!main_exit_p)
+	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
+      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
+      def = gimple_get_lhs (vec_stmts[0]);
+    }
+
+  return def;
+}
+
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
@@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
    SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
    REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
      (counting from 0)
+   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
+     exit this edge is always the main loop exit.
 
    This function:
    1. Completes the reduction def-use cycles.
@@ -5882,7 +5912,8 @@ static void
 vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
 				  stmt_vec_info stmt_info,
 				  slp_tree slp_node,
-				  slp_instance slp_node_instance)
+				  slp_instance slp_node_instance,
+				  edge loop_exit)
 {
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
@@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
      loop-closed PHI of the inner loop which we remember as
      def for the reduction PHI generation.  */
   bool double_reduc = false;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
     {
@@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       /* Create an induction variable.  */
       gimple_stmt_iterator incr_gsi;
       bool insert_after;
-      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
       create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
 		 insert_after, &indx_before_incr, &indx_after_incr);
 
@@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
          Store them in NEW_PHIS.  */
   if (double_reduc)
     loop = outer_loop;
-  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+  /* We need to reduce values in all exits.  */
+  exit_bb = loop_exit->dest;
   exit_gsi = gsi_after_labels (exit_bb);
   reduc_inputs.create (slp_node ? vec_num : ncopies);
+  vec <gimple *> vec_stmts;
   for (unsigned i = 0; i < vec_num; i++)
     {
       gimple_seq stmts = NULL;
-      if (slp_node)
-	def = vect_get_slp_vect_def (slp_node, i);
-      else
-	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
+      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
+			       main_exit_p, i, vec_stmts);
       for (j = 0; j < ncopies; j++)
 	{
 	  tree new_def = copy_ssa_name (def);
 	  phi = create_phi_node (new_def, exit_bb);
 	  if (j)
-	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
-	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
+	    def = gimple_get_lhs (vec_stmts[j]);
+	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
 	  new_def = gimple_convert (&stmts, vectype, new_def);
 	  reduc_inputs.quick_push (new_def);
 	}
@@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return true;
 }
 
-
 /* Function vectorizable_live_operation_1.
+
    helper function for vectorizable_live_operation.  */
+
 tree
 vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
-			       stmt_vec_info stmt_info, edge exit_e,
+			       stmt_vec_info stmt_info, basic_block exit_bb,
 			       tree vectype, int ncopies, slp_tree slp_node,
 			       tree bitsize, tree bitstart, tree vec_lhs,
-			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
+			       tree lhs_type, bool restart_loop,
+			       gimple_stmt_iterator *exit_gsi)
 {
-  basic_block exit_bb = exit_e->dest;
   gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   tree vec_lhs_phi = copy_ssa_name (vec_lhs);
@@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
   if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (ncopies == 1 && !slp_node);
@@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree len = vect_get_loop_len (loop_vinfo, &gsi,
 				    &LOOP_VINFO_LENS (loop_vinfo),
 				    1, vectype, 0, 0);
+
       /* BIAS - 1.  */
       signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
       tree bias_minus_one
 	= int_const_binop (MINUS_EXPR,
 			   build_int_cst (TREE_TYPE (len), biasval),
 			   build_one_cst (TREE_TYPE (len)));
+
       /* LAST_INDEX = LEN + (BIAS - 1).  */
       tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
 				     len, bias_minus_one);
+
       /* This needs to implement extraction of the first index, but not sure
 	 how the LEN stuff works.  At the moment we shouldn't get here since
 	 there's no LEN support for early breaks.  But guard this so there's
@@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree scalar_res
 	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
 			vec_lhs_phi, last_index);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
   else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
     {
       /* Emit:
+
 	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
+
 	 where VEC_LHS is the vectorized live-out result and MASK is
 	 the loop mask for the final iteration.  */
       gcc_assert (!slp_node);
@@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
 				      &LOOP_VINFO_MASKS (loop_vinfo),
 				      1, vectype, 0);
+      tree scalar_res;
+
+      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
+	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  /* First create the permuted mask.  */
+	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
+	  tree perm_dest = copy_ssa_name (mask);
+	  gimple *perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
+				       mask, perm_mask);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  mask = perm_dest;
+
+	  /* Then permute the vector contents.  */
+	  tree perm_elem = perm_mask_for_reverse (vectype);
+	  perm_dest = copy_ssa_name (vec_lhs_phi);
+	  perm_stmt
+		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
+				       vec_lhs_phi, perm_elem);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
+				       &gsi);
+	  vec_lhs_phi = perm_dest;
+	}
 
       gimple_seq_add_seq (&stmts, tem);
-       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-				       mask, vec_lhs_phi);
+
+      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+				 mask, vec_lhs_phi);
+
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
     }
@@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
       new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
 				       &stmts, true, NULL_TREE);
     }
+
   *exit_gsi = gsi_after_labels (exit_bb);
   if (stmts)
     gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
+
   return new_tree;
 }
 
+/* Find the edge that's the final one in the path from SRC to DEST and
+   return it.  There may be at most one forwarder block between SRC and
+   DEST.  */
+
+static edge
+find_connected_edge (edge src, basic_block dest)
+{
+  if (src->dest == dest)
+    return src;
+
+  return find_edge (src->dest, dest);
+}
+
 /* Function vectorizable_live_operation.
 
    STMT_INFO computes a value that is used outside the loop.  Check if
@@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   gimple *use_stmt;
+  use_operand_p use_p;
   auto_vec<tree> vec_oprnds;
   int vec_entry = 0;
   poly_uint64 vec_index = 0;
 
-  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
+  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
+	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
 
   /* If a stmt of a reduction is live, vectorize it via
      vect_create_epilog_for_reduction.  vectorizable_reduction assessed
@@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
       if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
 	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
 	return true;
+
       vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
-					slp_node_instance);
+					slp_node_instance,
+					LOOP_VINFO_IV_EXIT (loop_vinfo));
+
+      /* For an early break we only have to materialize the reduction on the
+	 merge block, but we have to find an alternate exit first.  */
+      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+	{
+	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
+	      {
+		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
+						  slp_node, slp_node_instance,
+						  exit);
+		break;
+	      }
+	}
+
       return true;
     }
 
@@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 	   lhs' = new_tree;  */
 
       class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
-      gcc_assert (single_pred_p (exit_bb));
-
-      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
-      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
-      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
-
-      gimple_stmt_iterator exit_gsi;
-      tree new_tree
-	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
-					 LOOP_VINFO_IV_EXIT (loop_vinfo),
-					 vectype, ncopies, slp_node, bitsize,
-					 bitstart, vec_lhs, lhs_type,
-					 &exit_gsi);
-
-      /* Remove existing phis that copy from lhs and create copies
-	 from new_tree.  */
-      gimple_stmt_iterator gsi;
-      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
-	{
-	  gimple *phi = gsi_stmt (gsi);
-	  if ((gimple_phi_arg_def (phi, 0) == lhs))
+      /* Check if we have a loop where the chosen exit is not the main exit;
+	 in these cases for an early break we restart the iteration that the
+	 vector code performed.  For the live values we then want the value at
+	 the start of the iteration rather than at the end.  */
+      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
+      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
+	if (!is_gimple_debug (use_stmt)
+	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
 	    {
-	      remove_phi_node (&gsi, false);
-	      tree lhs_phi = gimple_phi_result (phi);
-	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
-	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
-	    }
-	  else
-	    gsi_next (&gsi);
-	}
+	      edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
+					   phi_arg_index_from_use (use_p));
+	      bool main_exit_edge = e == main_e
+				    || find_connected_edge (main_e, e->src);
+
+	      /* Early exits have a merge block; we want the merge block itself,
+		 so use ->src.  For the main exit the merge block is the
+		 destination.  */
+	      basic_block dest = main_exit_edge ? main_e->dest : e->src;
+	      gimple *tmp_vec_stmt = vec_stmt;
+	      tree tmp_vec_lhs = vec_lhs;
+	      tree tmp_bitstart = bitstart;
+
+	      /* For an early exit where the exit is not in the BB that leads
+		 to the latch, we're restarting the iteration in the
+		 scalar loop.  So get the first live value.  */
+	      restart_loop = restart_loop || !main_exit_edge;
+	      if (restart_loop
+		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+		{
+		  tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
+		  tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
+		  tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
+		}
+
+	      gimple_stmt_iterator exit_gsi;
+	      tree new_tree
+		= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
+						 dest, vectype, ncopies,
+						 slp_node, bitsize,
+						 tmp_bitstart, tmp_vec_lhs,
+						 lhs_type, restart_loop,
+						 &exit_gsi);
+
+	      if (gimple_phi_num_args (use_stmt) == 1)
+		{
+		  auto gsi = gsi_for_stmt (use_stmt);
+		  remove_phi_node (&gsi, false);
+		  tree lhs_phi = gimple_phi_result (use_stmt);
+		  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+		  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
+		}
+	      else
+		SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
+	    }
 
       /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
       FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
    - it has uses outside the loop.
    - it has vdefs (it alters memory).
    - control stmts in the loop (except for the exit condition).
+   - it is an induction and we have multiple exits.
 
    CHECKME: what other side effects would the vectorizer allow?  */
 
@@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
 	}
     }
 
+  /* Check if it's an induction and we have multiple exits.  In this case
+     there will be a use later on after peeling which is needed for the
+     alternate exit.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "vec_stmt_relevant_p: induction forced for "
+			   "early break.\n");
+      *live_p = true;
+    }
+
   if (*live_p && *relevant == vect_unused_in_scope
       && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
     {
@@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
 /* If the target supports a permute mask that reverses the elements in
    a vector of type VECTYPE, return that mask, otherwise return null.  */
 
-static tree
+tree
 perm_mask_for_reverse (tree vectype)
 {
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
@@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
 			  bool vec_stmt_p,
 			  stmt_vector_for_cost *cost_vec)
 {
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   if (slp_node)
     {
       stmt_vec_info slp_stmt_info;
       unsigned int i;
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
 	{
-	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
+	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
+	       || (loop_vinfo
+		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
+			== vect_induction_def))
 	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
 					       slp_node_instance, i,
 					       vec_stmt_p, cost_vec))
 	    return false;
 	}
     }
-  else if (STMT_VINFO_LIVE_P (stmt_info)
+  else if ((STMT_VINFO_LIVE_P (stmt_info)
+	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
 	   && !vectorizable_live_operation (vinfo, stmt_info,
 					    slp_node, slp_node_instance, -1,
 					    vec_stmt_p, cost_vec))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
 				enum vect_def_type *,
 				tree *, stmt_vec_info * = NULL);
 extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
+extern tree perm_mask_for_reverse (tree);
 extern bool supportable_widening_operation (vec_info*, code_helper,
 					    stmt_vec_info, tree, tree,
 					    code_helper*, code_helper*,


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-11-29 13:50     ` Richard Biener
@ 2023-12-06  4:37       ` Tamar Christina
  2023-12-06  9:37         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  4:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw


> > > +
> > > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > > + TYPE_MODE (truth_type);  int ncopies;
> > > +
> 
> more line break issues ... (also below, check yourself)
> 
> shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> it looks to be set wrongly (or shouldn't be set at all)
> 

Fixed, I now leverage the existing vect_recog_bool_pattern to update the types
if needed and determine the initial type in vect_get_vector_types_for_stmt.

> > > +  if (slp_node)
> > > +    ncopies = 1;
> > > +  else
> > > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > +
> > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> 
> what about with_len?

It should be easy to add, but I don't know how it works.

> 
> > > +  /* Analyze only.  */
> > > +  if (!vec_stmt)
> > > +    {
> > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target doesn't support flag setting vector "
> > > +			       "comparisons.\n");
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> 
> Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> emitting
> 
>  mask = op0 CMP op1;
>  if (mask != 0)
> 
> I think you need to check for CMP, not NE_EXPR.

Well, CMP is checked by vectorizable_comparison_1, but I realized this
check was not checking what I wanted, and the cbranch requirements
already cover it.  So I removed it.
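For reference, the shape being generated here is the one Richard quoted: a full vector compare whose mask feeds a single flag-setting branch (the cbranch optab).  A minimal C++ emulation of that shape — the 4-lane width and helper name are illustrative, not GCC internals:

```cpp
#include <array>
#include <cassert>

constexpr int LANES = 4;

// Emulate "mask = op0 CMP op1; if (mask != 0) ..." for one vector:
// compute a per-lane mask, then test whether any lane is set, which is
// what a target's cbranch optab does in a single flag-setting compare.
static bool any_lane_matches (const std::array<int, LANES> &op0,
			      const std::array<int, LANES> &op1)
{
  std::array<bool, LANES> mask{};
  for (int i = 0; i < LANES; i++)
    mask[i] = (op0[i] == op1[i]);	/* mask = op0 CMP op1 */
  bool any = false;
  for (int i = 0; i < LANES; i++)
    any |= mask[i];			/* cbranch: mask != 0 */
  return any;
}
```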

> 
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector "
> > > +			       "comparisons for type %T.\n", truth_type);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (ncopies > 1
> > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector OR for "
> > > +			       "type %T.\n", truth_type);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > > +				      vec_stmt, slp_node, cost_vec))
> > > +	return false;
> 
> I suppose vectorizable_comparison_1 will check this again, so the above
> is redundant?
> 

The IOR?  No, vectorizable_comparison_1 doesn't reduce, so it may not check it
depending on the condition.

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +      workset.splice (stmts);
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > > "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  if (slp_node)
> > > +	    slp_node->push_vec_def (new_stmt);
> > > +	  else
> > > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > +	  workset.quick_insert (0, new_temp);
> 
> Reduction epilogue handling has similar code to reduce a set of vectors
> to a single one with an operation.  I think we want to share that code.
> 

I've taken a look but that code isn't suitable here since they have different
constraints.  I don't require an in-order reduction since for the comparison
all we care about is whether in a lane any bit is set or not.  This means:

1. we can reduce using a fast operation like IOR.
2. we can reduce in as much parallelism as possible.

The comparison is on the critical path for the loop now, unlike live reductions
which are always at the end, so using the live reduction code resulted in a
slowdown since it creates a longer dependency chain.
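The pairwise strategy described above can be sketched as follows.  This is a scalar stand-in, not the GCC code itself: `std::deque` plays the role of the `auto_vec` workset and `uint64_t` stands in for one mask vector, but the pop-two/push-front discipline mirrors the patch's loop:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Reduce N boolean mask vectors to one with IOR.  Popping two operands
// from the back and pushing the result to the front combines independent
// pairs first, keeping the dependency chain roughly log2(N) deep instead
// of the N-1 chain an in-order reduction would create.
static uint64_t reduce_ior (const std::vector<uint64_t> &stmts)
{
  assert (!stmts.empty ());
  std::deque<uint64_t> workset (stmts.begin (), stmts.end ());
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

For four inputs this produces a balanced tree: two independent ORs, then one combining OR, so the result is available after two dependent operations rather than three.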

> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  if (masked_loop_p)
> > > +    {
> > > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > > truth_type, 0);
> > > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			       &cond_gsi);
> 
> I don't think this is correct when 'stmts' had more than one vector?
> 

It is, because even for VLA, partial vectors are disabled since we only support
counted loops.  And it looks like --param vect-partial-vector-usage=1 cannot
force them on.

In principle I suppose I could mask the individual stmts; that should handle the
future case when this is relaxed to support non-fixed-length buffers?
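As I understand it, applying the loop mask to the exit condition amounts to ANDing the condition with the mask of active lanes, so a lane outside the active portion of a partial vector can never trigger the branch.  A sketch with scalar bools standing in for mask vectors (the lane count and helper name are illustrative):

```cpp
#include <array>
#include <cassert>

constexpr int LANES = 4;

// cond' = loop_mask & cond: inactive lanes are forced to false so they
// cannot cause a spurious early exit on the final partial iteration.
static std::array<bool, LANES>
apply_loop_mask (const std::array<bool, LANES> &cond,
		 const std::array<bool, LANES> &loop_mask)
{
  std::array<bool, LANES> out{};
  for (int i = 0; i < LANES; i++)
    out[i] = cond[i] && loop_mask[i];
  return out;
}
```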

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
	vect_recog_bool_pattern): Support gconds type analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If COND is nonnull then a gcond is being inspected instead of a
+   normal COND.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    gcond *cond)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !cond)
     return false;
+  else if (!def_stmt_info)
+    /* If we're a gcond we won't be codegen-ing the statements and are only
+       interested in whether the types match.  In that case we can accept
+       loop invariant values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   cond))
 	return false;
       break;
 
@@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !cond
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
@@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
   for (unsigned i = 0; i < bool_stmts.length (); ++i)
     adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else
+    {
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
+				 gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
+				      vectype, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

[-- Attachment #2: rb17969.patch --]
[-- Type: application/octet-stream, Size: 18893 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If COND is nonnull, a gcond is being inspected instead of a normal COND.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    gcond *cond)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !cond)
     return false;
+  else if (!def_stmt_info)
+    /* For a gcond we won't be codegen-ing the statements and only care
+       whether the types match.  In that case we can accept loop invariant
+       values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   cond))
 	return false;
       break;
 
@@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !cond
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
@@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
   for (unsigned i = 0; i < bool_stmts.length (); ++i)
     adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond *cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else
+    {
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
+				 gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+      workset.splice (stmts);
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    {
+      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
+				      vectype, 0);
+      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			       &cond_gsi);
+    }
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans have a
+	     single bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
  2023-11-27 22:48   ` Tamar Christina
@ 2023-12-06  8:18   ` Richard Biener
  2023-12-06  8:52     ` Tamar Christina
  1 sibling, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-06  8:18 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> patches are self contained.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> 	(vect_transform_loop): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991f07cd6052491d0 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
>    loop_vinfo->scalar_costs->finish_cost (nullptr);
>  }
>  
> -
>  /* Function vect_analyze_loop_form.
>  
>     Verify that certain CFG restrictions hold, including:
>     - the loop has a pre-header
> -   - the loop has a single entry and exit
> +   - the loop has a single entry
> +   - nested loops can have only a single exit.
>     - the loop exit condition is simple enough
>     - the number of iterations can be analyzed, i.e, a countable loop.  The
>       niter could be analyzed under some assumptions.  */
> @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
>  				   "not vectorized: latch block not empty.\n");
>  
>    /* Make sure the exit is not abnormal.  */
> -  if (exit_e->flags & EDGE_ABNORMAL)
> -    return opt_result::failure_at (vect_location,
> -				   "not vectorized:"
> -				   " abnormal loop exit edge.\n");
> +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> +  for (edge e : exits)

Seeing this multiple times, this isn't the most efficient way to
iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.

Note to myself: fix (add to) the API.

> +    {
> +      if (e->flags & EDGE_ABNORMAL)
> +	return opt_result::failure_at (vect_location,
> +				       "not vectorized:"
> +				       " abnormal loop exit edge.\n");
> +    }
>  
>    info->conds
>      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
>  
>    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
>  
> +  /* Check to see if we're vectorizing multiple exits.  */
> +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> +

Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
might be good, if we in future avoid if-conversion in a separate
pass we will have other CONDs as well.

>    if (info->inner_loop_cond)
>      {
>        stmt_vec_info inner_loop_cond_info
> @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>    /* Make sure there exists a single-predecessor exit bb.  Do this before 
>       versioning.   */
>    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> -  if (! single_pred_p (e->dest))
> +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        split_loop_exit_edge (e, true);

Note this splitting is done to fulfil versioning constraints on CFG
update.  Do you have test coverage with alias versioning and early
breaks?

Otherwise OK.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks
  2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
  2023-11-27 22:48   ` Tamar Christina
@ 2023-12-06  8:31   ` Richard Biener
  2023-12-06  9:10     ` Tamar Christina
  1 sibling, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-06  8:31 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This finishes wiring that didn't fit in any of the other patches.
> Essentially just adding related changes so peeling for early break works.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> 	vect_do_peeling): Support early breaks.
> 	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
> 	* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7605b7a3d3ecdd7 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>     loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>  				class loop *loop, tree niters, tree step,
>  				tree final_iv, bool niters_maybe_zero,
>  				gimple_stmt_iterator loop_cond_gsi)
> @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
>  
>    /* Record the number of latch iterations.  */
> -  if (limit == niters)
> +  if (limit == niters
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      /* Case A: the loop iterates NITERS times.  Subtract one to get the
>         latch count.  */
>      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>      bound_epilog += vf - 1;
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>      bound_epilog += 1;
> +
> +  /* For early breaks the scalar loop needs to execute at most VF times
> +     to find the element that caused the break.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      bound_epilog = vf;
> +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> +      vect_epilogues = false;

This is originally initialized with

  bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;

so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
rather than fixing up after the fact?  That is in vect_analyze_loop
adjust

  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
     enabled, SIMDUID is not set, it is the innermost loop and we have
     either already found the loop's SIMDLEN or there was no SIMDLEN to
     begin with.
     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
  bool vect_epilogues = (!simdlen
                         && loop->inner == NULL
                         && param_vect_epilogues_nomask
                         && LOOP_VINFO_PEELING_FOR_NITER 
(first_loop_vinfo)
                         && !loop->simduid);

and add !LOOP_VINFO_EARLY_BREAKS?
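Concretely, that would fold the early-break check into the existing initialization instead of clearing the flag after the fact, i.e. something like (an untested sketch against the quoted code; using first_loop_vinfo as the operand is an assumption):

```
  bool vect_epilogues = (!simdlen
			 && loop->inner == NULL
			 && param_vect_epilogues_nomask
			 && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
			 && !LOOP_VINFO_EARLY_BREAKS (first_loop_vinfo)
			 && !loop->simduid);
```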

> +    }
> +
>    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
>    poly_uint64 bound_scalar = bound_epilog;
>  
> @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  				  bound_prolog + bound_epilog)
>  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>  			 || vect_epilogues));
> +
> +  /* We only support early break vectorization on known bounds at this time.
> +     This means that if the vector loop can't be entered then we won't generate
> +     it at all.  So for now force skip_vector off because the additional control
> +     flow messes with the BB exits and we've already analyzed them.  */
> + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> +

  bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                      ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
                                  bound_prolog + bound_epilog)
                      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo) 
                         || vect_epilogues));

to me that looks like

  gcc_assert (!skip_vector || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo));

should work?  You are basically relying on cost modeling rejecting
vectorization that doesn't enter the vector loop.

>    /* Epilog loop must be executed if the number of iterations for epilog
>       loop is known at compile time, otherwise we need to add a check at
>       the end of vector loop and skip to the end of epilog loop.  */
>    bool skip_epilog = (prolog_peeling < 0
>  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>  		      || !vf.is_constant ());
> -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> +     loop must be executed.  */
> +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      skip_epilog = false;
>  
>    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
>      th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO
>  					  (loop_vinfo));
>  
> +  /* When we have multiple exits and VF is unknown, we must require partial
> +     vectors because the loop bound is not a minimum but a maximum.  That is to
> +     say we cannot unpredicate the main loop unless we peel or use partial
> +     vectors in the epilogue.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> +    return true;
> +

I don't quite understand the !.is_constant (); do early breaks always
require peeling?

>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
>      {
> @@ -3149,7 +3157,8 @@ start_over:
>  
>    /* If an epilogue loop is required make sure we can create one.  */
>    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> index d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c77837233c660e032a 100644
> --- a/gcc/tree-vectorizer.cc
> +++ b/gcc/tree-vectorizer.cc
> @@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
>  	 predicates that need to be shared for optimal predicate usage.
>  	 However reassoc will re-order them and prevent CSE from working
>  	 as it should.  CSE only the loop body, not the entry.  */
> -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> +      for (edge exit : exits)
> +	bitmap_set_bit (exit_bbs, exit->dest->index);
>  
>        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
>        do_rpo_vn (fun, entry, exit_bbs);

Otherwise OK.

Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-12-06  8:18   ` Richard Biener
@ 2023-12-06  8:52     ` Tamar Christina
  2023-12-06  9:15       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  8:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, December 6, 2023 8:18 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support early
> break
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> > patches are self contained.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > 	(vect_transform_loop): Use it.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> f07cd6052491d0 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> (loop_vec_info loop_vinfo)
> >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> >  }
> >
> > -
> >  /* Function vect_analyze_loop_form.
> >
> >     Verify that certain CFG restrictions hold, including:
> >     - the loop has a pre-header
> > -   - the loop has a single entry and exit
> > +   - the loop has a single entry
> > +   - nested loops can have only a single exit.
> >     - the loop exit condition is simple enough
> >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> >       niter could be analyzed under some assumptions.  */
> > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> vect_loop_form_info *info)
> >  				   "not vectorized: latch block not empty.\n");
> >
> >    /* Make sure the exit is not abnormal.  */
> > -  if (exit_e->flags & EDGE_ABNORMAL)
> > -    return opt_result::failure_at (vect_location,
> > -				   "not vectorized:"
> > -				   " abnormal loop exit edge.\n");
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  for (edge e : exits)
> 
> Seeing this multiple times, this isn't the most efficient way to
> iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> 
> Note to myself: fix (add to) the API.
> 
> > +    {
> > +      if (e->flags & EDGE_ABNORMAL)
> > +	return opt_result::failure_at (vect_location,
> > +				       "not vectorized:"
> > +				       " abnormal loop exit edge.\n");
> > +    }
> >
> >    info->conds
> >      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> vec_info_shared *shared,
> >
> >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> >
> > +  /* Check to see if we're vectorizing multiple exits.  */
> > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > +
> 
> Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> might be good, if we in future avoid if-conversion in a separate
> pass we will have other CONDs as well.
> 
> >    if (info->inner_loop_cond)
> >      {
> >        stmt_vec_info inner_loop_cond_info
> > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> gimple *loop_vectorized_call)
> >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> >       versioning.   */
> >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > -  if (! single_pred_p (e->dest))
> > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        split_loop_exit_edge (e, true);
> 
> Note this splitting is done to fulfil versioning constraints on CFG
> update.  Do you have test coverage with alias versioning and early
> breaks?

No, only non-alias versioning.  I don't believe aliasing can occur in the
current implementation because it's restricted to statically known objects
with a fixed size.
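For reference, a minimal standalone sketch (mine, not taken from the patch
series) of the loop shape the cover letter describes as supported:
statically allocated arrays of known size, a compile-time trip count, and
an early break:

```c
#include <assert.h>

#define N 16

/* Statically allocated, fixed-size buffers with a known trip count N --
   the only configuration the initial implementation targets, since
   variable-size buffers are deferred until first-faulting load support
   exists.  */
static int a[N], b[N];

static int
first_mismatch (void)
{
  for (int i = 0; i < N; i++)
    if (a[i] != b[i])   /* early-break condition, lowered to a vector cbranch */
      return i;
  return -1;
}
```

A pointer-parameter variant of the same loop would additionally raise the
aliasing question discussed here, which the fixed-size restriction
currently sidesteps.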

Thanks,
Tamar

> 
> Otherwise OK.
> 
> Thanks,
> Richard.


* RE: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks
  2023-12-06  8:31   ` Richard Biener
@ 2023-12-06  9:10     ` Tamar Christina
  2023-12-06  9:27       ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  9:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, December 6, 2023 8:32 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and
> vectorizer to support early breaks
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This finishes wiring that didn't fit in any of the other patches.
> > Essentially just adding related changes so peeling for early break works.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> > 	vect_do_peeling): Support early breaks.
> > 	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
> > 	* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7
> 605b7a3d3ecdd7 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> >     loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> >  				class loop *loop, tree niters, tree step,
> >  				tree final_iv, bool niters_maybe_zero,
> >  				gimple_stmt_iterator loop_cond_gsi)
> > @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /*
> loop_vinfo */, edge exit_edge,
> >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> >
> >    /* Record the number of latch iterations.  */
> > -  if (limit == niters)
> > +  if (limit == niters
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> >         latch count.  */
> >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >      bound_epilog += vf - 1;
> >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> >      bound_epilog += 1;
> > +
> > +  /* For early breaks the scalar loop needs to execute at most VF times
> > +     to find the element that caused the break.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    {
> > +      bound_epilog = vf;
> > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > +      vect_epilogues = false;
> 
> This is originally initialized with
> 
>   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> 
> so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
> rather than fixing up after the fact?  That is in vect_analyze_loop
> adjust
> 
>   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
>      enabled, SIMDUID is not set, it is the innermost loop and we have
>      either already found the loop's SIMDLEN or there was no SIMDLEN to
>      begin with.
>      TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
>   bool vect_epilogues = (!simdlen
>                          && loop->inner == NULL
>                          && param_vect_epilogues_nomask
>                          && LOOP_VINFO_PEELING_FOR_NITER
> (first_loop_vinfo)
>                          && !loop->simduid);
> 
> and add !LOOP_VINFO_EARLY_BREAKS?
> 
> > +    }
> > +
> >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> >    poly_uint64 bound_scalar = bound_epilog;
> >
> > @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> >  				  bound_prolog + bound_epilog)
> >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> >  			 || vect_epilogues));
> > +
> > +  /* We only support early break vectorization on known bounds at this time.
> > +     This means that if the vector loop can't be entered then we won't generate
> > +     it at all.  So for now force skip_vector off because the additional control
> > +     flow messes with the BB exits and we've already analyzed them.  */
> > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > +
> 
>   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>                       ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
>                                   bound_prolog + bound_epilog)
>                       : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
>                          || vect_epilogues));
> 
> to me that looks like
> 
>   gcc_assert (!skip_vector || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> 
> should work?  You are basically relying on cost modeling rejecting
> vectorization that doesn't enter the vector loop.
> 
> >    /* Epilog loop must be executed if the number of iterations for epilog
> >       loop is known at compile time, otherwise we need to add a check at
> >       the end of vector loop and skip to the end of epilog loop.  */
> >    bool skip_epilog = (prolog_peeling < 0
> >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >  		      || !vf.is_constant ());
> > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> > +     loop must be executed.  */
> > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      skip_epilog = false;
> >
> >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5
> ba2f6da82fda91f6 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p
> (loop_vec_info loop_vinfo)
> >      th = LOOP_VINFO_COST_MODEL_THRESHOLD
> (LOOP_VINFO_ORIG_LOOP_INFO
> >  					  (loop_vinfo));
> >
> > +  /* When we have multiple exits and VF is unknown, we must require partial
> > +     vectors because the loop bound is not a minimum but a maximum.  That is
> to
> > +     say we cannot unpredicate the main loop unless we peel or use partial
> > +     vectors in the epilogue.  */
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > +    return true;
> > +
> 
> I don't quite understand the !.is_constant (), early breaks always
> require peeling?

It's mostly to force the use of an unpredicated main loop, since this
forces LOOP_VINFO_PEELING_FOR_NITER.  Alternatively I can leave it
unset and instead mask the individual comparisons in
vectorizable_early_break.  Both work equally well; I guess the latter may
be better since the assumption is that the break is not taken.

Any preference?

Thanks,
Tamar

> 
> >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> >      {
> > @@ -3149,7 +3157,8 @@ start_over:
> >
> >    /* If an epilogue loop is required make sure we can create one.  */
> >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> >      {
> >        if (dump_enabled_p ())
> >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > index
> d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c7
> 7837233c660e032a 100644
> > --- a/gcc/tree-vectorizer.cc
> > +++ b/gcc/tree-vectorizer.cc
> > @@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
> >  	 predicates that need to be shared for optimal predicate usage.
> >  	 However reassoc will re-order them and prevent CSE from working
> >  	 as it should.  CSE only the loop body, not the entry.  */
> > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +      for (edge exit : exits)
> > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> >
> >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> >        do_rpo_vn (fun, entry, exit_bbs);
> 
> Otherwise OK.
> 
> Richard.


* RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-12-06  8:52     ` Tamar Christina
@ 2023-12-06  9:15       ` Richard Biener
  2023-12-06  9:29         ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-06  9:15 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, December 6, 2023 8:18 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support early
> > break
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
> > > patches are self contained.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > 	(vect_transform_loop): Use it.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> > f07cd6052491d0 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> > (loop_vec_info loop_vinfo)
> > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > >  }
> > >
> > > -
> > >  /* Function vect_analyze_loop_form.
> > >
> > >     Verify that certain CFG restrictions hold, including:
> > >     - the loop has a pre-header
> > > -   - the loop has a single entry and exit
> > > +   - the loop has a single entry
> > > +   - nested loops can have only a single exit.
> > >     - the loop exit condition is simple enough
> > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > >       niter could be analyzed under some assumptions.  */
> > > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> > vect_loop_form_info *info)
> > >  				   "not vectorized: latch block not empty.\n");
> > >
> > >    /* Make sure the exit is not abnormal.  */
> > > -  if (exit_e->flags & EDGE_ABNORMAL)
> > > -    return opt_result::failure_at (vect_location,
> > > -				   "not vectorized:"
> > > -				   " abnormal loop exit edge.\n");
> > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +  for (edge e : exits)
> > 
> > Seeing this multiple times, this isn't the most efficient way to
> > iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> > 
> > Note to myself: fix (add to) the API.
> > 
> > > +    {
> > > +      if (e->flags & EDGE_ABNORMAL)
> > > +	return opt_result::failure_at (vect_location,
> > > +				       "not vectorized:"
> > > +				       " abnormal loop exit edge.\n");
> > > +    }
> > >
> > >    info->conds
> > >      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> > vec_info_shared *shared,
> > >
> > >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > >
> > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > +
> > 
> > Seeing this, s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> > might be good, if we in future avoid if-conversion in a separate
> > pass we will have other CONDs as well.
> > 
> > >    if (info->inner_loop_cond)
> > >      {
> > >        stmt_vec_info inner_loop_cond_info
> > > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info loop_vinfo,
> > gimple *loop_vectorized_call)
> > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > >       versioning.   */
> > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > -  if (! single_pred_p (e->dest))
> > > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      {
> > >        split_loop_exit_edge (e, true);
> > 
> > Note this splitting is done to fulfil versioning constraints on CFG
> > update.  Do you have test coverage with alias versioning and early
> > breaks?
> 
> No, only non-alias versioning.  I don't believe aliasing can occur in the
> current implementation because it's restricted to statically known objects
> with a fixed size.

Hm, if side-effects are all correctly in place do we still have that
restriction?

int x;
void foo (int *a, int *b)
{
  int local_x = x;
  for (int i = 0; i < 1024; ++i)
    {
      if (i % local_x == 13)
        break;
      a[i] = 2 * b[i];
    }
}

the early exit isn't SCEV analyzable but doesn't depend on any
memory, and all side-effects are already after the exit.  But
vectorizing would still require alias versioning.
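As a sketch (mine, purely illustrative) of what that versioning boils down
to: loop versioning guards the vectorized copy with a runtime check that
the accessed regions cannot overlap, falling back to the scalar loop
otherwise.  A portable model of such an overlap test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the runtime alias check loop versioning would
   emit for a loop like foo above: the vector path runs only if the
   accessed regions [a, a+n) and [b, b+n) are disjoint.  Comparisons go
   through uintptr_t so unrelated pointers can be ordered portably.  */
static bool
regions_disjoint (const int *a, const int *b, int n)
{
  uintptr_t a0 = (uintptr_t) a, a1 = (uintptr_t) (a + n);
  uintptr_t b0 = (uintptr_t) b, b1 = (uintptr_t) (b + n);
  return a1 <= b0 || b1 <= a0;
}
```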

Richard.


* RE: [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks
  2023-12-06  9:10     ` Tamar Christina
@ 2023-12-06  9:27       ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-06  9:27 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, December 6, 2023 8:32 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 12/21]middle-end: Add remaining changes to peeling and
> > vectorizer to support early breaks
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This finishes wiring that didn't fit in any of the other patches.
> > > Essentially just adding related changes so peeling for early break works.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-loop-manip.cc (vect_set_loop_condition_normal,
> > > 	vect_do_peeling): Support early breaks.
> > > 	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p): Likewise.
> > > 	* tree-vectorizer.cc (pass_vectorize::execute): Check all exits.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > eef2bb50c1505f5cf802d5d80300affc2cbe69f6..9c1405d79fd8fe8689007df3b7
> > 605b7a3d3ecdd7 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1187,7 +1187,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > >     loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> > >  				class loop *loop, tree niters, tree step,
> > >  				tree final_iv, bool niters_maybe_zero,
> > >  				gimple_stmt_iterator loop_cond_gsi)
> > > @@ -1296,7 +1296,8 @@ vect_set_loop_condition_normal (loop_vec_info /*
> > loop_vinfo */, edge exit_edge,
> > >    gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
> > >
> > >    /* Record the number of latch iterations.  */
> > > -  if (limit == niters)
> > > +  if (limit == niters
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      /* Case A: the loop iterates NITERS times.  Subtract one to get the
> > >         latch count.  */
> > >      loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters,
> > > @@ -3242,6 +3243,16 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >      bound_epilog += vf - 1;
> > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > >      bound_epilog += 1;
> > > +
> > > +  /* For early breaks the scalar loop needs to execute at most VF times
> > > +     to find the element that caused the break.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    {
> > > +      bound_epilog = vf;
> > > +      /* Force a scalar epilogue as we can't vectorize the index finding.  */
> > > +      vect_epilogues = false;
> > 
> > This is originally initialized with
> > 
> >   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
> > 
> > so I think we should avoid filling that with LOOP_VINFO_EARLY_BREAKS
> > rather than fixing up after the fact?  That is in vect_analyze_loop
> > adjust
> > 
> >   /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> >      enabled, SIMDUID is not set, it is the innermost loop and we have
> >      either already found the loop's SIMDLEN or there was no SIMDLEN to
> >      begin with.
> >      TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
> >   bool vect_epilogues = (!simdlen
> >                          && loop->inner == NULL
> >                          && param_vect_epilogues_nomask
> >                          && LOOP_VINFO_PEELING_FOR_NITER
> > (first_loop_vinfo)
> >                          && !loop->simduid);
> > 
> > and add !LOOP_VINFO_EARLY_BREAKS?
> > 
> > > +    }
> > > +
> > >    bool epilog_peeling = maybe_ne (bound_epilog, 0U);
> > >    poly_uint64 bound_scalar = bound_epilog;
> > >
> > > @@ -3376,14 +3387,23 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> > niters, tree nitersm1,
> > >  				  bound_prolog + bound_epilog)
> > >  		      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> > >  			 || vect_epilogues));
> > > +
> > > +  /* We only support early break vectorization on known bounds at this time.
> > > +     This means that if the vector loop can't be entered then we won't generate
> > > +     it at all.  So for now force skip_vector off because the additional control
> > > +     flow messes with the BB exits and we've already analyzed them.  */
> > > + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo);
> > > +
> > 
> >   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >                       ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >                                   bound_prolog + bound_epilog)
> >                       : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> >                          || vect_epilogues));
> > 
> > to me that looks like
> > 
> >   gcc_assert (!skip_vector || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
> > 
> > should work?  You are basically relying on cost modeling rejecting
> > vectorization that doesn't enter the vector loop.
> > 
> > >    /* Epilog loop must be executed if the number of iterations for epilog
> > >       loop is known at compile time, otherwise we need to add a check at
> > >       the end of vector loop and skip to the end of epilog loop.  */
> > >    bool skip_epilog = (prolog_peeling < 0
> > >  		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >  		      || !vf.is_constant ());
> > > -  /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
> > > -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> > > +  /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog
> > > +     loop must be executed.  */
> > > +  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      skip_epilog = false;
> > >
> > >    class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 55d6aee3d29151e6b528f6fdde15c693e5bdd847..51a054c5b035ac80dfbbf3b5
> > ba2f6da82fda91f6 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -1236,6 +1236,14 @@ vect_need_peeling_or_partial_vectors_p
> > (loop_vec_info loop_vinfo)
> > >      th = LOOP_VINFO_COST_MODEL_THRESHOLD
> > (LOOP_VINFO_ORIG_LOOP_INFO
> > >  					  (loop_vinfo));
> > >
> > > +  /* When we have multiple exits and VF is unknown, we must require partial
> > > +     vectors because the loop bound is not a minimum but a maximum.  That is
> > to
> > > +     say we cannot unpredicate the main loop unless we peel or use partial
> > > +     vectors in the epilogue.  */
> > > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > +      && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ())
> > > +    return true;
> > > +
> > 
> > I don't quite understand the !.is_constant (), early breaks always
> > require peeling?
> 
> It's mostly to force the use of an unpredicated main loop, since this
> forces LOOP_VINFO_PEELING_FOR_NITER.  Alternatively I can leave it
> unset and instead mask the individual comparisons in
> vectorizable_early_break.  Both work equally well; I guess the latter may
> be better since the assumption is that the break is not taken.

That might be easier to understand, yes.  But why not

  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
    return true;

?  Because even with constant VF we can predicate the main loop
(with AVX512, --param vect-partial-vector-usage=2)
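To make the predicated alternative concrete, here is a toy scalar model
(mine, not GCC internals): each chunk of VF lanes does a per-lane compare,
the lanes are combined with an inclusive-OR mask reduction to drive the
exit branch, and only on a hit is the exact lane located.  The partial
final chunk plays the role of a partial vector:

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4

/* Scalar model of a predicated early-break search executed in chunks of
   VF lanes.  The per-lane compare results are OR-reduced into 'any',
   which models the cbranch on the mask; partial tail chunks stand in for
   partial vectors.  */
static int
masked_find (const int *a, int n, int key)
{
  for (int i = 0; i < n; i += VF)
    {
      int lanes = n - i < VF ? n - i : VF;   /* partial vector at the tail */
      bool mask[VF] = { false };
      bool any = false;
      for (int l = 0; l < lanes; l++)
	{
	  mask[l] = (a[i + l] == key);       /* vector compare */
	  any |= mask[l];                    /* inclusive-OR mask reduction */
	}
      if (any)                               /* the vector cbranch */
	for (int l = 0; l < lanes; l++)
	  if (mask[l])
	    return i + l;                    /* locate the exact lane */
    }
  return -1;
}
```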

> Any preference?
> 
> Thanks,
> Tamar
> 
> > 
> > >    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > >        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
> > >      {
> > > @@ -3149,7 +3157,8 @@ start_over:
> > >
> > >    /* If an epilogue loop is required make sure we can create one.  */
> > >    if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > > -      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
> > > +      || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> > > +      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > >      {
> > >        if (dump_enabled_p ())
> > >          dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n");
> > > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
> > > index
> > d97e2b54c25ac60378935392aa7b73476efed74b..8b495fc7ca137109fc0097c7
> > 7837233c660e032a 100644
> > > --- a/gcc/tree-vectorizer.cc
> > > +++ b/gcc/tree-vectorizer.cc
> > > @@ -1381,7 +1381,9 @@ pass_vectorize::execute (function *fun)
> > >  	 predicates that need to be shared for optimal predicate usage.
> > >  	 However reassoc will re-order them and prevent CSE from working
> > >  	 as it should.  CSE only the loop body, not the entry.  */
> > > -      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> > > +      auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > +      for (edge exit : exits)
> > > +	bitmap_set_bit (exit_bbs, exit->dest->index);
> > >
> > >        edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
> > >        do_rpo_vn (fun, entry, exit_bbs);
> > 
> > Otherwise OK.
> > 
> > Richard.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 13/21]middle-end: Update loop form analysis to support early break
  2023-12-06  9:15       ` Richard Biener
@ 2023-12-06  9:29         ` Tamar Christina
  0 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-12-06  9:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, December 6, 2023 9:15 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 13/21]middle-end: Update loop form analysis to support early
> break
> 
> On Wed, 6 Dec 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, December 6, 2023 8:18 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > > Subject: Re: [PATCH 13/21]middle-end: Update loop form analysis to support
> early
> > > break
> > >
> > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the
> other
> > > > patches are self contained.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
> > > > 	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > > > 	(vect_transform_loop): Use it.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> 51a054c5b035ac80dfbbf3b5ba2f6da82fda91f6..f9483eff6e9606e835906fb991
> > > f07cd6052491d0 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -1700,12 +1700,12 @@ vect_compute_single_scalar_iteration_cost
> > > (loop_vec_info loop_vinfo)
> > > >    loop_vinfo->scalar_costs->finish_cost (nullptr);
> > > >  }
> > > >
> > > > -
> > > >  /* Function vect_analyze_loop_form.
> > > >
> > > >     Verify that certain CFG restrictions hold, including:
> > > >     - the loop has a pre-header
> > > > -   - the loop has a single entry and exit
> > > > +   - the loop has a single entry
> > > > +   - nested loops can have only a single exit.
> > > >     - the loop exit condition is simple enough
> > > >     - the number of iterations can be analyzed, i.e, a countable loop.  The
> > > >       niter could be analyzed under some assumptions.  */
> > > > @@ -1841,10 +1841,14 @@ vect_analyze_loop_form (class loop *loop,
> > > vect_loop_form_info *info)
> > > >  				   "not vectorized: latch block not empty.\n");
> > > >
> > > >    /* Make sure the exit is not abnormal.  */
> > > > -  if (exit_e->flags & EDGE_ABNORMAL)
> > > > -    return opt_result::failure_at (vect_location,
> > > > -				   "not vectorized:"
> > > > -				   " abnormal loop exit edge.\n");
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > +  for (edge e : exits)
> > >
> > > Seeing this multiple times, this isn't the most efficient way to
> > > iterate over all exits with LOOPS_HAVE_RECORDED_EXITS.
> > >
> > > Note to myself: fix (add to) the API.
> > >
> > > > +    {
> > > > +      if (e->flags & EDGE_ABNORMAL)
> > > > +	return opt_result::failure_at (vect_location,
> > > > +				       "not vectorized:"
> > > > +				       " abnormal loop exit edge.\n");
> > > > +    }
> > > >
> > > >    info->conds
> > > >      = vect_get_loop_niters (loop, exit_e, &info->assumptions,
> > > > @@ -1920,6 +1924,10 @@ vect_create_loop_vinfo (class loop *loop,
> > > vec_info_shared *shared,
> > > >
> > > >    LOOP_VINFO_IV_EXIT (loop_vinfo) = info->loop_exit;
> > > >
> > > > +  /* Check to see if we're vectorizing multiple exits.  */
> > > > +  LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> > > > +    = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty ();
> > > > +
> > >
> > > Seeing this,
> s/LOOP_VINFO_LOOP_CONDS/LOOP_VINFO_LOOP_EXIT_CONDS/g
> > > might be good, if we in future avoid if-conversion in a separate
> > > pass we will have other CONDs as well.
> > >
> > > >    if (info->inner_loop_cond)
> > > >      {
> > > >        stmt_vec_info inner_loop_cond_info
> > > > @@ -11577,7 +11585,7 @@ vect_transform_loop (loop_vec_info
> loop_vinfo,
> > > gimple *loop_vectorized_call)
> > > >    /* Make sure there exists a single-predecessor exit bb.  Do this before
> > > >       versioning.   */
> > > >    edge e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > > > -  if (! single_pred_p (e->dest))
> > > > +  if (! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS
> (loop_vinfo))
> > > >      {
> > > >        split_loop_exit_edge (e, true);
> > >
> > > Note this splitting is done to fulfil versioning constraints on CFG
> > > update.  Do you have test coverage with alias versioning and early
> > > breaks?
> >
> > No, only non-alias versioning.  I don't believe we can alias in the current
> > implementation because it's restricted to statically known objects with
> > a fixed size.
> 
> Hm, if side-effects are all correctly in place do we still have that
> restriction?
> 
> int x;
> void foo (int *a, int *b)
> {
>   int local_x = x;
>   for (int i = 0; i < 1024; ++i)
>     {
>       if (i % local_x == 13)
>         break;
>       a[i] = 2 * b[i];
>     }
> }
> 
> the early exit isn't SCEV analyzable but doesn't depend on any
> memory and all side-effects are after the exit already.  But
> vectorizing would require alias versioning.

Oh, you're right.  A slightly simpler testcase:

int x;
void foo (int *a, int *b)
{
  int local_x = x;
  for (int i = 0; i < 1024; ++i)
    {
      if (i + local_x == 13)
        break;
      a[i] = 2 * b[i];
    }
}

It vectorizes and adds the correct alias check.

uwl-alias.c:9:19: missed:   versioning for alias required: can't determine dependence between *_4 and *_6
consider run-time aliasing test between *_4 and *_6
merged alias checks:
  reference:      *_4 vs. *_6
  segment length: 12
  access size:    4
  alignment:      4
  flags:          WAR
uwl-alias.c:5:21: note:   improved number of alias checks from 1 to 1
uwl-alias.c:5:21: note:  created 1 versioning for alias checks.
uwl-alias.c:5:21: note:  trying to apply versioning to outer loop 0

So I'll make it a runtime test and add it to the testsuite.

Cheers,
Tamar

> 
> Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits
  2023-12-06  4:21                         ` Tamar Christina
@ 2023-12-06  9:33                           ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-06  9:33 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > is the exit edge you are looking for without iterating over all loop exits.
> > > >
> > > > > +		gimple *tmp_vec_stmt = vec_stmt;
> > > > > +		tree tmp_vec_lhs = vec_lhs;
> > > > > +		tree tmp_bitstart = bitstart;
> > > > > +		/* For early exit where the exit is not in the BB that leads
> > > > > +		   to the latch then we're restarting the iteration in the
> > > > > +		   scalar loop.  So get the first live value.  */
> > > > > +		restart_loop = restart_loop || exit_e != main_e;
> > > > > +		if (restart_loop)
> > > > > +		  {
> > > > > +		    tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> > > > > +		    tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> > > > > +		    tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> > > >
> > > > Hmm, that gets you the value after the first iteration, not the one before which
> > > > would be the last value of the preceeding vector iteration?
> > > > (but we don't keep those, we'd need a PHI)
> > >
> > > I don't fully follow.  The comment on top of this hunk under if (loop_vinfo) states
> > > that lhs should be pointing to a PHI.
> > >
> > > When I inspect the statement I see
> > >
> > > i_14 = PHI <i_11(6), 0(14)>
> > >
> > > so i_14 is the value at the start of the current iteration: if we're coming from the
> > > header it's 0, otherwise it's i_11, the value from the previous iteration.
> > >
> > > The peeling code explicitly leaves i_14 in the merge block and not i_11 for this
> > exact reason.
> > > So I'm confused, my understanding is that we're already *at* the right PHI.
> > >
> > > Is it perhaps that you thought we put i_11 here for the early exits?  In that case
> > > yes, I'd agree that would be wrong, and there we would have had to look at
> > > the defs, but i_11 is the def.
> > >
> > > I already kept this in mind and leveraged peeling to make this part easier.
> > > i_11 is used in the main exit and i_14 in the early one.
> > 
> > I think the important detail is that this code is only executed for
> > vect_induction_defs which are indeed PHIs and so we're sure the
> > value live is before any modification so fine to feed as initial
> > value for the PHI in the epilog.
> > 
> > Maybe we can assert the def type here?
> 
> We can't assert because until cfg cleanup the dead value is still seen and still
> vectorized.  That said I've added a guard here.  We vectorize the non-induction
> value as normal now and if it's ever used it'll fail.
> 
> > 
> > > >
> > > > Why again do we need (non-induction) live values from the vector loop to the
> > > > epilogue loop again?
> > >
> > > They can appear as the result value of the main exit.
> > >
> > > e.g. in testcase (vect-early-break_17.c)
> > >
> > > #define N 1024
> > > unsigned vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(unsigned x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    vect_b[i] = x + i;
> > >    if (vect_a[i] > x)
> > >      return vect_a[i];
> > >    vect_a[i] = x;
> > >    ret = vect_a[i] + vect_b[i];
> > >  }
> > >  return ret;
> > > }
> > >
> > > The only situation where they can appear as an early-break live value is
> > > when we have a case where the main exit != the latch-connected exit.
> > >
> > > However in these cases they are unused, and only there because
> > > normally you would have exited (i.e. there was a return) but the
> > > vector loop needs to start over so we ignore it.
> > >
> > > These happen in testcase vect-early-break_74.c and
> > > vect-early-break_78.c
> > 
> > Hmm, so in that case their value is incorrect (but doesn't matter,
> > we ignore it)?
> > 
> 
> Correct, they're placed there due to exit redirection, but in these inverted
> testcases where we've peeled the vector iteration you can't ever skip the
> epilogue.  So they are guaranteed not to be used.
> 
> > > > > +		gimple_stmt_iterator exit_gsi;
> > > > > +		tree new_tree
> > > > > +		  = vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> > > > > +						   exit_e, vectype, ncopies,
> > > > > +						   slp_node, bitsize,
> > > > > +						   tmp_bitstart, tmp_vec_lhs,
> > > > > +						   lhs_type, restart_loop,
> > > > > +						   &exit_gsi);
> > > > > +
> > > > > +		/* Use the empty block on the exit to materialize the new
> > > > > +		   stmts so we can update the PHI here.  */
> > > > > +		if (gimple_phi_num_args (use_stmt) == 1)
> > > > > +		  {
> > > > > +		    auto gsi = gsi_for_stmt (use_stmt);
> > > > > +		    remove_phi_node (&gsi, false);
> > > > > +		    tree lhs_phi = gimple_phi_result (use_stmt);
> > > > > +		    gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> > > > > +		    gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> > > > > +		  }
> > > > > +		else
> > > > > +		  SET_PHI_ARG_DEF (use_stmt, dest_e->dest_idx, new_tree);
> > > >
> > > > if the else case works, why not use it always?
> > >
> > > Because it doesn't work for the main exit.  The early exits have an intermediate block
> > > that is used to generate the statements in, so for them we are fine updating the
> > > use in place.
> > >
> > > The main exit doesn't, and so the existing trick the vectorizer uses is to materialize
> > > the statements in the same block and then dissolve the phi node.  However you
> > > can't do that for the early exits because the phi node isn't singular.
> > 
> > But if the PHI has a single arg you can replace that?  By making a
> > copy stmt from it don't you break LC SSA?
> > 
> 
> Yeah, what the existing code is sneakily doing is this:
> 
> It has to vectorize
> 
> x = PHI <y>
> y gets vectorized as z, but
> 
> x = PHI <z>
> z = ...
> 
> would be invalid,  so what it does, since it doesn't have a predecessor node to place stuff in,
> it'll do
> 
> z = ...
> x = z
> 
> and removes the PHI.  The PHI was only placed there for vectorization so it's not needed
> after this point.  It's also for this reason that the code passes around a gimple_seq, since
> it needs to make sure it gets the order right when inserting statements.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-loop.cc (vectorizable_live_operation,
> 	vectorizable_live_operation_1): Support early exits.
> 	(can_vectorize_live_stmts): Call vectorizable_live_operation for non-live
> 	inductions or reductions.
> 	(find_connected_edge, vect_get_vect_def): New.
> 	(vect_create_epilog_for_reduction): Support reductions in early break.
> 	* tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> 	(vect_stmt_relevant_p): Mark all inductions when early break as being
> 	live.
> 	* tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index f38cc47551488525b15c2be758cac8291dbefb3a..4e48217a31e59318c2ea8e5ab63b06ba19840cbd 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3346,6 +3346,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
>  	    bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
>  	  bb_before_epilog = loop_preheader_edge (epilog)->src;
>  	}
> +
>        /* If loop is peeled for non-zero constant times, now niters refers to
>  	 orig_niters - prolog_peeling, it won't overflow even the orig_niters
>  	 overflows.  */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index df5e1d28fac2ce35e71decdec0d8e31fb75557f5..2f922b42f6d567dfd5da9b276b1c9d37bc681876 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5831,6 +5831,34 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>    return new_temp;
>  }
>  
> +/* Retrieves the defining statement to be used for a reduction.
> +   For MAIN_EXIT_P we use the current VEC_STMTs and otherwise we look at
> +   the reduction definitions.  */
> +
> +tree
> +vect_get_vect_def (stmt_vec_info reduc_info, slp_tree slp_node,
> +		   slp_instance slp_node_instance, bool main_exit_p, unsigned i,
> +		   vec <gimple *> &vec_stmts)
> +{
> +  tree def;
> +
> +  if (slp_node)
> +    {
> +      if (!main_exit_p)
> +        slp_node = slp_node_instance->reduc_phis;
> +      def = vect_get_slp_vect_def (slp_node, i);
> +    }
> +  else
> +    {
> +      if (!main_exit_p)
> +	reduc_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (reduc_info));
> +      vec_stmts = STMT_VINFO_VEC_STMTS (reduc_info);
> +      def = gimple_get_lhs (vec_stmts[0]);
> +    }
> +
> +  return def;
> +}
> +
>  /* Function vect_create_epilog_for_reduction
>  
>     Create code at the loop-epilog to finalize the result of a reduction
> @@ -5842,6 +5870,8 @@ vect_create_partial_epilog (tree vec_def, tree vectype, code_helper code,
>     SLP_NODE_INSTANCE is the SLP node instance containing SLP_NODE
>     REDUC_INDEX says which rhs operand of the STMT_INFO is the reduction phi
>       (counting from 0)
> +   LOOP_EXIT is the edge to update in the merge block.  In the case of a single
> +     exit this edge is always the main loop exit.
>  
>     This function:
>     1. Completes the reduction def-use cycles.
> @@ -5882,7 +5912,8 @@ static void
>  vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>  				  stmt_vec_info stmt_info,
>  				  slp_tree slp_node,
> -				  slp_instance slp_node_instance)
> +				  slp_instance slp_node_instance,
> +				  edge loop_exit)
>  {
>    stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>    gcc_assert (reduc_info->is_reduc_info);
> @@ -5891,6 +5922,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>       loop-closed PHI of the inner loop which we remember as
>       def for the reduction PHI generation.  */
>    bool double_reduc = false;
> +  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
>    stmt_vec_info rdef_info = stmt_info;
>    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
>      {
> @@ -6053,7 +6085,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>        /* Create an induction variable.  */
>        gimple_stmt_iterator incr_gsi;
>        bool insert_after;
> -      standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +      vect_iv_increment_position (loop_exit, &incr_gsi, &insert_after);
>        create_iv (series_vect, PLUS_EXPR, vec_step, NULL_TREE, loop, &incr_gsi,
>  		 insert_after, &indx_before_incr, &indx_after_incr);
>  
> @@ -6132,23 +6164,23 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
>           Store them in NEW_PHIS.  */
>    if (double_reduc)
>      loop = outer_loop;
> -  exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* We need to reduce values in all exits.  */
> +  exit_bb = loop_exit->dest;
>    exit_gsi = gsi_after_labels (exit_bb);
>    reduc_inputs.create (slp_node ? vec_num : ncopies);
> +  vec <gimple *> vec_stmts;
>    for (unsigned i = 0; i < vec_num; i++)
>      {
>        gimple_seq stmts = NULL;
> -      if (slp_node)
> -	def = vect_get_slp_vect_def (slp_node, i);
> -      else
> -	def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[0]);
> +      def = vect_get_vect_def (rdef_info, slp_node, slp_node_instance,
> +			       main_exit_p, i, vec_stmts);
>        for (j = 0; j < ncopies; j++)
>  	{
>  	  tree new_def = copy_ssa_name (def);
>  	  phi = create_phi_node (new_def, exit_bb);
>  	  if (j)
> -	    def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]);
> -	  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def);
> +	    def = gimple_get_lhs (vec_stmts[j]);
> +	  SET_PHI_ARG_DEF (phi, loop_exit->dest_idx, def);
>  	  new_def = gimple_convert (&stmts, vectype, new_def);
>  	  reduc_inputs.quick_push (new_def);
>  	}
> @@ -10481,17 +10513,18 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>    return true;
>  }
>  
> -
>  /* Function vectorizable_live_operation_1.
> +
>     helper function for vectorizable_live_operation.  */
> +
>  tree
>  vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
> -			       stmt_vec_info stmt_info, edge exit_e,
> +			       stmt_vec_info stmt_info, basic_block exit_bb,
>  			       tree vectype, int ncopies, slp_tree slp_node,
>  			       tree bitsize, tree bitstart, tree vec_lhs,
> -			       tree lhs_type, gimple_stmt_iterator *exit_gsi)
> +			       tree lhs_type, bool restart_loop,
> +			       gimple_stmt_iterator *exit_gsi)
>  {
> -  basic_block exit_bb = exit_e->dest;
>    gcc_assert (single_pred_p (exit_bb) || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> @@ -10504,7 +10537,9 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>    if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = VEC_EXTRACT <VEC_LHS, LEN + BIAS - 1>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (ncopies == 1 && !slp_node);
> @@ -10513,15 +10548,18 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree len = vect_get_loop_len (loop_vinfo, &gsi,
>  				    &LOOP_VINFO_LENS (loop_vinfo),
>  				    1, vectype, 0, 0);
> +
>        /* BIAS - 1.  */
>        signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
>        tree bias_minus_one
>  	= int_const_binop (MINUS_EXPR,
>  			   build_int_cst (TREE_TYPE (len), biasval),
>  			   build_one_cst (TREE_TYPE (len)));
> +
>        /* LAST_INDEX = LEN + (BIAS - 1).  */
>        tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len),
>  				     len, bias_minus_one);
> +
>        /* This needs to implement extraction of the first index, but not sure
>  	 how the LEN stuff works.  At the moment we shouldn't get here since
>  	 there's no LEN support for early breaks.  But guard this so there's
> @@ -10532,13 +10570,16 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree scalar_res
>  	= gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>  			vec_lhs_phi, last_index);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
>    else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>      {
>        /* Emit:
> +
>  	 SCALAR_RES = EXTRACT_LAST <VEC_LHS, MASK>
> +
>  	 where VEC_LHS is the vectorized live-out result and MASK is
>  	 the loop mask for the final iteration.  */
>        gcc_assert (!slp_node);
> @@ -10548,10 +10589,38 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
>  				      &LOOP_VINFO_MASKS (loop_vinfo),
>  				      1, vectype, 0);
> +      tree scalar_res;
> +
> +      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
> +	 instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
> +      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  /* First create the permuted mask.  */
> +	  tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
> +	  tree perm_dest = copy_ssa_name (mask);
> +	  gimple *perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
> +				       mask, perm_mask);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  mask = perm_dest;
> +
> +	  /* Then permute the vector contents.  */
> +	  tree perm_elem = perm_mask_for_reverse (vectype);
> +	  perm_dest = copy_ssa_name (vec_lhs_phi);
> +	  perm_stmt
> +		= gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
> +				       vec_lhs_phi, perm_elem);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
> +				       &gsi);
> +	  vec_lhs_phi = perm_dest;
> +	}
>  
>        gimple_seq_add_seq (&stmts, tem);
> -       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> -				       mask, vec_lhs_phi);
> +
> +      scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
> +				 mask, vec_lhs_phi);
> +
>        /* Convert the extracted vector element to the scalar type.  */
>        new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>      }
> @@ -10564,12 +10633,26 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
>        new_tree = force_gimple_operand (fold_convert (lhs_type, new_tree),
>  				       &stmts, true, NULL_TREE);
>      }
> +
>    *exit_gsi = gsi_after_labels (exit_bb);
>    if (stmts)
>      gsi_insert_seq_before (exit_gsi, stmts, GSI_SAME_STMT);
> +
>    return new_tree;
>  }
>  
> +/* Find the edge that's the final one in the path from SRC to DEST and
> +   return it.  There may be at most one forwarder block in between.  */
> +
> +static edge
> +find_connected_edge (edge src, basic_block dest)
> +{
> +   if (src->dest == dest)
> +     return src;
> +
> +  return find_edge (src->dest, dest);
> +}
> +
>  /* Function vectorizable_live_operation.
>  
>     STMT_INFO computes a value that is used outside the loop.  Check if
> @@ -10590,11 +10673,13 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>    int ncopies;
>    gimple *use_stmt;
> +  use_operand_p use_p;
>    auto_vec<tree> vec_oprnds;
>    int vec_entry = 0;
>    poly_uint64 vec_index = 0;
>  
> -  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
> +  gcc_assert (STMT_VINFO_LIVE_P (stmt_info)
> +	      || LOOP_VINFO_EARLY_BREAKS (loop_vinfo));
>  
>    /* If a stmt of a reduction is live, vectorize it via
>       vect_create_epilog_for_reduction.  vectorizable_reduction assessed
> @@ -10619,8 +10704,25 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>        if (STMT_VINFO_REDUC_TYPE (reduc_info) == FOLD_LEFT_REDUCTION
>  	  || STMT_VINFO_REDUC_TYPE (reduc_info) == EXTRACT_LAST_REDUCTION)
>  	return true;
> +
>        vect_create_epilog_for_reduction (loop_vinfo, stmt_info, slp_node,
> -					slp_node_instance);
> +					slp_node_instance,
> +					LOOP_VINFO_IV_EXIT (loop_vinfo));
> +
> +      /* If early break we only have to materialize the reduction on the merge
> +	 block, but we have to find an alternate exit first.  */
> +      if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +	{
> +	  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> +	    if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
> +	      {
> +		vect_create_epilog_for_reduction (loop_vinfo, stmt_info,
> +						  slp_node, slp_node_instance,
> +						  exit);
> +		break;
> +	      }
> +	}
> +
>        return true;
>      }
>  
> @@ -10772,37 +10874,62 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>  	   lhs' = new_tree;  */
>  
>        class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -      gcc_assert (single_pred_p (exit_bb));
> -
> -      tree vec_lhs_phi = copy_ssa_name (vec_lhs);
> -      gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -      SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs);
> -
> -      gimple_stmt_iterator exit_gsi;
> -      tree new_tree
> -	= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> -					 LOOP_VINFO_IV_EXIT (loop_vinfo),
> -					 vectype, ncopies, slp_node, bitsize,
> -					 bitstart, vec_lhs, lhs_type,
> -					 &exit_gsi);
> -
> -      /* Remove existing phis that copy from lhs and create copies
> -	 from new_tree.  */
> -      gimple_stmt_iterator gsi;
> -      for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
> -	{
> -	  gimple *phi = gsi_stmt (gsi);
> -	  if ((gimple_phi_arg_def (phi, 0) == lhs))
> +      /* Check if we have a loop where the chosen exit is not the main exit.
> +	 In these cases for an early break we restart the iteration the vector
> +	 code did.  For the live values we want the value at the start of the
> +	 iteration rather than at the end.  */
> +      edge main_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +      bool restart_loop = LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> +	if (!is_gimple_debug (use_stmt)
> +	    && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +	  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
>  	    {
> -	      remove_phi_node (&gsi, false);
> -	      tree lhs_phi = gimple_phi_result (phi);
> -	      gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> -	      gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> -	    }
> -	  else
> -	    gsi_next (&gsi);
> -	}
> +	      edge e = gimple_phi_arg_edge (as_a <gphi *> (use_stmt),
> +					   phi_arg_index_from_use (use_p));
> +	      bool main_exit_edge = e == main_e
> +				    || find_connected_edge (main_e, e->src);
> +
> +	      /* Early exits have a merge block, we want the merge block itself
> +		 so use ->src.  For the main exit the merge block is the
> +		 destination.  */
> +	      basic_block dest = main_exit_edge ? main_e->dest : e->src;
> +	      gimple *tmp_vec_stmt = vec_stmt;
> +	      tree tmp_vec_lhs = vec_lhs;
> +	      tree tmp_bitstart = bitstart;
> +
> +	      /* For early exit where the exit is not in the BB that leads
> +		 to the latch then we're restarting the iteration in the
> +		 scalar loop.  So get the first live value.  */
> +	      restart_loop = restart_loop || !main_exit_edge;
> +	      if (restart_loop
> +		  && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +		{
> +		  tmp_vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +		  tmp_vec_lhs = gimple_get_lhs (tmp_vec_stmt);
> +		  tmp_bitstart = build_zero_cst (TREE_TYPE (bitstart));
> +		}
> +
> +	      gimple_stmt_iterator exit_gsi;
> +	      tree new_tree
> +		= vectorizable_live_operation_1 (loop_vinfo, stmt_info,
> +						 dest, vectype, ncopies,
> +						 slp_node, bitsize,
> +						 tmp_bitstart, tmp_vec_lhs,
> +						 lhs_type, restart_loop,
> +						 &exit_gsi);
> +
> +	      if (gimple_phi_num_args (use_stmt) == 1)
> +		{
> +		  auto gsi = gsi_for_stmt (use_stmt);
> +		  remove_phi_node (&gsi, false);
> +		  tree lhs_phi = gimple_phi_result (use_stmt);
> +		  gimple *copy = gimple_build_assign (lhs_phi, new_tree);
> +		  gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
> +		}
> +	      else
> +		SET_PHI_ARG_DEF (use_stmt, e->dest_idx, new_tree);
> +	  }
>  
>        /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
>        FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a09c0a804a38e17ef32b6ce13b98b077459fc7..582c5e678fad802d6e76300fe3c939b9f2978f17 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -342,6 +342,7 @@ is_simple_and_all_uses_invariant (stmt_vec_info stmt_info,
>     - it has uses outside the loop.
>     - it has vdefs (it alters memory).
>     - control stmts in the loop (except for the exit condition).
> +   - it is an induction and we have multiple exits.
>  
>     CHECKME: what other side effects would the vectorizer allow?  */
>  
> @@ -399,6 +400,19 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	}
>      }
>  
> +  /* Check if it's an induction and multiple exits.  In this case there will be
> +     a usage later on after peeling which is needed for the alternate exit.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +      && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def)
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "vec_stmt_relevant_p: induction forced for "
> +			   "early break.\n");
> +      *live_p = true;
> +
> +    }
> +
>    if (*live_p && *relevant == vect_unused_in_scope
>        && !is_simple_and_all_uses_invariant (stmt_info, loop_vinfo))
>      {
> @@ -1774,7 +1788,7 @@ compare_step_with_zero (vec_info *vinfo, stmt_vec_info stmt_info)
>  /* If the target supports a permute mask that reverses the elements in
>     a vector of type VECTYPE, return that mask, otherwise return null.  */
>  
> -static tree
> +tree
>  perm_mask_for_reverse (tree vectype)
>  {
>    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> @@ -12720,20 +12734,27 @@ can_vectorize_live_stmts (vec_info *vinfo, stmt_vec_info stmt_info,
>  			  bool vec_stmt_p,
>  			  stmt_vector_for_cost *cost_vec)
>  {
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
>    if (slp_node)
>      {
>        stmt_vec_info slp_stmt_info;
>        unsigned int i;
>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, slp_stmt_info)
>  	{
> -	  if (STMT_VINFO_LIVE_P (slp_stmt_info)
> +	  if ((STMT_VINFO_LIVE_P (slp_stmt_info)
> +	       || (loop_vinfo
> +		   && LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		   && STMT_VINFO_DEF_TYPE (slp_stmt_info)
> +			== vect_induction_def))
>  	      && !vectorizable_live_operation (vinfo, slp_stmt_info, slp_node,
>  					       slp_node_instance, i,
>  					       vec_stmt_p, cost_vec))
>  	    return false;
>  	}
>      }
> -  else if (STMT_VINFO_LIVE_P (stmt_info)
> +  else if ((STMT_VINFO_LIVE_P (stmt_info)
> +	    || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> +		&& STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def))
>  	   && !vectorizable_live_operation (vinfo, stmt_info,
>  					    slp_node, slp_node_instance, -1,
>  					    vec_stmt_p, cost_vec))
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 15c7f75b1f3c61ab469f1b1970dae9c6ac1a9f55..974f617d54a14c903894dd20d60098ca259c96f2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2248,6 +2248,7 @@ extern bool vect_is_simple_use (vec_info *, stmt_vec_info, slp_tree,
>  				enum vect_def_type *,
>  				tree *, stmt_vec_info * = NULL);
>  extern bool vect_maybe_update_slp_op_vectype (slp_tree, tree);
> +extern tree perm_mask_for_reverse (tree);
>  extern bool supportable_widening_operation (vec_info*, code_helper,
>  					    stmt_vec_info, tree, tree,
>  					    code_helper*, code_helper*,
> 

-- 
Richard Biener &lt;rguenther@suse.de&gt;
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-06  4:37       ` Tamar Christina
@ 2023-12-06  9:37         ` Richard Biener
  2023-12-08  8:58           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-06  9:37 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > +
> > > > +  tree truth_type = truth_type_for (vectype_op);  machine_mode mode =
> > > > + TYPE_MODE (truth_type);  int ncopies;
> > > > +
> > 
> > more line break issues ... (also below, check yourself)
> > 
> > shouldn't STMT_VINFO_VECTYPE already match truth_type here?  If not
> > it looks to be set wrongly (or shouldn't be set at all)
> > 
> 
> Fixed, I now leverage the existing vect_recog_bool_pattern to update the types
> if needed, and determine the initial type in vect_get_vector_types_for_stmt.
> 
> > > > +  if (slp_node)
> > > > +    ncopies = 1;
> > > > +  else
> > > > +    ncopies = vect_get_num_copies (loop_vinfo, truth_type);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);  bool
> > > > + masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > 
> > what about with_len?
> 
> Should be easy to add, but I don't know how it works.
> 
> > 
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +    {
> > > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target doesn't support flag setting vector "
> > > > +			       "comparisons.\n");
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
> > 
> > Why NE_EXPR?  This looks wrong.  Or vectype_op is wrong if you're
> > emitting
> > 
> >  mask = op0 CMP op1;
> >  if (mask != 0)
> > 
> > I think you need to check for CMP, not NE_EXPR.
> 
> Well, CMP is checked by vectorizable_comparison_1, but I realized this
> check was not checking what I wanted, and the cbranch requirements
> already cover it.  So I removed it.
> 
> > 
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector "
> > > > +			       "comparisons for type %T.\n", truth_type);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (ncopies > 1
> > > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector OR for "
> > > > +			       "type %T.\n", truth_type);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
> > > > +				      vec_stmt, slp_node, cost_vec))
> > > > +	return false;
> > 
> > I suppose vectorizable_comparison_1 will check this again, so the above
> > is redundant?
> > 
> 
> The IOR?  No, vectorizable_comparison_1 doesn't reduce, so it may not check it
> depending on the condition.
> 
> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +      workset.splice (stmts);
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (truth_type, NULL,
> > > > "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0,
> > > > arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  if (slp_node)
> > > > +	    slp_node->push_vec_def (new_stmt);
> > > > +	  else
> > > > +	    STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> > > > +	  workset.quick_insert (0, new_temp);
> > 
> > Reduction epilogue handling has similar code to reduce a set of vectors
> > to a single one with an operation.  I think we want to share that code.
> > 
> 
> I've taken a look, but that code isn't suitable here since the two have
> different constraints.  I don't require an in-order reduction, since for the
> comparison all we care about is whether any bit in any lane is set.  This means:
> 
> 1. we can reduce using a fast operation like IOR.
> 2. we can reduce in as much parallelism as possible.
> 
> The comparison is on the critical path for the loop now, unlike live reductions
> which are always at the end, so using the live reduction code resulted in a
> slow down since it creates a longer dependency chain.

OK.
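The workset loop Tamar describes can be modeled in isolation.  The sketch below is a hedged scalar analogue (plain `uint64_t` masks standing in for GCC's vector boolean defs, `std::deque` in place of `auto_vec`; names are illustrative, not GCC API) showing why popping two operands and re-inserting the IOR at the front yields a balanced, log-depth reduction tree rather than a linear chain:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Scalar model of the patch's reduction: pop two operands off the back,
// IOR them, and re-insert the result at the front of the worklist.  For
// N inputs this builds a tree of depth ~log2(N) instead of a linear
// chain of depth N-1, keeping the dependency chain on the loop's
// critical path short.
static uint64_t
ior_reduce (std::deque<uint64_t> workset)
{
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

Because IOR is associative and commutative, this reassociation is safe for the "any lane set?" query, which is exactly why the in-order live-reduction epilogue code need not be reused here.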

> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  if (masked_loop_p)
> > > > +    {
> > > > +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> > > > truth_type, 0);
> > > > +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			       &cond_gsi);
> > 
> > I don't think this is correct when 'stmts' had more than one vector?
> > 
> 
> It is, because even for VLA, since we only support counted loops, partial
> vectors are disabled. And it looks like --param vect-partial-vector-usage=1
> cannot force them on.

--param vect-partial-vector-usage=2 would, no?

> In principle I suppose I could mask the individual stmts; that should handle
> the future case when this is relaxed to support non-fixed-length buffers?

Well, it looks wrong - either put in an assert that we start with a
single stmt or assert !masked_loop_p instead?  Better ICE than
generate wrong code.

That said, I think you need to apply the masking on the original
stmts[], before reducing them, no?

Thanks,
Richard.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> 	vect_recog_bool_pattern): Support gconds type analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,27 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  If COND then a gcond is being inspected instead of a normal COND.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    gcond *cond)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !cond)
>      return false;
> +  else if (!def_stmt_info)
> +    /* If we're a gcond we won't be codegen-ing the statements and are only
> +       interested in whether the types match.  In that case we can accept
> +       loop invariant values.  */
> +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  
> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>    if (!def_stmt)
>      return false;
>  
> @@ -5234,27 +5243,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, cond)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   cond))
>  	return false;
>        break;
>  
> @@ -5275,6 +5285,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> +	      && !cond
>  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
>  	    return false;
>  
> @@ -5324,11 +5335,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
>     VAR is an SSA_NAME that should be transformed from bool to a wider integer
>     type, OUT_TYPE is the desired final integer type of the whole pattern.
>     STMT_INFO is the info of the pattern root and is where pattern stmts should
> -   be associated with.  DEFS is a map of pattern defs.  */
> +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
>  
>  static void
>  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> +		     gimple *&last_stmt, bool type_only)
>  {
>    gimple *stmt = SSA_NAME_DEF_STMT (var);
>    enum tree_code rhs_code, def_rhs_code;
> @@ -5492,8 +5505,10 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
>      }
>  
>    gimple_set_location (pattern_stmt, loc);
> -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> -			  get_vectype_for_scalar_type (vinfo, itype));
> +  if (!type_only)
> +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> +			    get_vectype_for_scalar_type (vinfo, itype));
> +  last_stmt = pattern_stmt;
>    defs.put (var, gimple_assign_lhs (pattern_stmt));
>  }
>  
> @@ -5509,11 +5524,14 @@ sort_after_uid (const void *p1, const void *p2)
>  
>  /* Create pattern stmts for all stmts participating in the bool pattern
>     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> -   OUT_TYPE.  Return the def of the pattern root.  */
> +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> +   statements are not emitted as pattern statements and the tree returned is
> +   only useful for type queries.  */
>  
>  static tree
>  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> -		   tree out_type, stmt_vec_info stmt_info)
> +		   tree out_type, stmt_vec_info stmt_info,
> +		   bool type_only = false)
>  {
>    /* Gather original stmts in the bool pattern in their order of appearance
>       in the IL.  */
> @@ -5523,16 +5541,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
>      bool_stmts.quick_push (*i);
>    bool_stmts.qsort (sort_after_uid);
>  
> +  gimple *last_stmt = NULL;
> +
>    /* Now process them in that order, producing pattern stmts.  */
>    hash_map <tree, tree> defs;
>    for (unsigned i = 0; i < bool_stmts.length (); ++i)
>      adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> -			 out_type, stmt_info, defs);
> +			 out_type, stmt_info, defs, last_stmt, type_only);
>  
>    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> -  gimple *pattern_stmt
> -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  return gimple_assign_lhs (pattern_stmt);
> +  return gimple_assign_lhs (last_stmt);
>  }
>  
>  /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5626,22 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond* cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;
>  
> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else
> +    {
> +      lhs = var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5659,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5707,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5727,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>        else if (integer_type_for_mask (var, vinfo))
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (gimple_cond_code (cond), gimple_cond_lhs (cond),
> +				 gimple_cond_rhs (cond),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));
> +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> +	  vectype = truth_type_for (vectype);
> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  
> @@ -5725,7 +5765,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d801b72a149ebe6aa4d1f2942324b042d07be530 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,176 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +      workset.splice (stmts);
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    {
> +      tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> +				      vectype, 0);
> +      cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			       &cond_gsi);
> +    }
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> +			     build_zero_cst (vectype));
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13123,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13148,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13310,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14506,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans
> +	     have a single bit precision and we need the vector boolean to be
> +	     a representation of the integer mask.  So set the correct integer
> +	     type and convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14533,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean
> +	 type now that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 10/21]middle-end: implement relevancy analysis support for control flow
  2023-12-06  4:10       ` Tamar Christina
@ 2023-12-06  9:44         ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-06  9:44 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 6 Dec 2023, Tamar Christina wrote:

> > > > +	  && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != cond)
> > > > +	*relevant = vect_used_in_scope;
> > 
> > but why not simply mark all gconds as vect_used_in_scope?
> > 
> 
> We break outer-loop vectorization since doing so would pull the inner loop's
> exit into scope for the outer loop.  Also we can't force the loop's main IV exit
> to be in scope, since it will be replaced by the vectorizer.
> 
> I've updated the code to remove the quadratic lookup.
> 
> > > > +    }
> > > >
> > > >    /* changing memory.  */
> > > >    if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -374,6 +379,11 @@
> > > > vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
> > > >  	*relevant = vect_used_in_scope;
> > > >        }
> > > >
> > > > +  auto_vec<edge> exits = get_loop_exit_edges (loop);  auto_bitmap
> > > > + exit_bbs;  for (edge exit : exits)
> > 
> > is it your mail client messing patches up?  missing line-break
> > again.
> > 
> 
> Yeah, seems it was, hopefully fixed now.
> 
> > > > +    bitmap_set_bit (exit_bbs, exit->dest->index);
> > > > +
> > 
> > you don't seem to use the bitmap?
> > 
> > > >    /* uses outside the loop.  */
> > > >    FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter,
> > > > SSA_OP_DEF)
> > > >      {
> > > > @@ -392,7 +402,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info,
> > > > loop_vec_info loop_vinfo,
> > > >  	      /* We expect all such uses to be in the loop exit phis
> > > >  		 (because of loop closed form)   */
> > > >  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> > > > -	      gcc_assert (bb == single_exit (loop)->dest);
> > > >
> > > >                *live_p = true;
> > > >  	    }
> > > > @@ -793,6 +802,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
> > > > loop_vinfo, bool *fatal)
> > > >  			return res;
> > > >  		    }
> > > >                   }
> > > > +	    }
> > > > +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> > > > +	    {
> > > > +	      enum tree_code rhs_code = gimple_cond_code (cond);
> > > > +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> > > > +	      opt_result res
> > > > +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> > > > +			       loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > > +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> > > > +				loop_vinfo, relevant, &worklist, false);
> > > > +	      if (!res)
> > > > +		return res;
> > > >              }
> > 
> > I guess we're missing an
> > 
> >   else
> >     gcc_unreachable ();
> > 
> > to catch not handled stmt kinds (do we have gcond patterns yet?)
> > 
> > > >  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
> > > >  	    {
> > > > @@ -13043,11 +13066,15 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  			     node_instance, cost_vec);
> > > >        if (!res)
> > > >  	return res;
> > > > -   }
> > > > +    }
> > > > +
> > > > +  if (is_ctrl_stmt (stmt_info->stmt))
> > > > +    STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def;
> > 
> > I think it should rather be vect_condition_def.  It's also not
> > this functions business to set STMT_VINFO_DEF_TYPE.  If we ever
> > get to handle not if-converted code (or BB vectorization of that)
> > then a gcond would define the mask stmts are under.
> > 
> 
> Hmm sure, I've had to place it in multiple other places but moved it
> away from here.  The main ones are set during dataflow analysis when
> we determine which statements need to be moved.

I'd have set it where we set STMT_VINFO_TYPE on conds to
loop_exit_ctrl_vec_info_type.

The patch below has it in vect_mark_pattern_stmts only?  Guess it's
in the other patch(es) now.

> > > >    switch (STMT_VINFO_DEF_TYPE (stmt_info))
> > > >      {
> > > >        case vect_internal_def:
> > > > +      case vect_early_exit_def:
> > > >          break;
> > > >
> > > >        case vect_reduction_def:
> > > > @@ -13080,6 +13107,7 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >      {
> > > >        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
> > > >        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> > > > +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
> > > >  		  || (call && gimple_call_lhs (call) == NULL_TREE));
> > > >        *need_to_vectorize = true;
> > > >      }
> > > > @@ -13835,6 +13863,14 @@ vect_is_simple_use (vec_info *vinfo,
> > > > stmt_vec_info stmt, slp_tree slp_node,
> > > >  	  else
> > > >  	    *op = gimple_op (ass, operand + 1);
> > > >  	}
> > > > +      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> > > > +	{
> > > > +	  gimple_match_op m_op;
> > > > +	  if (!gimple_extract_op (cond, &m_op))
> > > > +	    return false;
> > > > +	  gcc_assert (m_op.code.is_tree_code ());
> > > > +	  *op = m_op.ops[operand];
> > > > +	}
> > 
> > Please do not use gimple_extract_op, use
> > 
> >   *op = gimple_op (cond, operand);
> > 
> > > >        else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
> > > >  	*op = gimple_call_arg (call, operand);
> > > >        else
> > > > @@ -14445,6 +14481,8 @@ vect_get_vector_types_for_stmt (vec_info
> > > > *vinfo, stmt_vec_info stmt_info,
> > > >    *nunits_vectype_out = NULL_TREE;
> > > >
> > > >    if (gimple_get_lhs (stmt) == NULL_TREE
> > > > +      /* Allow vector conditionals through here.  */
> > > > +      && !is_ctrl_stmt (stmt)
> > 
> > !is_a <gcond *> (stmt)
> > 
> > > >        /* MASK_STORE has no lhs, but is ok.  */
> > > >        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > >      {
> > > > @@ -14461,7 +14499,7 @@ vect_get_vector_types_for_stmt (vec_info
> > > > *vinfo, stmt_vec_info stmt_info,
> > > >  	}
> > > >
> > > >        return opt_result::failure_at (stmt,
> > > > -				     "not vectorized: irregular stmt.%G", stmt);
> > > > +				     "not vectorized: irregular stmt: %G", stmt);
> > > >      }
> > > >
> > > >    tree vectype;
> > > > @@ -14490,6 +14528,14 @@ vect_get_vector_types_for_stmt (vec_info
> > > > *vinfo, stmt_vec_info stmt_info,
> > > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > > +      else if (is_ctrl_stmt (stmt))
> > 
> > else if (gcond *cond = dyn_cast <...>)
> > 
> > > > +	{
> > > > +	  gcond *cond = dyn_cast <gcond *> (stmt);
> > > > +	  if (!cond)
> > > > +	    return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > > +					   " control flow statement.\n");
> > > > +	  scalar_type = TREE_TYPE (gimple_cond_rhs (stmt));
> > 
> > As said in the other patch STMT_VINFO_VECTYPE of the gcond should
> > be the _mask_ type the compare produces, not the vector type of
> > the inputs (the nunits_vectype might be that one though).
> > You possibly need to adjust vect_get_smallest_scalar_type for this.
> > 
> 
> Fixed, but is in other patch now.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_mark_pattern_stmts): Support gcond
> 	patterns.
> 	* tree-vect-stmts.cc (vect_stmt_relevant_p,
> 	vect_mark_stmts_to_be_vectorized, vect_analyze_stmt, vect_is_simple_use,
> 	vect_get_vector_types_for_stmt): Support early breaks.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index c6cedf4fe7c1f1e1126ce166a059a4b2a2b49cbd..ea59ad337f14d802607850e8a7cf0125777ce2bc 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6987,6 +6987,10 @@ vect_mark_pattern_stmts (vec_info *vinfo,
>      vect_set_pattern_stmt (vinfo,
>  			   pattern_stmt, orig_stmt_info, pattern_vectype);
>  
> +  /* For any conditionals mark them as vect_condition_def.  */
> +  if (is_a <gcond *> (pattern_stmt))
> +    STMT_VINFO_DEF_TYPE (STMT_VINFO_RELATED_STMT (orig_stmt_info)) = vect_condition_def;
> +
>    /* Transfer reduction path info to the pattern.  */
>    if (STMT_VINFO_REDUC_IDX (orig_stmt_info_saved) != -1)
>      {
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index d801b72a149ebe6aa4d1f2942324b042d07be530..1e2698fcb7e95ae7f0009d10a79ba8c891a8227d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -361,7 +361,9 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  
>    /* cond stmt other than loop exit cond.  */
>    gimple *stmt = STMT_VINFO_STMT (stmt_info);
> -  if (dyn_cast <gcond *> (stmt))
> +  if (is_a <gcond *> (stmt)
> +      && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
> +      && (!loop->inner || gimple_bb (stmt)->loop_father == loop))
>      *relevant = vect_used_in_scope;
>  
>    /* changing memory.  */
> @@ -393,7 +395,6 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>  	      /* We expect all such uses to be in the loop exit phis
>  		 (because of loop closed form)   */
>  	      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
> -	      gcc_assert (bb == single_exit (loop)->dest);
>  
>                *live_p = true;
>  	    }
> @@ -807,6 +808,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
>  			return res;
>  		    }
>                   }
> +	    }
> +	  else if (gcond *cond = dyn_cast <gcond *> (stmt_vinfo->stmt))
> +	    {
> +	      enum tree_code rhs_code = gimple_cond_code (cond);
> +	      gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
> +	      opt_result res
> +		= process_use (stmt_vinfo, gimple_cond_lhs (cond),
> +			       loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
> +	      res = process_use (stmt_vinfo, gimple_cond_rhs (cond),
> +				loop_vinfo, relevant, &worklist, false);
> +	      if (!res)
> +		return res;
>              }
>  	  else if (gcall *call = dyn_cast <gcall *> (stmt_vinfo->stmt))
>  	    {
> @@ -820,6 +835,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
>  		    return res;
>  		}
>  	    }
> +	  else
> +	    gcc_unreachable ();
>          }
>        else
>  	FOR_EACH_PHI_OR_STMT_USE (use_p, stmt_vinfo->stmt, iter, SSA_OP_USE)
> @@ -13044,11 +13061,12 @@ vect_analyze_stmt (vec_info *vinfo,
>  			     node_instance, cost_vec);
>        if (!res)
>  	return res;
> -   }
> +    }
>  
>    switch (STMT_VINFO_DEF_TYPE (stmt_info))
>      {
>        case vect_internal_def:
> +      case vect_condition_def:
>          break;
>  
>        case vect_reduction_def:
> @@ -13081,6 +13099,7 @@ vect_analyze_stmt (vec_info *vinfo,
>      {
>        gcall *call = dyn_cast <gcall *> (stmt_info->stmt);
>        gcc_assert (STMT_VINFO_VECTYPE (stmt_info)
> +		  || gimple_code (stmt_info->stmt) == GIMPLE_COND
>  		  || (call && gimple_call_lhs (call) == NULL_TREE));
>        *need_to_vectorize = true;
>      }
> @@ -13855,6 +13874,8 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node,
>  	  else
>  	    *op = gimple_op (ass, operand + 1);
>  	}
> +      else if (gcond *cond = dyn_cast <gcond *> (stmt->stmt))
> +	*op = gimple_op (cond, operand);
>        else if (gcall *call = dyn_cast <gcall *> (stmt->stmt))
>  	*op = gimple_call_arg (call, operand);
>        else
> @@ -14465,6 +14486,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>    *nunits_vectype_out = NULL_TREE;
>  
>    if (gimple_get_lhs (stmt) == NULL_TREE
> +      /* Allow vector conditionals through here.  */
> +      && !is_a <gcond *> (stmt)
>        /* MASK_STORE has no lhs, but is ok.  */
>        && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
>      {
> @@ -14481,7 +14504,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  	}
>  
>        return opt_result::failure_at (stmt,
> -				     "not vectorized: irregular stmt.%G", stmt);
> +				     "not vectorized: irregular stmt: %G", stmt);
>      }
>  
>    tree vectype;
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-11-28 17:55     ` Richard Sandiford
@ 2023-12-06 16:25       ` Tamar Christina
  2023-12-07  0:56         ` Richard Sandiford
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-06 16:25 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov


> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, November 28, 2023 5:56 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for
> Advanced SIMD
> 
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Tamar Christina <tamar.christina@arm.com> writes:
> >> Hi All,
> >>
> >> This adds an implementation for conditional branch optab for AArch64.
> >>
> >> For e.g.
> >>
> >> void f1 ()
> >> {
> >>   for (int i = 0; i < N; i++)
> >>     {
> >>       b[i] += a[i];
> >>       if (a[i] > 0)
> >> 	break;
> >>     }
> >> }
> >>
> >> For 128-bit vectors we generate:
> >>
> >>         cmgt    v1.4s, v1.4s, #0
> >>         umaxp   v1.4s, v1.4s, v1.4s
> >>         fmov    x3, d1
> >>         cbnz    x3, .L8
> >>
> >> and of 64-bit vector we can omit the compression:
> >>
> >>         cmgt    v1.2s, v1.2s, #0
> >>         fmov    x2, d1
> >>         cbz     x2, .L13
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
> >>
> >> --- inline copy of patch --
> >> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> >> index
> 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd7
> 0a92924f62524c15 100644
> >> --- a/gcc/config/aarch64/aarch64-simd.md
> >> +++ b/gcc/config/aarch64/aarch64-simd.md
> >> @@ -3830,6 +3830,46 @@ (define_expand
> "vcond_mask_<mode><v_int_equiv>"
> >>    DONE;
> >>  })
> >>
> >> +;; Patterns comparing two vectors and conditionally jump
> >> +
> >> +(define_expand "cbranch<mode>4"
> >> +  [(set (pc)
> >> +        (if_then_else
> >> +          (match_operator 0 "aarch64_equality_operator"
> >> +            [(match_operand:VDQ_I 1 "register_operand")
> >> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> >> +          (label_ref (match_operand 3 ""))
> >> +          (pc)))]
> >> +  "TARGET_SIMD"
> >> +{
> >> +  auto code = GET_CODE (operands[0]);
> >> +  rtx tmp = operands[1];
> >> +
> >> +  /* If comparing against a non-zero vector we have to do a comparison first
> >> +     so we can have a != 0 comparison with the result.  */
> >> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> >> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
> >> +					operands[2]));
> >> +
> >> +  /* For 64-bit vectors we need no reductions.  */
> >> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> >> +    {
> >> +      /* Always reduce using a V4SI.  */
> >> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> >> +      rtx res = gen_reg_rtx (V4SImode);
> >> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> >> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> >> +    }
> >> +
> >> +  rtx val = gen_reg_rtx (DImode);
> >> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> >> +
> >> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> >> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> >> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> >> +  DONE;
> >
> > Are you sure this is correct for the operands[2] != const0_rtx case?
> > It looks like it uses the same comparison code for the vector comparison
> > and the scalar comparison.
> >
> > E.g. if the pattern is passed a comparison:
> >
> >   (eq (reg:V2SI x) (reg:V2SI y))
> >
> > it looks like we'd generate a CMEQ for the x and y, then branch
> > when the DImode bitcast of the CMEQ result equals zero.  This means
> > that we branch when no elements of x and y are equal, rather than
> > when all elements of x and y are equal.
> >
> > E.g. for:
> >
> >    { 1, 2 } == { 1, 2 }
> >
> > CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
> > and the branch won't be taken.
> >
> > ISTM it would be easier for the operands[2] != const0_rtx case to use
> > EOR instead of a comparison.  That gives a zero result if the input
> > vectors are equal and a nonzero result if the input vectors are
> > different.  We can then branch on the result using CODE and const0_rtx.
> >
> > (Hope I've got that right.)
> >
> > Maybe that also removes the need for patch 18.
> 
> Sorry, I forgot to say: we can't use operands[1] as a temporary,
> since it's only an input to the pattern.  The EOR destination would
> need to be a fresh register.

I've updated the patch, but it doesn't help since cbranch doesn't really push
comparisons in, so we never seem to get called with anything non-zero.

That said, I'm not entirely convinced that the == case is correct.  Since == means all
bits equal rather than any bit set, it needs to generate cbz instead of cbnz, and I'm not
sure that's guaranteed.
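
Concretely, the two semantics differ as below; this is a scalar C model of
branching on a two-lane compare mask (illustrative only, the helper names are
made up and this is not the actual expansion):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of branching on a V2SI lane mask: a vector compare
   sets all-ones lanes where elements match, the mask is bitcast to
   DImode and tested against zero.  */

/* cbnz on the compare mask: branch taken iff ANY lane compared equal.  */
static int branch_any_equal (const int32_t a[2], const int32_t b[2])
{
  uint64_t mask = 0;
  for (int i = 0; i < 2; i++)
    if (a[i] == b[i])
      mask |= 0xffffffffULL << (32 * i);
  return mask != 0;
}

/* What scalar == needs: branch taken iff ALL lanes are equal.  The
   EOR form gives this directly: a ^ b is zero iff the vectors are
   bitwise identical, so a cbz on the EOR result is correct.  */
static int branch_all_equal (const int32_t a[2], const int32_t b[2])
{
  uint64_t eor = 0;
  for (int i = 0; i < 2; i++)
    eor |= (uint64_t) (uint32_t) (a[i] ^ b[i]) << (32 * i);
  return eor == 0;
}
```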

I do have a failing testcase with this, but haven't yet tracked down whether the mid-end
did the right thing.  I think there might be a similar issue in a match.pd simplification.

Thoughts on the == case?

Thanks,
Tamar

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..7b289b1fbec6b1f15fbf51b6c862bcf9a5588b6b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3911,6 +3911,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    {
+      tmp = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
+    }
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
+				  operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}



* Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-12-06 16:25       ` Tamar Christina
@ 2023-12-07  0:56         ` Richard Sandiford
  2023-12-14 18:40           ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-12-07  0:56 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, November 28, 2023 5:56 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for
>> Advanced SIMD
>> 
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> > Tamar Christina <tamar.christina@arm.com> writes:
>> >> Hi All,
>> >>
>> >> This adds an implementation for conditional branch optab for AArch64.
>> >>
>> >> For e.g.
>> >>
>> >> void f1 ()
>> >> {
>> >>   for (int i = 0; i < N; i++)
>> >>     {
>> >>       b[i] += a[i];
>> >>       if (a[i] > 0)
>> >> 	break;
>> >>     }
>> >> }
>> >>
>> >> For 128-bit vectors we generate:
>> >>
>> >>         cmgt    v1.4s, v1.4s, #0
>> >>         umaxp   v1.4s, v1.4s, v1.4s
>> >>         fmov    x3, d1
>> >>         cbnz    x3, .L8
>> >>
>> >> and of 64-bit vector we can omit the compression:
>> >>
>> >>         cmgt    v1.2s, v1.2s, #0
>> >>         fmov    x2, d1
>> >>         cbz     x2, .L13
>> >>
>> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >>
>> >> Ok for master?
>> >>
>> >> Thanks,
>> >> Tamar
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >>
>> >> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>> >>
>> >> --- inline copy of patch --
>> >> diff --git a/gcc/config/aarch64/aarch64-simd.md
>> b/gcc/config/aarch64/aarch64-simd.md
>> >> index
>> 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd7
>> 0a92924f62524c15 100644
>> >> --- a/gcc/config/aarch64/aarch64-simd.md
>> >> +++ b/gcc/config/aarch64/aarch64-simd.md
>> >> @@ -3830,6 +3830,46 @@ (define_expand
>> "vcond_mask_<mode><v_int_equiv>"
>> >>    DONE;
>> >>  })
>> >>
>> >> +;; Patterns comparing two vectors and conditionally jump
>> >> +
>> >> +(define_expand "cbranch<mode>4"
>> >> +  [(set (pc)
>> >> +        (if_then_else
>> >> +          (match_operator 0 "aarch64_equality_operator"
>> >> +            [(match_operand:VDQ_I 1 "register_operand")
>> >> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
>> >> +          (label_ref (match_operand 3 ""))
>> >> +          (pc)))]
>> >> +  "TARGET_SIMD"
>> >> +{
>> >> +  auto code = GET_CODE (operands[0]);
>> >> +  rtx tmp = operands[1];
>> >> +
>> >> +  /* If comparing against a non-zero vector we have to do a comparison first
>> >> +     so we can have a != 0 comparison with the result.  */
>> >> +  if (operands[2] != CONST0_RTX (<MODE>mode))
>> >> +    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
>> >> +					operands[2]));
>> >> +
>> >> +  /* For 64-bit vectors we need no reductions.  */
>> >> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
>> >> +    {
>> >> +      /* Always reduce using a V4SI.  */
>> >> +      rtx reduc = gen_lowpart (V4SImode, tmp);
>> >> +      rtx res = gen_reg_rtx (V4SImode);
>> >> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
>> >> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
>> >> +    }
>> >> +
>> >> +  rtx val = gen_reg_rtx (DImode);
>> >> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
>> >> +
>> >> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> >> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> >> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>> >> +  DONE;
>> >
>> > Are you sure this is correct for the operands[2] != const0_rtx case?
>> > It looks like it uses the same comparison code for the vector comparison
>> > and the scalar comparison.
>> >
>> > E.g. if the pattern is passed a comparison:
>> >
>> >   (eq (reg:V2SI x) (reg:V2SI y))
>> >
>> > it looks like we'd generate a CMEQ for the x and y, then branch
>> > when the DImode bitcast of the CMEQ result equals zero.  This means
>> > that we branch when no elements of x and y are equal, rather than
>> > when all elements of x and y are equal.
>> >
>> > E.g. for:
>> >
>> >    { 1, 2 } == { 1, 2 }
>> >
>> > CMEQ will produce { -1, -1 }, the scalar comparison will be -1 == 0,
>> > and the branch won't be taken.
>> >
>> > ISTM it would be easier for the operands[2] != const0_rtx case to use
>> > EOR instead of a comparison.  That gives a zero result if the input
>> > vectors are equal and a nonzero result if the input vectors are
>> > different.  We can then branch on the result using CODE and const0_rtx.
>> >
>> > (Hope I've got that right.)
>> >
>> > Maybe that also removes the need for patch 18.
>> 
>> Sorry, I forgot to say: we can't use operands[1] as a temporary,
>> since it's only an input to the pattern.  The EOR destination would
>> need to be a fresh register.
>
> I've updated the patch, but it doesn't help since cbranch doesn't really push
> comparisons in, so we never seem to get called with anything non-zero.

I suppose it won't trigger for the early-break stuff, since for a scalar
== break condition, that wants:

  foo = a == b
  if (foo != 0)
    break

(break if one element equal) rather than:

  foo = a == b
  if (foo == -1)
    break

which is what would fold to:

  if (a == b)
    break

and so be a cbranch on (eq a b).  But keeping it as was would probably
be storing problems up for later.

> That said, I'm not entirely convinced that the == case is correct.  Since == means all
> bits equal rather than any bit set, it needs to generate cbz instead of cbnz, and I'm not
> sure that's guaranteed.

I see you've changed it from:

+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));

to:

+  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
+				  operands[3]));

Was that to fix a specific problem?  The original looked OK to me
for that part (it was the vector comparison that I was asking about).

If we do keep the cbranchdi4, I think it's more obviously correct to
recreate operands[0] with the new comparison operands, even if it
happens to work without.

For the == case, both the condjump and cbranch versions will branch iff
all bits of val are zero, which is true iff the result of the EOR is zero,
which is true iff the vector operands were bitwise identical.  So it looks
like it should work.
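
As a quick sanity check of that chain of equivalences, here is a scalar model
(the function name and parameters are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Model of cbranchdi4 on the EOR result: val is the DImode bitcast
   of (a EOR b).  With code EQ we branch iff val == 0, i.e. iff the
   vector inputs were bitwise identical; with code NE we branch iff
   val != 0, i.e. iff they differ in at least one bit.  */
static int cbranch_taken (int is_eq, uint64_t a_bits, uint64_t b_bits)
{
  uint64_t val = a_bits ^ b_bits;      /* EOR of the two vectors  */
  return is_eq ? val == 0 : val != 0;  /* cbz vs. cbnz on val     */
}
```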

Thanks,
Richard

> I do have a failing testcase with this, but haven't yet tracked down whether the mid-end
> did the right thing.  I think there might be a similar issue in a match.pd simplification.
>
> Thoughts on the == case?
>
> Thanks,
> Tamar
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..7b289b1fbec6b1f15fbf51b6c862bcf9a5588b6b 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3911,6 +3911,46 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>    DONE;
>  })
>  
> +;; Patterns comparing two vectors and conditionally jump
> +
> +(define_expand "cbranch<mode>4"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operator 0 "aarch64_equality_operator"
> +            [(match_operand:VDQ_I 1 "register_operand")
> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> +          (label_ref (match_operand 3 ""))
> +          (pc)))]
> +  "TARGET_SIMD"
> +{
> +  auto code = GET_CODE (operands[0]);
> +  rtx tmp = operands[1];
> +
> +  /* If comparing against a non-zero vector we have to do a comparison first
> +     so we can have a != 0 comparison with the result.  */
> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> +    {
> +      tmp = gen_reg_rtx (<MODE>mode);
> +      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
> +    }
> +
> +  /* For 64-bit vectors we need no reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> +    {
> +      /* Always reduce using a V4SI.  */
> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> +      rtx res = gen_reg_rtx (V4SImode);
> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> +    }
> +
> +  rtx val = gen_reg_rtx (DImode);
> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
> +				  operands[3]));
> +  DONE;
> +})
> +
>  ;; Patterns comparing two vectors to produce a mask.
>  
>  (define_expand "vec_cmp<mode><mode>"
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**	...
> +**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-06  9:37         ` Richard Biener
@ 2023-12-08  8:58           ` Tamar Christina
  2023-12-08 10:28             ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-08  8:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 23128 bytes --]

> --param vect-partial-vector-usage=2 would, no?
> 
I.. didn't even know it went to 2!

> > In principle I suppose I could mask the individual stmts; that should handle
> > the future case when this is relaxed to support non-fixed-length buffers?
> 
> Well, it looks wrong - either put in an assert that we start with a
> single stmt or assert !masked_loop_p instead?  Better ICE than
> generate wrong code.
> 
> That said, I think you need to apply the masking on the original
> stmts[], before reducing them, no?

Yeah, I've done so now.  For simplicity I've kept the final masking in place as well
and leave it up to the optimizers to drop it when it's superfluous.

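Concretely, masking each original statement's comparison result before OR-reducing (rather than masking only the reduced value) guarantees that lanes disabled by partial-vector masking can never signal an exit.  A scalar model of that ordering, with a plain AND standing in for prepare_vec_mask (names here are illustrative):

```c
#include <stdint.h>

/* Scalar model: each of N comparison results is ANDed with the loop
   mask before being OR-ed into the accumulator, so a lane that the
   loop mask disables can never trigger the early break.  */
static int
masked_any (uint32_t cmp[][4], const uint32_t mask[4], int n)
{
  uint32_t acc = 0;
  for (int i = 0; i < n; i++)
    for (int l = 0; l < 4; l++)
      acc |= cmp[i][l] & mask[l];   /* mask applied per original stmt */
  return acc != 0;
}
```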
Simple testcase:

#ifndef N
#define N 837
#endif
float vect_a[N];
unsigned vect_b[N];

unsigned test4(double x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;

 }
 return ret;
}

Looks good now. After this one there's only one patch left, the dependency analysis.
I'm almost done with the cleanup/respin, but want to take the weekend to double check and will post it first thing Monday morning.

Did you want to see the testsuite changes as well again? I've basically just added the right dg-requires-effective and add-options etc.

Thanks for all the reviews!

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.


--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform any
+   codegen associated with the boolean condition.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    bool analyze_only)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !analyze_only)
     return false;
+  else if (!def_stmt_info)
+    /* If we're only analyzing then we won't be codegen-ing the statements
+       and are only interested in whether the types match.  In that case we
+       can accept loop invariant values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   analyze_only))
 	return false;
       break;
 
@@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !analyze_only
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	    }
 	  else
 	    vecitype = comp_vectype;
-	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
+	  if (!analyze_only
+	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
 	    return false;
 	}
       else
@@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
-/* Comparison function to qsort a vector of gimple stmts after UID.  */
+/* Comparison function to qsort a vector of gimple stmts after BB and UID.
+   The def of one statement can be in an earlier block than the use, so if
+   the BBs differ, first compare by BB.  */
 
 static int
 sort_after_uid (const void *p1, const void *p2)
 {
   const gimple *stmt1 = *(const gimple * const *)p1;
   const gimple *stmt2 = *(const gimple * const *)p2;
+  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
+    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
+
   return gimple_uid (stmt1) - gimple_uid (stmt2);
 }
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
-  for (unsigned i = 0; i < bool_stmts.length (); ++i)
-    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+  for (auto bool_stmt : bool_stmts)
+    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* When doing loop vectorization with multiple exits, analyze the
+	 gcond; we don't support SLP of gconds today.  */
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
-	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
+	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond),
+				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have a single bit precision and we need the vector boolean to be
+	     a representation of the integer mask.  So set the correct integer
+	     type and convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean
+	 type now that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

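The workset loop in vectorizable_early_exit above pops two defs, ORs them, and re-queues the result at the front, which yields a shallow, balanced reduction tree rather than one long serial chain of ORs.  A standalone C sketch of that scheduling (scalars stand in for vector defs; illustrative only):

```c
#include <stdint.h>

/* Mirror of the workset loop: pop two values off the back, OR them,
   and insert the result at the front.  Because results are re-queued
   at the opposite end, later ORs combine earlier results, keeping the
   reduction tree balanced and the ORs largely parallel.  */
static uint32_t
ior_reduce (uint32_t *workset, int len)
{
  while (len > 1)
    {
      uint32_t arg0 = workset[--len];       /* workset.pop () */
      uint32_t arg1 = workset[--len];
      uint32_t res = arg0 | arg1;
      /* workset.quick_insert (0, res): shift and place at the front.  */
      for (int i = len; i > 0; i--)
	workset[i] = workset[i - 1];
      workset[0] = res;
      len++;
    }
  return workset[0];
}
```

For four inputs {a, b, c, d} this computes (c|d), then (a|b), then their OR: depth two instead of the depth three a left-to-right fold would give.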
[-- Attachment #2: rb17969 (1).patch --]
[-- Type: application/octet-stream, Size: 20605 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  if ANALYZE_ONLY then only analyze the booleans but do not perform any
+   codegen associated with the boolean condition.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    bool analyze_only)
 {
   tree rhs1;
   enum tree_code rhs_code;
+  gassign *def_stmt = NULL;
 
   stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
-  if (!def_stmt_info)
+  if (!def_stmt_info && !analyze_only)
     return false;
+  else if (!def_stmt_info)
+    /* If we're a only analyzing we won't be codegen-ing the statements and are
+       only after if the types match.  In that case we can accept loop invariant
+       values.  */
+    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
+  else
+    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
 
-  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!def_stmt)
     return false;
 
@@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   analyze_only))
 	return false;
       break;
 
@@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
+	      && !analyze_only
 	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
 	    return false;
 
@@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	    }
 	  else
 	    vecitype = comp_vectype;
-	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
+	  if (!analyze_only
+	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
 	    return false;
 	}
       else
@@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
    VAR is an SSA_NAME that should be transformed from bool to a wider integer
    type, OUT_TYPE is the desired final integer type of the whole pattern.
    STMT_INFO is the info of the pattern root and is where pattern stmts should
-   be associated with.  DEFS is a map of pattern defs.  */
+   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
+   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
 
 static void
 adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
-		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
+		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
+		     gimple *&last_stmt, bool type_only)
 {
   gimple *stmt = SSA_NAME_DEF_STMT (var);
   enum tree_code rhs_code, def_rhs_code;
@@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
     }
 
   gimple_set_location (pattern_stmt, loc);
-  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
-			  get_vectype_for_scalar_type (vinfo, itype));
+  if (!type_only)
+    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
+			    get_vectype_for_scalar_type (vinfo, itype));
+  last_stmt = pattern_stmt;
   defs.put (var, gimple_assign_lhs (pattern_stmt));
 }
 
-/* Comparison function to qsort a vector of gimple stmts after UID.  */
+/* Comparison function to qsort a vector of gimple stmts after BB and UID.
+   the def of one statement can be in an earlier block than the use, so if
+   the BB are different, first compare by BB.  */
 
 static int
 sort_after_uid (const void *p1, const void *p2)
 {
   const gimple *stmt1 = *(const gimple * const *)p1;
   const gimple *stmt2 = *(const gimple * const *)p2;
+  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
+    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
+
   return gimple_uid (stmt1) - gimple_uid (stmt2);
 }
 
 /* Create pattern stmts for all stmts participating in the bool pattern
    specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
-   OUT_TYPE.  Return the def of the pattern root.  */
+   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
+   statements are not emitted as pattern statements and the tree returned is
+   only useful for type queries.  */
 
 static tree
 adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
-		   tree out_type, stmt_vec_info stmt_info)
+		   tree out_type, stmt_vec_info stmt_info,
+		   bool type_only = false)
 {
   /* Gather original stmts in the bool pattern in their order of appearance
      in the IL.  */
@@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
     bool_stmts.quick_push (*i);
   bool_stmts.qsort (sort_after_uid);
 
+  gimple *last_stmt = NULL;
+
   /* Now process them in that order, producing pattern stmts.  */
   hash_map <tree, tree> defs;
-  for (unsigned i = 0; i < bool_stmts.length (); ++i)
-    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
-			 out_type, stmt_info, defs);
+  for (auto bool_stmt : bool_stmts)
+    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
+			 out_type, stmt_info, defs, last_stmt, type_only);
 
   /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
-  gimple *pattern_stmt
-    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
-  return gimple_assign_lhs (pattern_stmt);
+  return gimple_assign_lhs (last_stmt);
 }
 
 /* Return the proper type for converting bool VAR into
@@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond *cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* Only analyze the gcond when doing loop vectorization with multiple
+	 exits; otherwise don't bother, as we don't support SLP of it today.  */
+      lhs = var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
-	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
+      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
+	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
       else if (integer_type_for_mask (var, vinfo))
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (gimple_cond_code (cond),
+				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
+	  vectype = truth_type_for (vectype);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, false))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
+			     build_zero_cst (vectype));
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have a single bit precision and we need the vector boolean to be
+	     a representation of the integer mask.  So set the correct integer
+	     type and convert to a boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean
+	 type now that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

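The pairwise reduction loop in vectorizable_early_exit above can be modelled outside the vectorizer. The sketch below is illustrative only, not GCC code: each workset element stands in for the exit mask of one vector copy (here just a `uint64_t`), and the loop mirrors the patch's pop-two, IOR, push-to-front scheme, which keeps the reduction tree shallow rather than building a linear chain.

```cpp
#include <cstdint>
#include <vector>

// Model of the workset loop: OR the per-copy exit masks together pairwise.
// Pushing each intermediate result to the front (as the patch does with
// workset.quick_insert (0, new_temp)) pairs fresh results with older inputs,
// giving a reduction tree of depth O(log n) instead of a serial chain.
uint64_t
reduce_exit_masks (std::vector<uint64_t> workset)
{
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      /* new_temp = arg0 | arg1; corresponds to the BIT_IOR_EXPR stmt.  */
      workset.insert (workset.begin (), arg0 | arg1);
    }
  return workset[0];
}
```

The final value is then compared against zero, matching the `cond != 0` gcond the patch emits.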
^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08  8:58           ` Tamar Christina
@ 2023-12-08 10:28             ` Richard Biener
  2023-12-08 13:45               ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-08 10:28 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 8 Dec 2023, Tamar Christina wrote:

> > --param vect-partial-vector-usage=2 would, no?
> > 
> I.. didn't even know it went to 2!
> 
> > > In principle I suppose I could mask the individual stmts, that should
> > > handle the future case when this is relaxed to support non-fixed-length
> > > buffers?
> > 
> > Well, it looks wrong - either put in an assert that we start with a
> > single stmt or assert !masked_loop_p instead?  Better ICE than
> > generate wrong code.
> > 
> > That said, I think you need to apply the masking on the original
> > stmts[], before reducing them, no?
> 
> Yeah, I've done so now.  For simplicity I've just always kept the final
> masking as well and left it up to the optimizers to drop it when it's
> superfluous.
> 
> Simple testcase:
> 
> #ifndef N
> #define N 837
> #endif
> float vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(double x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    if (vect_a[i] > x)
>      break;
>    vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> Looks good now. After this one there's only one patch left, the dependency analysis.
> I'm almost done with the cleanup/respin, but want to take the weekend to double check and will post it first thing Monday morning.
> 
> Did you want to see the testsuite changes again as well? I've basically just added the right dg-require-effective-target and add-options directives etc.

Yes please.

> Thanks for all the reviews!
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
>
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  If ANALYZE_ONLY, only analyze the booleans but do not perform any
> +   codegen associated with the boolean condition.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    bool analyze_only)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> +  gassign *def_stmt = NULL;
>  
>    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> -  if (!def_stmt_info)
> +  if (!def_stmt_info && !analyze_only)
>      return false;
> +  else if (!def_stmt_info)
> +    /* If we're only analyzing, we won't be codegen-ing the statements and
> +       are only interested in whether the types match.  In that case we can
> +       accept loop-invariant values.  */
> +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> +  else
> +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>  

Hmm, but we're visiting them then?  I wonder how you get along
without doing adjustments on the uses if you consider

    _1 = a < b;
    _2 = c != d;
    _3 = _1 | _2;
    if (_3 != 0)
      exit loop;

thus a combined condition like

    if (a < b || c != d)

that we if-converted.  We need to recognize that _1, _2 and _3 have
mask uses and thus possibly adjust them.

What bad happens if you drop 'analyze_only'?  We're not really
rewriting anything there.
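As a concrete (hypothetical) source form of the combined condition described above, an early-break loop like the one below if-converts into exactly that shape: two comparisons producing masks, an IOR of the masks, and a gcond testing the result against zero. The function name and arrays are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical early-break loop with a combined exit condition.  After
// if-conversion this becomes _1 = a[i] < b[i]; _2 = c[i] != d[i];
// _3 = _1 | _2; if (_3 != 0) break; -- so _1, _2 and _3 all have mask uses
// that the bool-pattern analysis has to recognize and possibly adjust.
size_t
first_match (const int *a, const int *b, const int *c, const int *d, size_t n)
{
  size_t i = 0;
  for (; i < n; i++)
    if (a[i] < b[i] || c[i] != d[i])
      break;
  return i;  /* Index of the first matching element, or n if none match.  */
}
```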

> -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
>    if (!def_stmt)
>      return false;
>  
> @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   analyze_only))
>  	return false;
>        break;
>  
> @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> +	      && !analyze_only
>  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
>  	    return false;
>  
> @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	    }
>  	  else
>  	    vecitype = comp_vectype;
> -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> +	  if (!analyze_only
> +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
>  	    return false;
>  	}
>        else
> @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
>     VAR is an SSA_NAME that should be transformed from bool to a wider integer
>     type, OUT_TYPE is the desired final integer type of the whole pattern.
>     STMT_INFO is the info of the pattern root and is where pattern stmts should
> -   be associated with.  DEFS is a map of pattern defs.  */
> +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
>  
>  static void
>  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> +		     gimple *&last_stmt, bool type_only)
>  {
>    gimple *stmt = SSA_NAME_DEF_STMT (var);
>    enum tree_code rhs_code, def_rhs_code;
> @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
>      }
>  
>    gimple_set_location (pattern_stmt, loc);
> -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> -			  get_vectype_for_scalar_type (vinfo, itype));
> +  if (!type_only)
> +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> +			    get_vectype_for_scalar_type (vinfo, itype));
> +  last_stmt = pattern_stmt;
>    defs.put (var, gimple_assign_lhs (pattern_stmt));
>  }
>  
> -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> +   The def of one statement can be in an earlier block than the use, so if
> +   the BBs are different, compare by BB first.  */
>  
>  static int
>  sort_after_uid (const void *p1, const void *p2)
>  {
>    const gimple *stmt1 = *(const gimple * const *)p1;
>    const gimple *stmt2 = *(const gimple * const *)p2;
> +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> +

is this because you eventually get out-of-loop stmts (without UID)?

>    return gimple_uid (stmt1) - gimple_uid (stmt2);
>  }
>  
>  /* Create pattern stmts for all stmts participating in the bool pattern
>     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> -   OUT_TYPE.  Return the def of the pattern root.  */
> +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY, the new
> +   statements are not emitted as pattern statements and the tree returned is
> +   only useful for type queries.  */
>  
>  static tree
>  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> -		   tree out_type, stmt_vec_info stmt_info)
> +		   tree out_type, stmt_vec_info stmt_info,
> +		   bool type_only = false)
>  {
>    /* Gather original stmts in the bool pattern in their order of appearance
>       in the IL.  */
> @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
>      bool_stmts.quick_push (*i);
>    bool_stmts.qsort (sort_after_uid);
>  
> +  gimple *last_stmt = NULL;
> +
>    /* Now process them in that order, producing pattern stmts.  */
>    hash_map <tree, tree> defs;
> -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> -			 out_type, stmt_info, defs);
> +  for (auto bool_stmt : bool_stmts)
> +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> +			 out_type, stmt_info, defs, last_stmt, type_only);
>  
>    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> -  gimple *pattern_stmt
> -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> -  return gimple_assign_lhs (pattern_stmt);
> +  return gimple_assign_lhs (last_stmt);
>  }
>  
>  /* Return the proper type for converting bool VAR into
> @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond *cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;
>  
> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      /* Only analyze the gcond when doing loop vectorization with multiple
> +	 exits; otherwise don't bother, as we don't support SLP of it today.  */
> +      lhs = var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
> +  else
> +    return NULL;
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
>        else if (integer_type_for_mask (var, vinfo))
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (gimple_cond_code (cond),
> +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));
> +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> +	  vectype = truth_type_for (vectype);
> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  

So this is also quite odd.  You're hooking into COND_EXPR handling
but only look at the LHS of the GIMPLE_COND compare.

Please refactor the changes to separate the GIMPLE_COND path
completely.

Is there test coverage for such "complex" condition?  I think
you'll need adjustments to vect_recog_mask_conversion_pattern
as well similar as to how COND_EXPR is handled there.

> @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "use not simple.\n");
> +      return false;
> +    }

I think you rely on patterns transforming this into canonical form
mask != 0, so I suggest to check this here.

> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target doesn't support flag setting vector "
> +			     "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1

Also required for vec_num > 1 with SLP
(SLP_TREE_NUMBER_OF_VEC_STMTS)

> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  */
> +      if (masked_loop_p)
> +	for (auto stmt : stmts)
> +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> +						mask, stmt, &cond_gsi));

I think this still uses the wrong mask, you need to use

  vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)

replacing <cnt> with the vector def index to mask I think.  For this
reason keeping the "final" mask below is also wrong.

Or am I missing something?
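
For reference, a standalone scalar model of what this hunk does (hypothetical stand-in types and helper names, not GCC internals): each comparison result is first restricted to the active lanes of its loop mask, and the masked results are then OR-reduced pairwise, popping two entries from the tail and inserting the result at the front as the quoted quick_insert (0, ...) does.

```c
/* Scalar stand-in: each "vector" mask is a word of lane bits.
   apply_loop_mask models prepare_vec_mask; reduce_ior models the
   vexit_reduc loop in the hunk above.  Hypothetical helper names.  */

static unsigned
apply_loop_mask (unsigned loop_mask, unsigned cmp)
{
  /* Keep only the lanes the loop mask says are active.  */
  return loop_mask & cmp;
}

static unsigned
reduce_ior (unsigned *workset, int n)
{
  while (n > 1)
    {
      unsigned arg0 = workset[--n];	/* workset.pop ()  */
      unsigned arg1 = workset[--n];	/* workset.pop ()  */
      for (int i = n; i > 0; i--)	/* quick_insert (0, ...)  */
	workset[i] = workset[i - 1];
      workset[0] = arg0 | arg1;
      n++;
    }
  return workset[0];
}
```

With loop masks { 0xf, 0xf, 0x3 } and comparison results { 0x1, 0x0, 0xc }, the masked workset becomes { 0x1, 0x0, 0x0 } and the reduction yields 0x1: lanes masked off by a loop mask cannot trigger the exit.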

> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

so just do this in the else path above

Otherwise looks OK.

Richard.

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> +			     build_zero_cst (vectype));
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 10:28             ` Richard Biener
@ 2023-12-08 13:45               ` Tamar Christina
  2023-12-08 13:59                 ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-08 13:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, December 8, 2023 10:28 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > --param vect-partial-vector-usage=2 would, no?
> > >
> > I.. didn't even know it went to 2!
> >
> > > > In principle I suppose I could mask the individual stmts, that should handle
> > > > the future case when this is relaxed to support non-fixed length buffers?
> > >
> > > Well, it looks wrong - either put in an assert that we start with a
> > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > generate wrong code.
> > >
> > > That said, I think you need to apply the masking on the original
> > > stmts[], before reducing them, no?
> >
> > Yeah, I've done so now.  For simplicity I've just kept the final masking always
> > as well and just leave it up to the optimizers to drop it when it's superfluous.
> >
> > Simple testcase:
> >
> > #ifndef N
> > #define N 837
> > #endif
> > float vect_a[N];
> > unsigned vect_b[N];
> >
> > unsigned test4(double x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >    if (vect_a[i] > x)
> >      break;
> >    vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
> >
> > Looks good now. After this one there's only one patch left, the dependency
> > analysis.
> > I'm almost done with the cleanup/respin, but want to take the weekend to
> > double check and will post it first thing Monday morning.
> >
> > Did you want to see the testsuite changes as well again? I've basically just
> > added the right dg-requires-effective and add-options etc.
> 
> Yes please.
> 
> > Thanks for all the reviews!
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >      {
> >        gcc_assert (!vectype
> > +		  || is_a <gcond *> (pattern_stmt)
> >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >  		      == vect_use_mask_type_p (orig_stmt_info)));
> >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> >     in case it's a result of a comparison which can be directly vectorized into
> >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > -   walk.  */
> > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform
> > +   any codegen associated with the boolean condition.  */
> >
> >  static bool
> > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > +		    bool analyze_only)
> >  {
> >    tree rhs1;
> >    enum tree_code rhs_code;
> > +  gassign *def_stmt = NULL;
> >
> >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > -  if (!def_stmt_info)
> > +  if (!def_stmt_info && !analyze_only)
> >      return false;
> > +  else if (!def_stmt_info)
> > +    /* If we're only analyzing, we won't be codegen-ing the statements and are
> > +       only after whether the types match.  In that case we can accept loop
> > +       invariant values.  */
> > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > +  else
> > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> >
> 
> Hmm, but we're visiting them then?  I wonder how you get along
> without doing adjustments on the uses if you consider
> 
>     _1 = a < b;
>     _2 = c != d;
>     _3 = _1 | _2;
>     if (_3 != 0)
>       exit loop;
> 
> thus a combined condition like
> 
>     if (a < b || c != d)
> 
> that we if-converted.  We need to recognize that _1, _2 and _3 have
> mask uses and thus possibly adjust them.
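
As a concrete source-level example, such a combined condition arises from a loop like the following (hypothetical function, for illustration only):

```c
/* The || of the two compares is what if-conversion turns into
   _1 = a < b; _2 = c != d; _3 = _1 | _2; if (_3 != 0) exit.  */
int
first_match (const int *a, const int *b, const int *c, const int *d, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] < b[i] || c[i] != d[i])	/* early break on combined cond  */
      return i;
  return -1;
}
```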
> 
> What bad happens if you drop 'analyze_only'?  We're not really
> rewriting anything there.

You mean drop it only in the above? We then fail to update the type for
the gcond.  So in certain circumstances like with

int a, c, d;
short b;

int
main ()
{
  int e[1];
  for (; b < 2; b++)
    {
      a = 0;
      if (b == 28378)
        a = e[b];
      if (!(d || b))
        for (; c;)
          ;
    }
  return 0;
}

Unless we walk the statements regardless of whether they come from inside the loop or not.

> 
> > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> >    if (!def_stmt)
> >      return false;
> >
> > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >    switch (rhs_code)
> >      {
> >      case SSA_NAME:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      CASE_CONVERT:
> >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> >  	return false;
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      case BIT_NOT_EXPR:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> >  	return false;
> >        break;
> >
> >      case BIT_AND_EXPR:
> >      case BIT_IOR_EXPR:
> >      case BIT_XOR_EXPR:
> > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > +				   analyze_only))
> >  	return false;
> >        break;
> >
> > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> >  							  TREE_TYPE (rhs1));
> >  	  if (mask_type
> > +	      && !analyze_only
> >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> >  	    return false;
> >
> > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> >  	    }
> >  	  else
> >  	    vecitype = comp_vectype;
> > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > +	  if (!analyze_only
> > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> >  	    return false;
> >  	}
> >        else
> > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > -   be associated with.  DEFS is a map of pattern defs.  */
> > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> >
> >  static void
> >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > +		     gimple *&last_stmt, bool type_only)
> >  {
> >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> >    enum tree_code rhs_code, def_rhs_code;
> > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> >      }
> >
> >    gimple_set_location (pattern_stmt, loc);
> > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > -			  get_vectype_for_scalar_type (vinfo, itype));
> > +  if (!type_only)
> > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > +			    get_vectype_for_scalar_type (vinfo, itype));
> > +  last_stmt = pattern_stmt;
> >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> >  }
> >
> > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > +   The def of one statement can be in an earlier block than the use, so if
> > +   the BBs are different, first compare by BB.  */
> >
> >  static int
> >  sort_after_uid (const void *p1, const void *p2)
> >  {
> >    const gimple *stmt1 = *(const gimple * const *)p1;
> >    const gimple *stmt2 = *(const gimple * const *)p2;
> > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > +
> 
> is this because you eventually get out-of-loop stmts (without UID)?
> 

No, the problem I was having is that with an early exit the statement of
one branch of the compare can be in a different BB than the other.

The testcase specifically was this:

int a, c, d;
short b;

int
main ()
{
  int e[1];
  for (; b < 2; b++)
    {
      a = 0;
      if (b == 28378)
        a = e[b];
      if (!(d || b))
        for (; c;)
          ;
    }
  return 0;
}

Without debug info it happened to work:

>>> p gimple_uid (bool_stmts[0])
$1 = 3
>>> p gimple_uid (bool_stmts[1])
$2 = 3
>>> p gimple_uid (bool_stmts[2])
$3 = 4

The first two statements got the same uid, but are in different BBs in the loop.
When we add debug info, it looks like one BB got more debug statements than the other:

>>> p gimple_uid (bool_stmts[0])
$1 = 3
>>> p gimple_uid (bool_stmts[1])
$2 = 4
>>> p gimple_uid (bool_stmts[2])
$3 = 6

That last statement, which now has a UID of 6, used to have 3.
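
A standalone illustration of the two-key comparator (hypothetical stmt struct, modelled on the UIDs from the debugger session above, not GCC internals):

```c
#include <stdlib.h>

/* Minimal stand-in for a gimple stmt: its basic-block index and uid.  */
struct stmt { int bb_index; unsigned uid; };

/* Mirrors the patched sort_after_uid: compare by BB index first, then
   by UID, so two stmts that share a UID but live in different blocks
   still sort defs-before-uses.  */
static int
sort_after_uid (const void *p1, const void *p2)
{
  const struct stmt *s1 = (const struct stmt *) p1;
  const struct stmt *s2 = (const struct stmt *) p2;
  if (s1->bb_index != s2->bb_index)
    return s1->bb_index - s2->bb_index;
  return (int) s1->uid - (int) s2->uid;
}
```

qsort'ing { {4, 3}, {3, 3}, {3, 4} } with this comparator yields { {3, 3}, {3, 4}, {4, 3} }, whereas a UID-only compare could leave the bb-4 stmt ahead of the bb-3 ones.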

> >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> >  }
> >
> >  /* Create pattern stmts for all stmts participating in the bool pattern
> >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > -   OUT_TYPE.  Return the def of the pattern root.  */
> > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > +   statements are not emitted as pattern statements and the tree returned is
> > +   only useful for type queries.  */
> >
> >  static tree
> >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > -		   tree out_type, stmt_vec_info stmt_info)
> > +		   tree out_type, stmt_vec_info stmt_info,
> > +		   bool type_only = false)
> >  {
> >    /* Gather original stmts in the bool pattern in their order of appearance
> >       in the IL.  */
> > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> >      bool_stmts.quick_push (*i);
> >    bool_stmts.qsort (sort_after_uid);
> >
> > +  gimple *last_stmt = NULL;
> > +
> >    /* Now process them in that order, producing pattern stmts.  */
> >    hash_map <tree, tree> defs;
> > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > -			 out_type, stmt_info, defs);
> > +  for (auto bool_stmt : bool_stmts)
> > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > +			 out_type, stmt_info, defs, last_stmt, type_only);
> >
> >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > -  gimple *pattern_stmt
> > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > -  return gimple_assign_lhs (pattern_stmt);
> > +  return gimple_assign_lhs (last_stmt);
> >  }
> >
> >  /* Return the proper type for converting bool VAR into
> > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >    enum tree_code rhs_code;
> >    tree var, lhs, rhs, vectype;
> >    gimple *pattern_stmt;
> > -
> > -  if (!is_gimple_assign (last_stmt))
> > +  gcond* cond = NULL;
> > +  if (!is_gimple_assign (last_stmt)
> > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> >      return NULL;
> >
> > -  var = gimple_assign_rhs1 (last_stmt);
> > -  lhs = gimple_assign_lhs (last_stmt);
> > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (is_gimple_assign (last_stmt))
> > +    {
> > +      var = gimple_assign_rhs1 (last_stmt);
> > +      lhs = gimple_assign_lhs (last_stmt);
> > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > +    }
> > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +    {
> > +      /* If not loop vectorization with multiple exits, don't bother analyzing
> > +	 the gcond, as we don't support SLP today.  */
> > +      lhs = var = gimple_cond_lhs (last_stmt);
> > +      rhs_code = gimple_cond_code (last_stmt);
> > +    }
> > +  else
> > +    return NULL;
> >
> >    if (rhs_code == VIEW_CONVERT_EXPR)
> >      var = TREE_OPERAND (var, 0);
> > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >  	return NULL;
> >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> >  	{
> >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> >  				   TREE_TYPE (lhs), stmt_vinfo);
> > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >
> >        return pattern_stmt;
> >      }
> > -  else if (rhs_code == COND_EXPR
> > +  else if ((rhs_code == COND_EXPR || cond)
> >  	   && TREE_CODE (var) == SSA_NAME)
> >      {
> >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> >  	return NULL;
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> >        else if (integer_type_for_mask (var, vinfo))
> >  	return NULL;
> >
> > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > -      pattern_stmt
> > -	= gimple_build_assign (lhs, COND_EXPR,
> > -			       build2 (NE_EXPR, boolean_type_node,
> > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > -			       gimple_assign_rhs2 (last_stmt),
> > -			       gimple_assign_rhs3 (last_stmt));
> > +      if (!cond)
> > +	{
> > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > +	  pattern_stmt
> > +	    = gimple_build_assign (lhs, COND_EXPR,
> > +				   build2 (NE_EXPR, boolean_type_node, var,
> > +					   build_int_cst (TREE_TYPE (var), 0)),
> > +				   gimple_assign_rhs2 (last_stmt),
> > +				   gimple_assign_rhs3 (last_stmt));
> > +	}
> > +      else
> > +	{
> > +	  pattern_stmt
> > +	    = gimple_build_cond (gimple_cond_code (cond),
> > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > +				 gimple_cond_true_label (cond),
> > +				 gimple_cond_false_label (cond));
> > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > +	  vectype = truth_type_for (vectype);
> > +	}
> >        *type_out = vectype;
> >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> >
> 
> So this is also quite odd.  You're hooking into COND_EXPR handling
> but only look at the LHS of the GIMPLE_COND compare.
> 

Hmm, not sure I follow; GIMPLE_CONDs don't have an LHS, no?  We look at the LHS
for the COND_EXPR, but for a GCOND we just recreate the statement and set vectype
based on the updated var.  I guess this is related to:

> that we if-converted.  We need to recognize that _1, _2 and _3 have
> mask uses and thus possibly adjust them.

Which I did think about somewhat, so what you're saying is that I need to create
a new GIMPLE_COND here with an NE to 0 compare against var like the COND_EXPR
case?

> Please refactor the changes to separate the GIMPLE_COND path
> completely.
> 

Ok, then it seems better to make two patterns?

> Is there test coverage for such "complex" condition?  I think
> you'll need adjustments to vect_recog_mask_conversion_pattern
> as well similar as to how COND_EXPR is handled there.

Yes, the existing testsuite has many cases which fail, including gcc/testsuite/gcc.c-torture/execute/20150611-1.c

Cheers,
Tamar

> 
> > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> >  	return NULL;
> >
> > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> >  				 TREE_TYPE (vectype), stmt_vinfo);
> >        else
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> >
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    /* Transform.  */
> >
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > +  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);
> >
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> >
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code,
> > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> >
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > +  gcc_assert (vectype);
> > +
> > +  tree vectype_op0 = NULL_TREE;
> > +  slp_tree slp_op0;
> > +  tree op0;
> > +  enum vect_def_type dt0;
> > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > +			   &vectype_op0))
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			 "use not simple.\n");
> > +      return false;
> > +    }
> 
> I think you rely on patterns transforming this into canonical form
> mask != 0, so I suggest to check this here.
> 
> > +  machine_mode mode = TYPE_MODE (vectype);
> > +  int ncopies;
> > +
> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +
> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> 
> Also required for vec_num > 1 with SLP
> (SLP_TREE_NUMBER_OF_VEC_STMTS)
> 
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", vectype);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;
> > +
> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	{
> > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > +					      OPTIMIZE_FOR_SPEED))
> > +	    return false;
> > +	  else
> > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > +	}
> > +
> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  basic_block cond_bb = gimple_bb (stmt);
> > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > +
> > +  auto_vec<tree> stmts;
> > +
> > +  tree mask = NULL_TREE;
> > +  if (masked_loop_p)
> > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > +
> > +  if (slp_node)
> > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.reserve_exact (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > +
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  */
> > +      if (masked_loop_p)
> > +	for (auto stmt : stmts)
> > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +						mask, stmt, &cond_gsi));
> 
> I think this still uses the wrong mask, you need to use
> 
>   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> 
> replacing <cnt> with the vector def index to mask I think.  For this
> reason keeping the "final" mask below is also wrong.
> 
> Or am I missing something?
> 
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> so just do this in the else path above
> 
> Otherwise looks OK.
> 
> Richard.
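As an aside, the pairwise BIT_IOR reduction that the quoted workset loop builds can be modeled as follows (a hypothetical Python sketch of the semantics, not GCC code; vector masks are modeled as plain lists of lanes):

```python
# Hypothetical model of the reduction tree the workset loop builds:
# vector exit masks are OR-ed pairwise, keeping the tree shallow to
# preserve parallelism, until a single combined mask remains.

def reduce_exit_masks(masks):
    workset = list(masks)
    while len(workset) > 1:
        arg0 = workset.pop()
        arg1 = workset.pop()
        ior = [a or b for a, b in zip(arg0, arg1)]  # models BIT_IOR_EXPR
        workset.insert(0, ior)                      # models quick_insert (0, ...)
    return workset[0]

combined = reduce_exit_masks([[True, False], [False, False], [False, True]])
```

Here the combined mask has a lane set whenever any input mask had that lane set, which is what the vectorized cbranch then tests.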
> 
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > +     codegen so we must replace the original insn.  */
> > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > +			     build_zero_cst (vectype));
> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> >
> >    if (node)
> > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> >
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >      }
> >    else
> >      {
> > +      gcond *cond = NULL;
> >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> >  	scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > +	{
> > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > +	     single bit precision and we need the vector boolean to be a
> > +	     representation of the integer mask.  So set the correct integer type and
> > +	     convert to boolean vector once we have a vectype.  */
> > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > +	}
> >        else
> >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> >
> > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  			     "get vectype for scalar type: %T\n", scalar_type);
> >  	}
> >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > +
> >        if (!vectype)
> >  	return opt_result::failure_at (stmt,
> >  				       "not vectorized:"
> >  				       " unsupported data-type %T\n",
> >  				       scalar_type);
> >
> > > +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> > > +	 that we have the correct integer mask type.  */
> > +      if (cond)
> > +	vectype = truth_type_for (vectype);
> > +
> >        if (dump_enabled_p ())
> >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> >      }
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:45               ` Tamar Christina
@ 2023-12-08 13:59                 ` Richard Biener
  2023-12-08 15:01                   ` Tamar Christina
  2023-12-11  7:09                   ` Tamar Christina
  0 siblings, 2 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-08 13:59 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 8 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, December 8, 2023 10:28 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > codegen of exit code
> > 
> > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > 
> > > > --param vect-partial-vector-usage=2 would, no?
> > > >
> > > I.. didn't even know it went to 2!
> > >
> > > > In principle I suppose I could mask the individual stmts; that should handle
> > > > the future case when this is relaxed to support non-fixed-length buffers?
> > > >
> > > > Well, it looks wrong - either put in an assert that we start with a
> > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > generate wrong code.
> > > >
> > > > That said, I think you need to apply the masking on the original
> > > > stmts[], before reducing them, no?
> > >
> > > Yeah, I've done so now.  For simplicity I've just kept the final masking always
> > > as well and just leave it up to the optimizers to drop it when it's superfluous.
> > >
> > > Simple testcase:
> > >
> > > #ifndef N
> > > #define N 837
> > > #endif
> > > float vect_a[N];
> > > unsigned vect_b[N];
> > >
> > > unsigned test4(double x)
> > > {
> > >  unsigned ret = 0;
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >    if (vect_a[i] > x)
> > >      break;
> > >    vect_a[i] = x;
> > >
> > >  }
> > >  return ret;
> > > }
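For illustration, the masked early-exit step for a loop like test4 above can be modeled roughly as follows (a hypothetical Python sketch under the simplifying assumption that vectors are plain lists; this is not the generated GIMPLE, and the store handling is only conceptual):

```python
# Hypothetical model of one masked early-exit vector iteration for the
# test4 loop above.  Each "vector" is a plain Python list of lanes.

def masked_early_exit_step(vect_a, x, loop_mask):
    """Return (exit_mask, new_values) for one vector iteration.

    loop_mask models the partial-vector loop mask; lanes outside the
    mask must not trigger the exit, which is why the comparison result
    is AND-ed with the loop mask (as prepare_vec_mask does).
    """
    cmp = [a > x for a in vect_a]                   # vector comparison
    exit_mask = [c and m for c, m in zip(cmp, loop_mask)]
    if any(exit_mask):                              # models cbranch on the mask
        # Only lanes before the first exiting lane are stored.
        first = exit_mask.index(True)
        new_values = [x] * first + vect_a[first:]
        return exit_mask, new_values
    return exit_mask, [x] * len(vect_a)

mask, vals = masked_early_exit_step([1.0, 2.0, 9.0, 3.0], 5.0, [True] * 4)
```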
> > >
> > > Looks good now.  After this one there's only one patch left, the dependency
> > > analysis.
> > > I'm almost done with the cleanup/respin, but want to take the weekend to
> > > double check and will post it first thing Monday morning.
> > >
> > > Did you want to see the testsuite changes as well again?  I've basically just
> > > added the right dg-requires-effective and add-options etc.
> > 
> > Yes please.
> > 
> > > Thanks for all the reviews!
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > > 	lhs.
> > > 	(vectorizable_early_exit): New.
> > > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> > >
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> > >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > >      {
> > >        gcc_assert (!vectype
> > > +		  || is_a <gcond *> (pattern_stmt)
> > >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > >  		      == vect_use_mask_type_p (orig_stmt_info)));
> > >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> > >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> > >     in case it's a result of a comparison which can be directly vectorized into
> > >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > > -   walk.  */
> > > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform any
> > > +   codegen associated with the boolean condition.  */
> > >
> > >  static bool
> > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > > +		    bool analyze_only)
> > >  {
> > >    tree rhs1;
> > >    enum tree_code rhs_code;
> > > +  gassign *def_stmt = NULL;
> > >
> > >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > > -  if (!def_stmt_info)
> > > +  if (!def_stmt_info && !analyze_only)
> > >      return false;
> > > +  else if (!def_stmt_info)
> > > +    /* If we're only analyzing we won't be codegen-ing the statements and are
> > > +       only interested in whether the types match.  In that case we can accept
> > > +       loop-invariant values.  */
> > > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > > +  else
> > > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > >
> > 
> > Hmm, but we're visiting them then?  I wonder how you get along
> > without doing adjustmens on the uses if you consider
> > 
> >     _1 = a < b;
> >     _2 = c != d;
> >     _3 = _1 | _2;
> >     if (_3 != 0)
> >       exit loop;
> > 
> > thus a combined condition like
> > 
> >     if (a < b || c != d)
> > 
> > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > mask uses and thus possibly adjust them.
> > 
> > What bad happens if you drop 'analyze_only'?  We're not really
> > rewriting anything there.
> 
> You mean drop it only in the above? We then fail to update the type for
> the gcond.  So in certain circumstances like with
> 
> int a, c, d;
> short b;
> 
> int
> main ()
> {
>   int e[1];
>   for (; b < 2; b++)
>     {
>       a = 0;
>       if (b == 28378)
>         a = e[b];
>       if (!(d || b))
>         for (; c;)
>           ;
>     }
>   return 0;
> }
> 
> Unless we walk the statements regardless of whether they come from inside the loop or not.

What do you mean by "fail to update the type for the gcond"?  If
I understood correctly the 'analyze_only' short-cuts some
checks, it doesn't add some?

But it's hard to follow what's actually done for a gcond ...

> > 
> > > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > >    if (!def_stmt)
> > >      return false;
> > >
> > > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > >    switch (rhs_code)
> > >      {
> > >      case SSA_NAME:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      CASE_CONVERT:
> > >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> > >  	return false;
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      case BIT_NOT_EXPR:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > >  	return false;
> > >        break;
> > >
> > >      case BIT_AND_EXPR:
> > >      case BIT_IOR_EXPR:
> > >      case BIT_XOR_EXPR:
> > > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > > +				   analyze_only))
> > >  	return false;
> > >        break;
> > >
> > > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> > >  							  TREE_TYPE (rhs1));
> > >  	  if (mask_type
> > > +	      && !analyze_only
> > >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> > >  	    return false;
> > >
> > > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > >  	    }
> > >  	  else
> > >  	    vecitype = comp_vectype;
> > > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > +	  if (!analyze_only
> > > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > >  	    return false;
> > >  	}
> > >        else
> > > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> > >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> > >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> > >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > > -   be associated with.  DEFS is a map of pattern defs.  */
> > > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> > >
> > >  static void
> > >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > > +		     gimple *&last_stmt, bool type_only)
> > >  {
> > >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> > >    enum tree_code rhs_code, def_rhs_code;
> > > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > >      }
> > >
> > >    gimple_set_location (pattern_stmt, loc);
> > > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > -			  get_vectype_for_scalar_type (vinfo, itype));
> > > +  if (!type_only)
> > > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > +			    get_vectype_for_scalar_type (vinfo, itype));
> > > +  last_stmt = pattern_stmt;
> > >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> > >  }
> > >
> > > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > > +   The def of one statement can be in an earlier block than the use, so if
> > > +   the BBs are different, first compare by BB.  */
> > >
> > >  static int
> > >  sort_after_uid (const void *p1, const void *p2)
> > >  {
> > >    const gimple *stmt1 = *(const gimple * const *)p1;
> > >    const gimple *stmt2 = *(const gimple * const *)p2;
> > > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > > +
> > 
> > is this because you eventually get out-of-loop stmts (without UID)?
> > 
> 
> No the problem I was having is that with an early exit the statement of
> one branch of the compare can be in a different BB than the other.
> 
> The testcase specifically was this:
> 
> int a, c, d;
> short b;
> 
> int
> main ()
> {
>   int e[1];
>   for (; b < 2; b++)
>     {
>       a = 0;
>       if (b == 28378)
>         a = e[b];
>       if (!(d || b))
>         for (; c;)
>           ;
>     }
>   return 0;
> }
> 
> Without debug info it happened to work:
> 
> >>> p gimple_uid (bool_stmts[0])
> $1 = 3
> >>> p gimple_uid (bool_stmts[1])
> $2 = 3
> >>> p gimple_uid (bool_stmts[2])
> $3 = 4
> 
> The first two statements got the same uid, but are in different BB in the loop.
> When we add debug, it looks like 1 bb got more debug state than the other:
> 
> >>> p gimple_uid (bool_stmts[0])
> $1 = 3
> >>> p gimple_uid (bool_stmts[1])
> $2 = 4
> >>> p gimple_uid (bool_stmts[2])
> $3 = 6
> 
> That last statement, which now has a UID of 6 used to be 3.

?  gimple_uid is used to map to stmt_vec_info and initially all UIDs
are zero.  It should never happen that two stmts belonging to the
same analyzed loop have the same UID.  In particular debug stmts
never get stmt_vec_info and thus no UID.

If you run into stmts not within the loop or that have no stmt_info
then all bets are off and you can't use UID at all.

As said, I didn't get why you look at those.
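For reference, the ordering the patched sort_after_uid aims for (first by basic-block index, then by UID within a block) can be modeled as follows (a hypothetical Python sketch, not GCC code):

```python
# Hypothetical model of the patched sort_after_uid ordering: statements
# are ordered first by basic-block index, then by UID, so a def in an
# earlier block always precedes its use even when UIDs alone would tie.

def sort_after_bb_and_uid(stmts):
    # Each stmt is modeled as a (bb_index, uid, name) tuple.
    return sorted(stmts, key=lambda s: (s[0], s[1]))

stmts = [(4, 3, "use"), (3, 3, "def_a"), (3, 4, "def_b")]
ordered = sort_after_bb_and_uid(stmts)
# def_a and def_b (block 3) come before the use in block 4, even though
# "use" and "def_a" share UID 3.
```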

> > >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> > >  }
> > >
> > >  /* Create pattern stmts for all stmts participating in the bool pattern
> > >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > > -   OUT_TYPE.  Return the def of the pattern root.  */
> > > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > > +   statements are not emitted as pattern statements and the tree returned is
> > > +   only useful for type queries.  */
> > >
> > >  static tree
> > >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > > -		   tree out_type, stmt_vec_info stmt_info)
> > > +		   tree out_type, stmt_vec_info stmt_info,
> > > +		   bool type_only = false)
> > >  {
> > >    /* Gather original stmts in the bool pattern in their order of appearance
> > >       in the IL.  */
> > > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > >      bool_stmts.quick_push (*i);
> > >    bool_stmts.qsort (sort_after_uid);
> > >
> > > +  gimple *last_stmt = NULL;
> > > +
> > >    /* Now process them in that order, producing pattern stmts.  */
> > >    hash_map <tree, tree> defs;
> > > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > > -			 out_type, stmt_info, defs);
> > > +  for (auto bool_stmt : bool_stmts)
> > > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > > +			 out_type, stmt_info, defs, last_stmt, type_only);
> > >
> > >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > > -  gimple *pattern_stmt
> > > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > > -  return gimple_assign_lhs (pattern_stmt);
> > > +  return gimple_assign_lhs (last_stmt);
> > >  }
> > >
> > >  /* Return the proper type for converting bool VAR into
> > > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >    enum tree_code rhs_code;
> > >    tree var, lhs, rhs, vectype;
> > >    gimple *pattern_stmt;
> > > -
> > > -  if (!is_gimple_assign (last_stmt))
> > > +  gcond* cond = NULL;
> > > +  if (!is_gimple_assign (last_stmt)
> > > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> > >      return NULL;
> > >
> > > -  var = gimple_assign_rhs1 (last_stmt);
> > > -  lhs = gimple_assign_lhs (last_stmt);
> > > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (is_gimple_assign (last_stmt))
> > > +    {
> > > +      var = gimple_assign_rhs1 (last_stmt);
> > > +      lhs = gimple_assign_lhs (last_stmt);
> > > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > > +    }
> > > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +    {
> > > +      /* Only analyze the gcond when we have multiple exits and are doing loop
> > > +	 vectorization; don't bother otherwise as we don't support SLP today.  */
> > > +      lhs = var = gimple_cond_lhs (last_stmt);
> > > +      rhs_code = gimple_cond_code (last_stmt);
> > > +    }
> > > +  else
> > > +    return NULL;
> > >
> > >    if (rhs_code == VIEW_CONVERT_EXPR)
> > >      var = TREE_OPERAND (var, 0);
> > > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >  	return NULL;
> > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > >  	{
> > >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > >  				   TREE_TYPE (lhs), stmt_vinfo);
> > > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >
> > >        return pattern_stmt;
> > >      }
> > > -  else if (rhs_code == COND_EXPR
> > > +  else if ((rhs_code == COND_EXPR || cond)
> > >  	   && TREE_CODE (var) == SSA_NAME)
> > >      {
> > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> > >  	return NULL;
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> > >        else if (integer_type_for_mask (var, vinfo))
> > >  	return NULL;
> > >
> > > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > -      pattern_stmt
> > > -	= gimple_build_assign (lhs, COND_EXPR,
> > > -			       build2 (NE_EXPR, boolean_type_node,
> > > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > > -			       gimple_assign_rhs2 (last_stmt),
> > > -			       gimple_assign_rhs3 (last_stmt));
> > > +      if (!cond)
> > > +	{
> > > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > +	  pattern_stmt
> > > +	    = gimple_build_assign (lhs, COND_EXPR,
> > > +				   build2 (NE_EXPR, boolean_type_node, var,
> > > +					   build_int_cst (TREE_TYPE (var), 0)),
> > > +				   gimple_assign_rhs2 (last_stmt),
> > > +				   gimple_assign_rhs3 (last_stmt));
> > > +	}
> > > +      else
> > > +	{
> > > +	  pattern_stmt
> > > +	    = gimple_build_cond (gimple_cond_code (cond),
> > > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > > +				 gimple_cond_true_label (cond),
> > > +				 gimple_cond_false_label (cond));
> > > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > > +	  vectype = truth_type_for (vectype);
> > > +	}
> > >        *type_out = vectype;
> > >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> > >
> > 
> > So this is also quite odd.  You're hooking into COND_EXPR handling
> > but only look at the LHS of the GIMPLE_COND compare.
> > 
> 
> Hmm, not sure I follow, GIMPLE_CONDs don't have an LHS, no?  We look at the LHS
> for the COND_EXPR, but for a GCOND we just recreate the statement and set vectype
> based on the updated var.  I guess this is related to:

a GIMPLE_COND has "lhs" and "rhs", the two operands of the embedded
compare.  You seem to look at only "lhs" for analyzing bool patterns.

> > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > mask uses and thus possibly adjust them.
> 
> Which I did think about somewhat, so what you're saying is that I need to create
> a new GIMPLE_COND here with an NE to 0 compare against var like the COND_EXPR
> case?

Well, it depends how you wire everything up.  But since we later want
a mask def and vectorize the GIMPLE_COND as cbranch it seemed to me
it's easiest to pattern

  if (a > b)

as

  mask.patt = a > b;
  if (mask.patt != 0)

I thought you were doing this.  And yes, COND_EXPRs are now
effectively doing that since we no longer embed a comparison
in the first operand (only the pattern recognizer still does that
as I was lazy).
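In other words, the rewrite above can be modeled like this (a hypothetical Python sketch of the semantics, with vectors as plain lists; not the actual pattern code):

```python
# Hypothetical model of the pattern rewrite: the scalar `if (a > b)`
# becomes a mask definition plus a cbranch testing whether any lane of
# the mask is set, i.e. `mask.patt = a > b; if (mask.patt != 0)`.

def rewrite_cond_as_mask(a, b):
    mask_patt = [x > y for x, y in zip(a, b)]  # mask.patt = a > b
    take_branch = any(mask_patt)               # if (mask.patt != 0)
    return mask_patt, take_branch

m, taken = rewrite_cond_as_mask([1, 7, 2], [3, 4, 5])
```

The point of the canonical `mask != 0` form is that the mask def can then be vectorized like any other comparison and the branch becomes a flag-setting vector compare (cbranch).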

> 
> > Please refactor the changes to separate the GIMPLE_COND path
> > completely.
> > 
> 
> Ok, then it seems better to make two patterns?

Maybe.

> > Is there test coverage for such "complex" condition?  I think
> > you'll need adjustments to vect_recog_mask_conversion_pattern
> > as well similar as to how COND_EXPR is handled there.
> 
> Yes, the existing testsuite has many cases which fail, including gcc/testsuite/gcc.c-torture/execute/20150611-1.c

Fail in which way?  Fail to vectorize because we don't handle such
condition?

Thanks,
Richard.

> Cheers,
> Tamar
> 
> > 
> > > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> > >  	return NULL;
> > >
> > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > >  				 TREE_TYPE (vectype), stmt_vinfo);
> > >        else
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >    vec<tree> vec_oprnds0 = vNULL;
> > >    vec<tree> vec_oprnds1 = vNULL;
> > >    tree mask_type;
> > > -  tree mask;
> > > +  tree mask = NULL_TREE;
> > >
> > >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > >      return false;
> > > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >    /* Transform.  */
> > >
> > >    /* Handle def.  */
> > > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > > -  mask = vect_create_destination_var (lhs, mask_type);
> > > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > > +  if (lhs)
> > > +    mask = vect_create_destination_var (lhs, mask_type);
> > >
> > >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> > >  		     rhs1, &vec_oprnds0, vectype,
> > > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > >        gimple *new_stmt;
> > >        vec_rhs2 = vec_oprnds1[i];
> > >
> > > -      new_temp = make_ssa_name (mask);
> > > +      if (lhs)
> > > +	new_temp = make_ssa_name (mask);
> > > +      else
> > > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> > >        if (bitop1 == NOP_EXPR)
> > >  	{
> > >  	  new_stmt = gimple_build_assign (new_temp, code,
> > > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> > >    return true;
> > >  }
> > >
> > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > +   vectorization.  */
> > > +
> > > +static bool
> > > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > > +{
> > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > +  if (!loop_vinfo
> > > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > +    return false;
> > > +
> > > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > > +    return false;
> > > +
> > > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > +    return false;
> > > +
> > > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > +  gcc_assert (vectype);
> > > +
> > > +  tree vectype_op0 = NULL_TREE;
> > > +  slp_tree slp_op0;
> > > +  tree op0;
> > > +  enum vect_def_type dt0;
> > > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > > +			   &vectype_op0))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			   "use not simple.\n");
> > > +	return false;
> > > +    }
> > 
> > I think you rely on patterns transforming this into canonical form
> > mask != 0, so I suggest checking this here.
> > 
> > > +  machine_mode mode = TYPE_MODE (vectype);
> > > +  int ncopies;
> > > +
> > > +  if (slp_node)
> > > +    ncopies = 1;
> > > +  else
> > > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > > +
> > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > +
> > > +  /* Analyze only.  */
> > > +  if (!vec_stmt)
> > > +    {
> > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target doesn't support flag setting vector "
> > > +			       "comparisons.\n");
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (ncopies > 1
> > 
> > Also required for vec_num > 1 with SLP
> > (SLP_TREE_NUMBER_OF_VEC_STMTS)
> > 
> > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +			       "can't vectorize early exit because the "
> > > +			       "target does not support boolean vector OR for "
> > > +			       "type %T.\n", vectype);
> > > +	  return false;
> > > +	}
> > > +
> > > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > +				      vec_stmt, slp_node, cost_vec))
> > > +	return false;
> > > +
> > > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > > +	{
> > > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > > +					      OPTIMIZE_FOR_SPEED))
> > > +	    return false;
> > > +	  else
> > > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > > +	}
> > > +
> > > +
> > > +      return true;
> > > +    }
> > > +
> > > +  /* Transform.  */
> > > +
> > > +  tree new_temp = NULL_TREE;
> > > +  gimple *new_stmt = NULL;
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > > +
> > > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > +				  vec_stmt, slp_node, cost_vec))
> > > +    gcc_unreachable ();
> > > +
> > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > +  basic_block cond_bb = gimple_bb (stmt);
> > > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > > +
> > > +  auto_vec<tree> stmts;
> > > +
> > > +  tree mask = NULL_TREE;
> > > +  if (masked_loop_p)
> > > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > > +
> > > +  if (slp_node)
> > > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > > +  else
> > > +    {
> > > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > > +      stmts.reserve_exact (vec_stmts.length ());
> > > +      for (auto stmt : vec_stmts)
> > > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > > +    }
> > > +
> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > 
> > I think this still uses the wrong mask, you need to use
> > 
> >   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> > 
> > replacing <cnt> with the vector def index to mask I think.  For this
> > reason keeping the "final" mask below is also wrong.
> > 
> > Or am I missing something?
> > 
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> > 
> > so just do this in the else path above
> > 
> > Otherwise looks OK.
> > 
> > Richard.
> > 
> > > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > > +     codegen so we must replace the original insn.  */
> > > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > > +			     build_zero_cst (vectype));
> > > +  update_stmt (stmt);
> > > +
> > > +  if (slp_node)
> > > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > > +   else
> > > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > > +
> > > +
> > > +  if (!slp_node)
> > > +    *vec_stmt = stmt;
> > > +
> > > +  return true;
> > > +}
> > > +
> > >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> > >     can handle all live statements in the node.  Otherwise return true
> > >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > >  				  stmt_info, NULL, node)
> > >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > -				   stmt_info, NULL, node, cost_vec));
> > > +				   stmt_info, NULL, node, cost_vec)
> > > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > +				      cost_vec));
> > >    else
> > >      {
> > >        if (bb_vinfo)
> > > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > >  					 NULL, NULL, node, cost_vec)
> > >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > >  					  cost_vec)
> > > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > +					  cost_vec));
> > > +
> > >      }
> > >
> > >    if (node)
> > > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> > >        gcc_assert (done);
> > >        break;
> > >
> > > +    case loop_exit_ctrl_vec_info_type:
> > > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > > +				      slp_node, NULL);
> > > +      gcc_assert (done);
> > > +      break;
> > > +
> > >      default:
> > >        if (!STMT_VINFO_LIVE_P (stmt_info))
> > >  	{
> > > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > >      }
> > >    else
> > >      {
> > > +      gcond *cond = NULL;
> > >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > > +	{
> > > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > > +	     single bit precision and we need the vector boolean to be a
> > > +	     representation of the integer mask.  So set the correct integer type and
> > > +	     convert to boolean vector once we have a vectype.  */
> > > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > > +	}
> > >        else
> > >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > >
> > > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info
> > *vinfo, stmt_vec_info stmt_info,
> > >  			     "get vectype for scalar type: %T\n", scalar_type);
> > >  	}
> > >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > > +
> > >        if (!vectype)
> > >  	return opt_result::failure_at (stmt,
> > >  				       "not vectorized:"
> > >  				       " unsupported data-type %T\n",
> > >  				       scalar_type);
> > >
> > > > +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> > > > +	 that we have the correct integer mask type.  */
> > > +      if (cond)
> > > +	vectype = truth_type_for (vectype);
> > > +
> > >        if (dump_enabled_p ())
> > >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> > >      }
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
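
[Editor's aside: the workset loop in the quoted hunk pops two mask defs, ORs them, and re-queues the result at the front, which yields a roughly balanced reduction tree rather than a serial chain.  A minimal scalar sketch in plain C of just that shape (reduce_exit_masks is an invented name; this models the loop's behavior, not GCC's internals):]

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the workset loop: pop the last two entries, OR them,
   and insert the result at index 0 (like workset.quick_insert (0, ...)),
   so fresh partial results are consumed last and the OR tree stays
   roughly balanced.  The final value is nonzero iff any lane of any
   vector copy triggered the early exit.  */
static unsigned
reduce_exit_masks (unsigned *work, size_t n)
{
  while (n > 1)
    {
      unsigned a = work[--n];		/* workset.pop ()  */
      unsigned b = work[--n];		/* workset.pop ()  */
      for (size_t i = n; i > 0; i--)	/* shift to make room at the front  */
	work[i] = work[i - 1];
      work[0] = a | b;			/* workset.quick_insert (0, a | b)  */
      n++;
    }
  return work[0];
}
```

[With four per-copy masks {1, 2, 4, 8} this computes (8|4) and (2|1) before combining them, i.e. a balanced tree; a single mask is returned unchanged.]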

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:59                 ` Richard Biener
@ 2023-12-08 15:01                   ` Tamar Christina
  2023-12-11  7:09                   ` Tamar Christina
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-12-08 15:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, December 8, 2023 2:00 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Fri, 8 Dec 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, December 8, 2023 10:28 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com
> > > Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> > > codegen of exit code
> > >
> > > On Fri, 8 Dec 2023, Tamar Christina wrote:
> > >
> > > > > --param vect-partial-vector-usage=2 would, no?
> > > > >
> > > > I.. didn't even know it went to 2!
> > > >
> > > > > > In principle I suppose I could mask the individual stmts; that should handle
> > > > > > the future case when this is relaxed to support non-fixed length buffers?
> > > > >
> > > > > Well, it looks wrong - either put in an assert that we start with a
> > > > > single stmt or assert !masked_loop_p instead?  Better ICE than
> > > > > generate wrong code.
> > > > >
> > > > > That said, I think you need to apply the masking on the original
> > > > > stmts[], before reducing them, no?
> > > >
> > > > Yeah, I've done so now.  For simplicity I've just kept the final masking
> > > > always as well and just leave it up to the optimizers to drop it when it's
> > > > superfluous.
> > > >
> > > > Simple testcase:
> > > >
> > > > #ifndef N
> > > > #define N 837
> > > > #endif
> > > > float vect_a[N];
> > > > unsigned vect_b[N];
> > > >
> > > > unsigned test4(double x)
> > > > {
> > > >  unsigned ret = 0;
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    if (vect_a[i] > x)
> > > >      break;
> > > >    vect_a[i] = x;
> > > >
> > > >  }
> > > >  return ret;
> > > > }
> > > >
> > > > Looks good now.  After this one there's only one patch left, the dependency
> > > > analysis.  I'm almost done with the cleanup/respin, but want to take the
> > > > weekend to double check and will post it first thing Monday morning.
> > > >
> > > > Did you want to see the testsuite changes as well again? I've basically just
> > > > added the right dg-require-effective-target and add-options directives etc.
> > >
> > > Yes please.
> > >
> > > > Thanks for all the reviews!
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > > > 	(check_bool_pattern, adjust_bool_pattern, adjust_bool_stmts,
> > > > 	vect_recog_bool_pattern, sort_after_uid): Support gconds type analysis.
> > > > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > > > 	lhs.
> > > > 	(vectorizable_early_exit): New.
> > > > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > > > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> > > >
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..8865cde9f3481a474d31848ae12523576d29744d 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> > > >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> > > >      {
> > > >        gcc_assert (!vectype
> > > > +		  || is_a <gcond *> (pattern_stmt)
> > > >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> > > >  		      == vect_use_mask_type_p (orig_stmt_info)));
> > > >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > > > @@ -5210,19 +5211,28 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> > > >     true if bool VAR can and should be optimized that way.  Assume it shouldn't
> > > >     in case it's a result of a comparison which can be directly vectorized into
> > > >     a vector comparison.  Fills in STMTS with all stmts visited during the
> > > > -   walk.  */
> > > > +   walk.  If ANALYZE_ONLY then only analyze the booleans but do not perform
> > > > +   any codegen associated with the boolean condition.  */
> > > >
> > > >  static bool
> > > > -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > > +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> > > > +		    bool analyze_only)
> > > >  {
> > > >    tree rhs1;
> > > >    enum tree_code rhs_code;
> > > > +  gassign *def_stmt = NULL;
> > > >
> > > >    stmt_vec_info def_stmt_info = vect_get_internal_def (vinfo, var);
> > > > -  if (!def_stmt_info)
> > > > +  if (!def_stmt_info && !analyze_only)
> > > >      return false;
> > > > +  else if (!def_stmt_info)
> > > > +    /* If we're only analyzing we won't be codegen-ing the statements and are
> > > > +       only after if the types match.  In that case we can accept loop invariant
> > > > +       values.  */
> > > > +    def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (var));
> > > > +  else
> > > > +    def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > > >
> > >
> > > Hmm, but we're visiting them then?  I wonder how you get along
> > > without doing adjustmens on the uses if you consider
> > >
> > >     _1 = a < b;
> > >     _2 = c != d;
> > >     _3 = _1 | _2;
> > >     if (_3 != 0)
> > >       exit loop;
> > >
> > > thus a combined condition like
> > >
> > >     if (a < b || c != d)
> > >
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> > >
> > > What bad happens if you drop 'analyze_only'?  We're not really
> > > rewriting anything there.
> >
> > You mean drop it only in the above? We then fail to update the type for
> > the gcond.  So in certain circumstances like with
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Unless we walk the statements regardless of whether they come from inside the
> > loop or not.
> 
> What do you mean by "fail to update the type for the gcond"?  If
> I understood correctly the 'analyze_only' short-cuts some
> checks, it doesn't add any?

analyze_only made it skip the check for a vector compare, because I wasn't rewriting the
condition and still kept it in the gcond.  And the gcond check happens later anyway
during vectorizable_early_exit.

But more on it below.

> 
> But it's hard to follow what's actually done for a gcond ...
> 
> > >
> > > > -  gassign *def_stmt = dyn_cast <gassign *> (def_stmt_info->stmt);
> > > >    if (!def_stmt)
> > > >      return false;
> > > >
> > > > @@ -5234,27 +5244,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > >    switch (rhs_code)
> > > >      {
> > > >      case SSA_NAME:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      CASE_CONVERT:
> > > >        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> > > >  	return false;
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      case BIT_NOT_EXPR:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > >      case BIT_AND_EXPR:
> > > >      case BIT_IOR_EXPR:
> > > >      case BIT_XOR_EXPR:
> > > > -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> > > > -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> > > > +      if (! check_bool_pattern (rhs1, vinfo, stmts, analyze_only)
> > > > +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> > > > +				   analyze_only))
> > > >  	return false;
> > > >        break;
> > > >
> > > > @@ -5275,6 +5286,7 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > >  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
> > > >  							  TREE_TYPE (rhs1));
> > > >  	  if (mask_type
> > > > +	      && !analyze_only
> > > >  	      && expand_vec_cmp_expr_p (comp_vectype, mask_type, rhs_code))
> > > >  	    return false;
> > > >
> > > > @@ -5289,7 +5301,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> > > >  	    }
> > > >  	  else
> > > >  	    vecitype = comp_vectype;
> > > > -	  if (! expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > > +	  if (!analyze_only
> > > > +	      && !expand_vec_cond_expr_p (vecitype, comp_vectype, rhs_code))
> > > >  	    return false;
> > > >  	}
> > > >        else
> > > > @@ -5324,11 +5337,13 @@ adjust_bool_pattern_cast (vec_info *vinfo,
> > > >     VAR is an SSA_NAME that should be transformed from bool to a wider integer
> > > >     type, OUT_TYPE is the desired final integer type of the whole pattern.
> > > >     STMT_INFO is the info of the pattern root and is where pattern stmts should
> > > > -   be associated with.  DEFS is a map of pattern defs.  */
> > > > +   be associated with.  DEFS is a map of pattern defs.  If TYPE_ONLY then don't
> > > > +   create new pattern statements and instead only fill LAST_STMT and DEFS.  */
> > > >
> > > >  static void
> > > >  adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > > -		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs)
> > > > +		     stmt_vec_info stmt_info, hash_map <tree, tree> &defs,
> > > > +		     gimple *&last_stmt, bool type_only)
> > > >  {
> > > >    gimple *stmt = SSA_NAME_DEF_STMT (var);
> > > >    enum tree_code rhs_code, def_rhs_code;
> > > > @@ -5492,28 +5507,38 @@ adjust_bool_pattern (vec_info *vinfo, tree var, tree out_type,
> > > >      }
> > > >
> > > >    gimple_set_location (pattern_stmt, loc);
> > > > -  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > > -			  get_vectype_for_scalar_type (vinfo, itype));
> > > > +  if (!type_only)
> > > > +    append_pattern_def_seq (vinfo, stmt_info, pattern_stmt,
> > > > +			    get_vectype_for_scalar_type (vinfo, itype));
> > > > +  last_stmt = pattern_stmt;
> > > >    defs.put (var, gimple_assign_lhs (pattern_stmt));
> > > >  }
> > > >
> > > > -/* Comparison function to qsort a vector of gimple stmts after UID.  */
> > > > +/* Comparison function to qsort a vector of gimple stmts after BB and UID.
> > > > +   The def of one statement can be in an earlier block than the use, so if
> > > > +   the BBs are different, first compare by BB.  */
> > > >
> > > >  static int
> > > >  sort_after_uid (const void *p1, const void *p2)
> > > >  {
> > > >    const gimple *stmt1 = *(const gimple * const *)p1;
> > > >    const gimple *stmt2 = *(const gimple * const *)p2;
> > > > +  if (gimple_bb (stmt1)->index != gimple_bb (stmt2)->index)
> > > > +    return gimple_bb (stmt1)->index - gimple_bb (stmt2)->index;
> > > > +
> > >
> > > is this because you eventually get out-of-loop stmts (without UID)?
> > >
> >
> > No the problem I was having is that with an early exit the statement of
> > one branch of the compare can be in a different BB than the other.
> >
> > The testcase specifically was this:
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Without debug info it happened to work:
> >
> > >>> p gimple_uid (bool_stmts[0])
> > $1 = 3
> > >>> p gimple_uid (bool_stmts[1])
> > $2 = 3
> > >>> p gimple_uid (bool_stmts[2])
> > $3 = 4
> >
> > The first two statements got the same uid, but are in different BB in the loop.
> > When we add debug, it looks like 1 bb got more debug state than the other:
> >
> > >>> p gimple_uid (bool_stmts[0])
> > $1 = 3
> > >>> p gimple_uid (bool_stmts[1])
> > $2 = 4
> > >>> p gimple_uid (bool_stmts[2])
> > $3 = 6
> >
> > That last statement, which now has a UID of 6 used to be 3.
> 
> ?  gimple_uid is used to map to stmt_vec_info and initially all UIDs
> are zero.  It should never happen that two stmts belonging to the
> same analyzed loop have the same UID.  In particular debug stmts
> never get stmt_vec_info and thus no UID.
> 
> If you run into stmts not within the loop or that have no stmt_info
> then all bets are off and you can't use UID at all.
> 
> As said, I didn't get why you look at those.

Right, it was hard to tell from the gimple dumps, but the graph made me realize
that your initial statement was right: one of the uses is out of loop.

https://gist.github.com/Mistuke/2460471529e6e42d34d5db0b307ff3cf

where _12 is out of loop.

I don't particularly care about the out-of-loop use itself, only about its type in
the loop.  But check_bool_pattern needs to see all the uses or it stops and
returns NULL.  In that case the boolean pattern isn't generated and we end
up vectorizing with the wrong types.

That said, I think the loop above should be valid for vectorization.
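
To make the all-uses requirement concrete, here is a toy model in plain C of the recursive walk (the enum, struct and check_bool below are invented stand-ins, not GCC's actual types): one unhandled leaf def is enough to fail the whole pattern, mirroring how the out-of-loop def above stops recognition of the combined condition.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_CMP, OP_NOT, OP_AND, OP_OR, OP_XOR, OP_OTHER };

struct expr { enum op code; struct expr *lhs, *rhs; };

/* Toy model of check_bool_pattern's shape: recurse through boolean
   operations and succeed only if every leaf is a comparison; any
   unrecognized def (e.g. one defined outside the loop) makes the
   whole walk fail.  */
static int
check_bool (struct expr *e)
{
  if (!e)
    return 0;
  switch (e->code)
    {
    case OP_CMP:
      return 1;
    case OP_NOT:
      return check_bool (e->lhs);
    case OP_AND:
    case OP_OR:
    case OP_XOR:
      return check_bool (e->lhs) && check_bool (e->rhs);
    default:
      return 0;	/* unhandled def: whole pattern rejected  */
    }
}
```

In the `_1 = a < b; _2 = c != d; _3 = _1 | _2` example from earlier in the thread, _3 is accepted because both leaves are comparisons; replacing either leaf with an unhandled def rejects the whole condition.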

Part of the reason for doing this isn't just for the mask uses, it's also to figure
out what the types of the arguments are.  But you're right that I need to actually
replace the statements...  I may have misunderstood how the pattern was supposed to
work, as I was mainly focused on determining the correct type of the gcond in the case
where the input types differ, like:

#define N 1024
complex double vect_a[N];
complex double vect_b[N];

complex double test4(complex double x)
{
 complex double ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] += x + i;
   if (vect_a[i] == x)
     return i;
   vect_a[i] += x * vect_b[i];

 }
 return ret;
}

Reverting the analyze_stmt change we fail to vectorize, but that could be something
else; I'll investigate.

However...

> 
> > > >    return gimple_uid (stmt1) - gimple_uid (stmt2);
> > > >  }
> > > >
> > > >  /* Create pattern stmts for all stmts participating in the bool pattern
> > > >     specified by BOOL_STMT_SET and its root STMT_INFO with the desired type
> > > > -   OUT_TYPE.  Return the def of the pattern root.  */
> > > > +   OUT_TYPE.  Return the def of the pattern root.  If TYPE_ONLY the new
> > > > +   statements are not emitted as pattern statements and the tree returned is
> > > > +   only useful for type queries.  */
> > > >
> > > >  static tree
> > > >  adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > > > -		   tree out_type, stmt_vec_info stmt_info)
> > > > +		   tree out_type, stmt_vec_info stmt_info,
> > > > +		   bool type_only = false)
> > > >  {
> > > >    /* Gather original stmts in the bool pattern in their order of appearance
> > > >       in the IL.  */
> > > > @@ -5523,16 +5548,16 @@ adjust_bool_stmts (vec_info *vinfo, hash_set <gimple *> &bool_stmt_set,
> > > >      bool_stmts.quick_push (*i);
> > > >    bool_stmts.qsort (sort_after_uid);
> > > >
> > > > +  gimple *last_stmt = NULL;
> > > > +
> > > >    /* Now process them in that order, producing pattern stmts.  */
> > > >    hash_map <tree, tree> defs;
> > > > -  for (unsigned i = 0; i < bool_stmts.length (); ++i)
> > > > -    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmts[i]),
> > > > -			 out_type, stmt_info, defs);
> > > > +  for (auto bool_stmt : bool_stmts)
> > > > +    adjust_bool_pattern (vinfo, gimple_assign_lhs (bool_stmt),
> > > > +			 out_type, stmt_info, defs, last_stmt, type_only);
> > > >
> > > >    /* Pop the last pattern seq stmt and install it as pattern root for STMT.  */
> > > > -  gimple *pattern_stmt
> > > > -    = gimple_seq_last_stmt (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
> > > > -  return gimple_assign_lhs (pattern_stmt);
> > > > +  return gimple_assign_lhs (last_stmt);
> > > >  }
> > > >
> > > >  /* Return the proper type for converting bool VAR into
> > > > @@ -5608,13 +5633,27 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >    enum tree_code rhs_code;
> > > >    tree var, lhs, rhs, vectype;
> > > >    gimple *pattern_stmt;
> > > > -
> > > > -  if (!is_gimple_assign (last_stmt))
> > > > +  gcond* cond = NULL;
> > > > +  if (!is_gimple_assign (last_stmt)
> > > > +      && !(cond = dyn_cast <gcond *> (last_stmt)))
> > > >      return NULL;
> > > >
> > > > -  var = gimple_assign_rhs1 (last_stmt);
> > > > -  lhs = gimple_assign_lhs (last_stmt);
> > > > -  rhs_code = gimple_assign_rhs_code (last_stmt);
> > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > +  if (is_gimple_assign (last_stmt))
> > > > +    {
> > > > +      var = gimple_assign_rhs1 (last_stmt);
> > > > +      lhs = gimple_assign_lhs (last_stmt);
> > > > +      rhs_code = gimple_assign_rhs_code (last_stmt);
> > > > +    }
> > > > +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    {
> > > > +      /* If not multiple exits and loop vectorization, don't bother analyzing
> > > > +	 the gcond as we don't support SLP today.  */
> > > > +      lhs = var = gimple_cond_lhs (last_stmt);
> > > > +      rhs_code = gimple_cond_code (last_stmt);
> > > > +    }
> > > > +  else
> > > > +    return NULL;
> > > >
> > > >    if (rhs_code == VIEW_CONVERT_EXPR)
> > > >      var = TREE_OPERAND (var, 0);
> > > > @@ -5632,7 +5671,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >  	return NULL;
> > > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > > >  	{
> > > >  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > > >  				   TREE_TYPE (lhs), stmt_vinfo);
> > > > @@ -5680,7 +5719,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >
> > > >        return pattern_stmt;
> > > >      }
> > > > -  else if (rhs_code == COND_EXPR
> > > > +  else if ((rhs_code == COND_EXPR || cond)
> > > >  	   && TREE_CODE (var) == SSA_NAME)
> > > >      {
> > > >        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> > > > @@ -5700,18 +5739,31 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
> > > >  	return NULL;
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > -	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, cond))
> > > > +	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo, cond);
> > > >        else if (integer_type_for_mask (var, vinfo))
> > > >  	return NULL;
> > > >
> > > > -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > > -      pattern_stmt
> > > > -	= gimple_build_assign (lhs, COND_EXPR,
> > > > -			       build2 (NE_EXPR, boolean_type_node,
> > > > -				       var, build_int_cst (TREE_TYPE (var), 0)),
> > > > -			       gimple_assign_rhs2 (last_stmt),
> > > > -			       gimple_assign_rhs3 (last_stmt));
> > > > +      if (!cond)
> > > > +	{
> > > > +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> > > > +	  pattern_stmt
> > > > +	    = gimple_build_assign (lhs, COND_EXPR,
> > > > +				   build2 (NE_EXPR, boolean_type_node, var,
> > > > +					   build_int_cst (TREE_TYPE (var), 0)),
> > > > +				   gimple_assign_rhs2 (last_stmt),
> > > > +				   gimple_assign_rhs3 (last_stmt));
> > > > +	}
> > > > +      else
> > > > +	{
> > > > +	  pattern_stmt
> > > > +	    = gimple_build_cond (gimple_cond_code (cond),
> > > > +				 gimple_cond_lhs (cond), gimple_cond_rhs (cond),
> > > > +				 gimple_cond_true_label (cond),
> > > > +				 gimple_cond_false_label (cond));
> > > > +	  vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (var));
> > > > +	  vectype = truth_type_for (vectype);
> > > > +	}
> > > >        *type_out = vectype;
> > > >        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
> > > >
> > >
> > > So this is also quite odd.  You're hooking into COND_EXPR handling
> > > but only look at the LHS of the GIMPLE_COND compare.
> > >
> >
> > Hmm, not sure I follow, GIMPLE_CONDs don't have an LHS, no? We look at the
> > LHS for the COND_EXPR, but for a GCOND we just recreate the statement and set
> > the vectype based on the updated var.  I guess this is related to:
> 
> a GIMPLE_COND has "lhs" and "rhs", the two operands of the embedded
> compare.  You seem to look at only "lhs" for analyzing bool patterns.
> 
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> >
> > Which I did think about somewhat, so what you're saying is that I need to create
> > a new GIMPLE_COND here with an NE-to-0 compare against var, like the
> > COND_EXPR case?
> 
> Well, it depends how you wire eveything up.  But since we later want
> a mask def and vectorize the GIMPLE_COND as cbranch it seemed to me
> it's easiest to pattern
> 
>   if (a > b)
> 
> as
> 
>   mask.patt = a > b;
>   if (mask.patt != 0)
> 
> I thought you were doing this.  And yes, COND_EXPRs are now
> effectively doing that since we no longer embed a comparison
> in the first operand (only the pattern recognizer still does that
> as I was lazy).

I did initially do that, but reverted it since I didn't quite understand the
full point of the pattern.  Added it back in.
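
For reference, a scalar model in plain C of the rewrite Richard describes (branch_direct and branch_pattern are invented names; mask_patt stands in for the pattern's mask.patt temporary): the comparison is pulled out into a separate mask definition and the branch merely tests that mask against zero, so both forms are equivalent.

```c
#include <assert.h>

/* Scalar model of the bool-pattern rewrite for a GIMPLE_COND.  */
static int
branch_direct (int a, int b)
{
  return a > b;			/* if (a > b)  */
}

static int
branch_pattern (int a, int b)
{
  int mask_patt = a > b;	/* mask.patt = a > b;  */
  return mask_patt != 0;	/* if (mask.patt != 0)  */
}
```

The point of the split is that the comparison gets a mask def that can be vectorized on its own, while the gcond is vectorized as a cbranch testing that mask.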

> 
> >
> > > Please refactor the changes to separate the GIMPLE_COND path
> > > completely.
> > >
> >
> > Ok, then it seems better to make two patterns?
> 
> Maybe.
> 
> > > Is there test coverage for such "complex" condition?  I think
> > > you'll need adjustments to vect_recog_mask_conversion_pattern
> > > as well similar as to how COND_EXPR is handled there.
> >
> > Yes, the existing testsuite has many cases which fail, including
> > gcc/testsuite/gcc.c-torture/execute/20150611-1.c
> 
> Fail in which way?  Fail to vectorize because we don't handle such
> condition?

No, it vectorizes, but crashes later in

gcc/testsuite/gcc.c-torture/execute/20150611-1.c: In function 'main':
gcc/testsuite/gcc.c-torture/execute/20150611-1.c:5:1: internal compiler error: in eliminate_stmt, at tree-ssa-sccvn.cc:6959
0x19f43cc eliminate_dom_walker::eliminate_stmt(basic_block_def*, gimple_stmt_iterator*)
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:6959
0x19f929f process_bb
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8171
0x19fb393 do_rpo_vn_1
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8621
0x19fbad4 do_rpo_vn(function*, edge_def*, bitmap_head*, bool, bool, vn_lookup_kind)
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-ssa-sccvn.cc:8723
0x1b1bd82 execute
        /data/tamchr01/gnu-work-b1/src/gcc/gcc/tree-vectorizer.cc:1389

Because

>>> p debug_tree (lhs)
 <ssa_name 0x7fbfcbf20750
    type <vector_type 0x7fbfcbf27690
        type <boolean_type 0x7fbfcbf27bd0 public QI
            size <integer_cst 0x7fbfcbfdcf60 constant 8>
            unit-size <integer_cst 0x7fbfcbfdcf78 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fbfcbf27bd0 precision:8 min <integer_cst 0x7fbfcbf26048 -128> max <integer_cst 0x7fbfcbf26060 127>>
        V8QI
        size <integer_cst 0x7fbfcbfdce70 constant 64>
        unit-size <integer_cst 0x7fbfcbfdce88 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fbfcbf27690 nunits:8>

    def_stmt cmp_56 = mask__13.21_54 ^ vect_cst__55;
    version:56>
$1 = void

Because the out-of-loop operand made the pattern not apply, and so we didn't vectorize using V8HI as we should have.

So not sure what to do for those cases.

Regards,
Tamar

> 
> Thanks,
> Richard.
> 
> > Cheers,
> > Tamar
> >
> > >
> > > > @@ -5725,7 +5777,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
> > > >        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
> > > >  	return NULL;
> > > >
> > > > -      if (check_bool_pattern (var, vinfo, bool_stmts))
> > > > +      if (check_bool_pattern (var, vinfo, bool_stmts, false))
> > > >  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
> > > >  				 TREE_TYPE (vectype), stmt_vinfo);
> > > >        else
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index 582c5e678fad802d6e76300fe3c939b9f2978f17..e9116d184149826ba436b0f562721c140d586c94 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > > >    vec<tree> vec_oprnds0 = vNULL;
> > > >    vec<tree> vec_oprnds1 = vNULL;
> > > >    tree mask_type;
> > > > -  tree mask;
> > > > +  tree mask = NULL_TREE;
> > > >
> > > >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > > >      return false;
> > > > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > > >    /* Transform.  */
> > > >
> > > >    /* Handle def.  */
> > > > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > > > -  mask = vect_create_destination_var (lhs, mask_type);
> > > > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > > > +  if (lhs)
> > > > +    mask = vect_create_destination_var (lhs, mask_type);
> > > >
> > > >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> > > >  		     rhs1, &vec_oprnds0, vectype,
> > > > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> > > >        gimple *new_stmt;
> > > >        vec_rhs2 = vec_oprnds1[i];
> > > >
> > > > -      new_temp = make_ssa_name (mask);
> > > > +      if (lhs)
> > > > +	new_temp = make_ssa_name (mask);
> > > > +      else
> > > > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> > > >        if (bitop1 == NOP_EXPR)
> > > >  	{
> > > >  	  new_stmt = gimple_build_assign (new_temp, code,
> > > > @@ -12723,6 +12727,184 @@ vectorizable_comparison (vec_info *vinfo,
> > > >    return true;
> > > >  }
> > > >
> > > > +/* Check to see if the current early break given in STMT_INFO is valid for
> > > > +   vectorization.  */
> > > > +
> > > > +static bool
> > > > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > > > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > > > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > > > +{
> > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > +  if (!loop_vinfo
> > > > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > > > +    return false;
> > > > +
> > > > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > > > +    return false;
> > > > +
> > > > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > > > +    return false;
> > > > +
> > > > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > > > +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > > +  gcc_assert (vectype);
> > > > +
> > > > +  tree vectype_op0 = NULL_TREE;
> > > > +  slp_tree slp_op0;
> > > > +  tree op0;
> > > > +  enum vect_def_type dt0;
> > > > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > > > +			   &vectype_op0))
> > > > +    {
> > > > +      if (dump_enabled_p ())
> > > > +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			 "use not simple.\n");
> > > > +      return false;
> > > > +    }
> > >
> > > I think you rely on patterns transforming this into the canonical form
> > > mask != 0, so I suggest checking this here.
> > >
> > > > +  machine_mode mode = TYPE_MODE (vectype);
> > > > +  int ncopies;
> > > > +
> > > > +  if (slp_node)
> > > > +    ncopies = 1;
> > > > +  else
> > > > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > > > +
> > > > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > > +
> > > > +  /* Analyze only.  */
> > > > +  if (!vec_stmt)
> > > > +    {
> > > > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target doesn't support flag setting vector "
> > > > +			       "comparisons.\n");
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (ncopies > 1
> > >
> > > Also required for vec_num > 1 with SLP (SLP_TREE_NUMBER_OF_VEC_STMTS)
> > >
> > > > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > > > +	{
> > > > +	  if (dump_enabled_p ())
> > > > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +			       "can't vectorize early exit because the "
> > > > +			       "target does not support boolean vector OR for "
> > > > +			       "type %T.\n", vectype);
> > > > +	  return false;
> > > > +	}
> > > > +
> > > > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > > +				      vec_stmt, slp_node, cost_vec))
> > > > +	return false;
> > > > +
> > > > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > > > +	{
> > > > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > > > +					      OPTIMIZE_FOR_SPEED))
> > > > +	    return false;
> > > > +	  else
> > > > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > > > +	}
> > > > +
> > > > +
> > > > +      return true;
> > > > +    }
> > > > +
> > > > +  /* Transform.  */
> > > > +
> > > > +  tree new_temp = NULL_TREE;
> > > > +  gimple *new_stmt = NULL;
> > > > +
> > > > +  if (dump_enabled_p ())
> > > > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > > > +
> > > > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > > > +				  vec_stmt, slp_node, cost_vec))
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > > > +  basic_block cond_bb = gimple_bb (stmt);
> > > > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > > > +
> > > > +  auto_vec<tree> stmts;
> > > > +
> > > > +  tree mask = NULL_TREE;
> > > > +  if (masked_loop_p)
> > > > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > > > +
> > > > +  if (slp_node)
> > > > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > > > +  else
> > > > +    {
> > > > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > > > +      stmts.reserve_exact (vec_stmts.length ());
> > > > +      for (auto stmt : vec_stmts)
> > > > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > > > +    }
> > > > +
> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +
> > > > +      /* Mask the statements as we queue them up.  */
> > > > +      if (masked_loop_p)
> > > > +	for (auto stmt : stmts)
> > > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > > +						mask, stmt, &cond_gsi));
> > >
> > > I think this still uses the wrong mask, you need to use
> > >
> > >   vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, <cnt>)
> > >
> > > replacing <cnt> with the vector def index to mask I think.  For this
> > > reason keeping the "final" mask below is also wrong.
> > >
> > > Or am I missing something?
> > >
> > > > +      else
> > > > +	workset.splice (stmts);
> > > > +
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  workset.quick_insert (0, new_temp);
> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  /* If we have multiple statements after reduction we should check all the
> > > > +     lanes and treat it as a full vector.  */
> > > > +  if (masked_loop_p)
> > > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			     &cond_gsi);
> > >
> > > so just do this in the else path above
> > >
> > > Otherwise looks OK.
> > >
> > > Richard.
> > >
> > > > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > > > +     codegen so we must replace the original insn.  */
> > > > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > > > +  gimple_cond_set_condition (cond_stmt, NE_EXPR, cond,
> > > > +			     build_zero_cst (vectype));
> > > > +  update_stmt (stmt);
> > > > +
> > > > +  if (slp_node)
> > > > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > > > +  else
> > > > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > > > +
> > > > +
> > > > +  if (!slp_node)
> > > > +    *vec_stmt = stmt;
> > > > +
> > > > +  return true;
> > > > +}
> > > > +
> > > >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> > > >     can handle all live statements in the node.  Otherwise return true
> > > >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > > > @@ -12949,7 +13131,9 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> > > >  				  stmt_info, NULL, node)
> > > >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > > > -				   stmt_info, NULL, node, cost_vec));
> > > > +				   stmt_info, NULL, node, cost_vec)
> > > > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > > +				      cost_vec));
> > > >    else
> > > >      {
> > > >        if (bb_vinfo)
> > > > @@ -12972,7 +13156,10 @@ vect_analyze_stmt (vec_info *vinfo,
> > > >  					 NULL, NULL, node, cost_vec)
> > > >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> > > >  					  cost_vec)
> > > > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > > > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > > > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > > > +					  cost_vec));
> > > > +
> > > >      }
> > > >
> > > >    if (node)
> > > > @@ -13131,6 +13318,12 @@ vect_transform_stmt (vec_info *vinfo,
> > > >        gcc_assert (done);
> > > >        break;
> > > >
> > > > +    case loop_exit_ctrl_vec_info_type:
> > > > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > > > +				      slp_node, NULL);
> > > > +      gcc_assert (done);
> > > > +      break;
> > > > +
> > > >      default:
> > > >        if (!STMT_VINFO_LIVE_P (stmt_info))
> > > >  	{
> > > > @@ -14321,10 +14514,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > > >      }
> > > >    else
> > > >      {
> > > > +      gcond *cond = NULL;
> > > >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> > > >  	scalar_type = TREE_TYPE (DR_REF (dr));
> > > >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> > > >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > > > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > > > +	{
> > > > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > > > +	     single bit precision and we need the vector boolean to be a
> > > > +	     representation of the integer mask.  So set the correct integer type and
> > > > +	     convert to boolean vector once we have a vectype.  */
> > > > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> > > > +	}
> > > >        else
> > > >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> > > >
> > > > @@ -14339,12 +14541,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> > > >  			     "get vectype for scalar type: %T\n", scalar_type);
> > > >  	}
> > > >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > > > +
> > > >        if (!vectype)
> > > >  	return opt_result::failure_at (stmt,
> > > >  				       "not vectorized:"
> > > >  				       " unsupported data-type %T\n",
> > > >  				       scalar_type);
> > > >
> > > > +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> > > > +	 that we have the correct integer mask type.  */
> > > > +      if (cond)
> > > > +	vectype = truth_type_for (vectype);
> > > > +
> > > >        if (dump_enabled_p ())
> > > >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> > > >      }
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
  2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
@ 2023-12-09 10:38   ` Richard Sandiford
  2023-12-11  7:38     ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-12-09 10:38 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jlaw

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> What do people think about having the ability to force the latch-connected exit
> to be the chosen exit, as a param?  I.e. what's in the patch, but as a param.
>
> I found this useful when debugging large example failures as it tells me where
> I should be looking.  No hard requirement but just figured I'd ask if we should.

If it's useful for that, then perhaps it would be worth making it a
DEBUG_COUNTER instead of a --param, for easy bisection.
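
For context, the debug-counter mechanism Richard refers to gates a transformation on a per-counter limit, so a binary search over the limit isolates the first instance that misbehaves.  A minimal standalone mock of the idea follows — the struct and function names are illustrative only, not GCC's actual dbg-cnt.h API:

```cpp
#include <cassert>

// Illustrative mock of the debug-counter idea (not GCC's dbg-cnt.h API):
// the gated code runs only while the counter is below its limit, so
// bisecting on the limit pinpoints the first bad transformation.
struct debug_counter
{
  unsigned count = 0;
  unsigned limit = 0;
};

// Hypothetical gate: force the latch-connected exit only for the first
// `limit` queries, falling back to the default choice afterwards.
static bool
dbg_cnt_test (debug_counter &cnt)
{
  return cnt.count++ < cnt.limit;
}
```

Bisecting on the limit then tells you which loop's forced exit introduces a failure, without rebuilding with `#if 0` hunks.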

Thanks,
Richard

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit.
>
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
>    if (exits.length () == 1)
>      return exits[0];
>  
> +#if 0
>    /* If we have multiple exits we only support counting IV at the moment.  Analyze
>       all exits and return one */
>    class tree_niter_desc niter_desc;
> @@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
>      }
>  
>    return candidate;
> +#else
> +  basic_block bb = ip_normal_pos (loop);
> +  if (!bb)
> +    return NULL;
> +
> +  edge exit = EDGE_SUCC (bb, 0);
> +  if (exit->dest == loop->latch)
> +    return EDGE_SUCC (bb, 1);
> +  return exit;
> +#endif
>  }
>  
>  /* Function bb_in_loop_p
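
The `#else` branch above boils down to: from the block returned by ip_normal_pos, take whichever successor edge does not lead back to the latch.  A standalone sketch of that selection, with mock CFG types standing in for GCC's real basic_block/edge structures:

```cpp
#include <cassert>

// Mock CFG types (illustrative only, not GCC's basic_block/edge).
struct block_m;
struct edge_m { block_m *dest; };
struct block_m { edge_m succ[2]; };

// Return the successor edge of BB that does not point at the loop latch,
// i.e. the exit edge the RFC patch forces.
static edge_m *
non_latch_exit (block_m *bb, block_m *latch)
{
  if (bb->succ[0].dest == latch)
    return &bb->succ[1];
  return &bb->succ[0];
}
```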

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-08 13:59                 ` Richard Biener
  2023-12-08 15:01                   ` Tamar Christina
@ 2023-12-11  7:09                   ` Tamar Christina
  2023-12-11  9:36                     ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-11  7:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 19809 bytes --]

> > >
> > > Hmm, but we're visiting them then?  I wonder how you get along
> > > without doing adjustmens on the uses if you consider
> > >
> > >     _1 = a < b;
> > >     _2 = c != d;
> > >     _3 = _1 | _2;
> > >     if (_3 != 0)
> > >       exit loop;
> > >
> > > thus a combined condition like
> > >
> > >     if (a < b || c != d)
> > >
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> > >
> What goes wrong if you drop 'analyze_only'?  We're not really
> rewriting anything there.
> >
> > You mean drop it only in the above? We then fail to update the type for
> > the gcond.  So in certain circumstances like with
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> >     {
> >       a = 0;
> >       if (b == 28378)
> >         a = e[b];
> >       if (!(d || b))
> >         for (; c;)
> >           ;
> >     }
> >   return 0;
> > }
> >
> > Unless we walk the statements regardless of whether they come from inside the loop or not.
> 
> What do you mean by "fail to update the type for the gcond"?  If
> I understood correctly the 'analyze_only' short-cuts some
> checks, it doesn't add some?
> 
> But it's hard to follow what's actually done for a gcond ...
> 

Yes, I had realized I had misunderstood what this pattern was doing, and once
I made the first wrong change it snowballed.

This is an updated patch where the only modification made is to check_bool_pattern
to also return the type of the overall expression even if we are going to handle the
conditional through an optab expansion.  I'm piggybacking on the fact that this function
has seen enough of the operands to be able to tell the precision needed when vectorizing.

This is needed because in the cases where the condition to the gcond was already a
bool, the precision would be 1 bit; to find the actual mask type we have to dig
through the operands, which this function already does.
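
To illustrate why the 1-bit boolean is unusable on its own: the vector mask type has to come from the precision of the comparison's operands, which means walking the boolean expression down to the comparisons feeding it — roughly the walk check_bool_pattern already performs.  A standalone sketch with a toy expression tree (illustrative types, not GCC's gimple; taking the widest operand precision is just one plausible policy for the sketch):

```cpp
#include <cassert>
#include <memory>

// Toy boolean expression tree (not GCC's IR): AND/OR/NOT nodes over
// comparisons, where only a comparison records its operand precision.
struct expr
{
  enum kind { CMP, AND, OR, NOT } k;
  unsigned operand_precision;      // meaningful for CMP nodes only
  std::shared_ptr<expr> lhs, rhs;  // children for AND/OR/NOT
};

// The boolean result of the tree is 1 bit everywhere; to pick a vector
// mask type we instead return the widest precision found among the
// operands of the comparisons feeding the condition.
static unsigned
mask_precision (const std::shared_ptr<expr> &e)
{
  if (!e)
    return 0;
  if (e->k == expr::CMP)
    return e->operand_precision;
  unsigned l = mask_precision (e->lhs);
  unsigned r = mask_precision (e->rhs);
  return l > r ? l : r;
}
```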

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(check_bool_pattern, vect_recog_bool_pattern): Support gconds type
	analysis.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If VECTYPE is nonnull, it will contain the common type of the
+   operations making up the comparisons.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    tree *vectype)
 {
   tree rhs1;
   enum tree_code rhs_code;
@@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   vectype))
 	return false;
       break;
 
@@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  if (vectype)
+	    *vectype = comp_vectype;
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
@@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond* cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* Analyze the gcond only for loop vectorization with multiple exits;
+	 we don't bother otherwise as we don't support SLP today.  */
+      lhs = gimple_cond_lhs (last_stmt);
+      var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      tree comp_type = NULL_TREE;
+      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
-      else if (integer_type_for_mask (var, vinfo))
+      else if (!cond && integer_type_for_mask (var, vinfo))
+	return NULL;
+      else if (cond && !comp_type)
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (NE_EXPR,
+				 var, build_int_cst (TREE_TYPE (var), 0),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = truth_type_for (comp_type);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken we're
+     exiting the loop.  This is however not always the case as the compiler
+     will rewrite conditions to always be a comparison against 0.  To do this
+     it sometimes flips the edges.  This is fine for scalar, but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have single-bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

[-- Attachment #2: rb17969 (2).patch --]
[-- Type: application/octet-stream, Size: 16690 bytes --]

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
    true if bool VAR can and should be optimized that way.  Assume it shouldn't
    in case it's a result of a comparison which can be directly vectorized into
    a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If VECTYPE is nonnull then it will be set to the common type of the
+   operations making up the comparisons.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+		    tree *vectype)
 {
   tree rhs1;
   enum tree_code rhs_code;
@@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
     {
     case SSA_NAME:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     CASE_CONVERT:
       if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
 	return false;
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_NOT_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
 	return false;
       break;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
-      if (! check_bool_pattern (rhs1, vinfo, stmts)
-	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
+	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+				   vectype))
 	return false;
       break;
 
@@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  if (vectype)
+	    *vectype = comp_vectype;
 	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
 							  TREE_TYPE (rhs1));
 	  if (mask_type
@@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   gimple *pattern_stmt;
-
-  if (!is_gimple_assign (last_stmt))
+  gcond *cond = NULL;
+  if (!is_gimple_assign (last_stmt)
+      && !(cond = dyn_cast <gcond *> (last_stmt)))
     return NULL;
 
-  var = gimple_assign_rhs1 (last_stmt);
-  lhs = gimple_assign_lhs (last_stmt);
-  rhs_code = gimple_assign_rhs_code (last_stmt);
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (is_gimple_assign (last_stmt))
+    {
+      var = gimple_assign_rhs1 (last_stmt);
+      lhs = gimple_assign_lhs (last_stmt);
+      rhs_code = gimple_assign_rhs_code (last_stmt);
+    }
+  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    {
+      /* Only analyze the gcond for multiple-exit loop vectorization; we
+	 don't support SLP of gconds today.  */
+      lhs = gimple_cond_lhs (last_stmt);
+      var = gimple_cond_lhs (last_stmt);
+      rhs_code = gimple_cond_code (last_stmt);
+    }
+  else
+    return NULL;
 
   if (rhs_code == VIEW_CONVERT_EXPR)
     var = TREE_OPERAND (var, 0);
@@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 	return NULL;
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	{
 	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				   TREE_TYPE (lhs), stmt_vinfo);
@@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
 
       return pattern_stmt;
     }
-  else if (rhs_code == COND_EXPR
+  else if ((rhs_code == COND_EXPR || cond)
 	   && TREE_CODE (var) == SSA_NAME)
     {
       vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
@@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      tree comp_type = NULL_TREE;
+      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
 	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
-      else if (integer_type_for_mask (var, vinfo))
+      else if (!cond && integer_type_for_mask (var, vinfo))
+	return NULL;
+      else if (cond && !comp_type)
 	return NULL;
 
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      pattern_stmt 
-	= gimple_build_assign (lhs, COND_EXPR,
-			       build2 (NE_EXPR, boolean_type_node,
-				       var, build_int_cst (TREE_TYPE (var), 0)),
-			       gimple_assign_rhs2 (last_stmt),
-			       gimple_assign_rhs3 (last_stmt));
+      if (!cond)
+	{
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  pattern_stmt
+	    = gimple_build_assign (lhs, COND_EXPR,
+				   build2 (NE_EXPR, boolean_type_node, var,
+					   build_int_cst (TREE_TYPE (var), 0)),
+				   gimple_assign_rhs2 (last_stmt),
+				   gimple_assign_rhs3 (last_stmt));
+	}
+      else
+	{
+	  pattern_stmt
+	    = gimple_build_cond (NE_EXPR,
+				 var, build_int_cst (TREE_TYPE (var), 0),
+				 gimple_cond_true_label (cond),
+				 gimple_cond_false_label (cond));
+	  vectype = truth_type_for (comp_type);
+	}
       *type_out = vectype;
       vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
 
@@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
       if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
 
-      if (check_bool_pattern (var, vinfo, bool_stmts))
+      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
 	rhs = adjust_bool_stmts (vinfo, bool_stmts,
 				 TREE_TYPE (vectype), stmt_vinfo);
       else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype);
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  */
+      if (masked_loop_p)
+	for (auto stmt : stmts)
+	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
+						mask, stmt, &cond_gsi));
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  /* When vectorizing we assume that taking the branch edge means we're
+     exiting the loop.  This is however not always the case, as the compiler
+     will rewrite conditions to always be a comparison against 0.  To do this
+     it sometimes flips the edges.  This is fine for scalar, but for vector we
+     then have to flip the test, as we're still assuming that taking the
+     branch edge means we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have single-bit precision and we need the vector boolean to be a
+	     representation of the integer mask.  So set the correct integer type and
+	     convert to boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean type now
+	 that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

^ permalink raw reply	[flat|nested] 200+ messages in thread
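The worklist reduction in vectorizable_early_exit above (pop two masks, OR them, insert the result at the front) can be sketched outside of GCC. Below is a minimal standalone C++ illustration of the same pairwise strategy, with uint64_t values standing in for vector masks; reduce_masks and everything else here is invented for illustration and is not GCC API (the patch operates on SSA names via gimple_build_assign):

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Pairwise OR-reduction over a worklist, mirroring the shape of the loop in
// vectorizable_early_exit: pop two operands from the back, OR them, and
// insert the result at the front so independent ORs form a shallow tree.
// Precondition: masks is non-empty.
uint64_t
reduce_masks (const std::vector<uint64_t> &masks)
{
  std::deque<uint64_t> workset (masks.begin (), masks.end ());
  while (workset.size () > 1)
    {
      uint64_t arg0 = workset.back (); workset.pop_back ();
      uint64_t arg1 = workset.back (); workset.pop_back ();
      // Mirrors workset.quick_insert (0, new_temp) in the patch: fresh
      // results go to the front, so they are combined with each other last.
      workset.push_front (arg0 | arg1);
    }
  return workset.front ();
}
```

Because OR is associative and commutative the insertion position does not change the result, only the shape of the dependency chain; front insertion pairs fresh results against each other late, keeping the reduction tree shallow and the independent ORs free to execute in parallel.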

* Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
  2023-12-09 10:38   ` Richard Sandiford
@ 2023-12-11  7:38     ` Richard Biener
  2023-12-11  8:49       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-11  7:38 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Sat, 9 Dec 2023, Richard Sandiford wrote:

> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > What do people think about having the ability to force only the latch connected
> > exit as the exit as a param? I.e. what's in the patch but as a param.
> >
> > I found this useful when debugging large example failures as it tells me where
> > I should be looking.  No hard requirement but just figured I'd ask if we should.
> 
> If it's useful for that, then perhaps it would be worth making it a
> DEBUG_COUNTER instead of a --param, for easy bisection.

Or even better, make a debug counter that would skip the IV edge and
choose the "next".

Richard.

> Thanks,
> Richard
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
> >    if (exits.length () == 1)
> >      return exits[0];
> >  
> > +#if 0
> >    /* If we have multiple exits we only support counting IV at the moment.  Analyze
> >       all exits and return one */
> >    class tree_niter_desc niter_desc;
> > @@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
> >      }
> >  
> >    return candidate;
> > +#else
> > +  basic_block bb = ip_normal_pos (loop);
> > +  if (!bb)
> > +    return NULL;
> > +
> > +  edge exit = EDGE_SUCC (bb, 0);
> > +  if (exit->dest == loop->latch)
> > +    return EDGE_SUCC (bb, 1);
> > +  return exit;
> > +#endif
> >  }
> >  
> >  /* Function bb_in_loop_p
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
  2023-12-11  7:38     ` Richard Biener
@ 2023-12-11  8:49       ` Tamar Christina
  2023-12-11  9:00         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-11  8:49 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, December 11, 2023 7:38 AM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; jlaw@ventanamicro.com
> Subject: Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final
> edge for debugging
> 
> On Sat, 9 Dec 2023, Richard Sandiford wrote:
> 
> > Tamar Christina <tamar.christina@arm.com> writes:
> > > Hi All,
> > >
> > > What do people think about having the ability to force only the latch connected
> > > exit as the exit as a param? I.e. what's in the patch but as a param.
> > >
> > > I found this useful when debugging large example failures as it tells me where
> > > I should be looking.  No hard requirement but just figured I'd ask if we should.
> >
> > If it's useful for that, then perhaps it would be worth making it a
> > DEBUG_COUNTER instead of a --param, for easy bisection.
> 
> Or even better, make a debug counter that would skip the IV edge and
> choose the "next".
> 

Ah, I'd never heard of debug counters. They look very useful!

Did you mean every time the counter is reached it picks the n-th successor?

So if the counter is hit twice it picks the 3rd exit?

Thanks,
Tamar

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging
  2023-12-11  8:49       ` Tamar Christina
@ 2023-12-11  9:00         ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-11  9:00 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Sandiford, gcc-patches, nd, jlaw

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, December 11, 2023 7:38 AM
> > To: Richard Sandiford <Richard.Sandiford@arm.com>
> > Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> > <nd@arm.com>; jlaw@ventanamicro.com
> > Subject: Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final
> > edge for debugging
> > 
> > On Sat, 9 Dec 2023, Richard Sandiford wrote:
> > 
> > > Tamar Christina <tamar.christina@arm.com> writes:
> > > > Hi All,
> > > >
> > > > What do people think about having the ability to force only the latch connected
> > > > exit as the exit as a param? I.e. what's in the patch but as a param.
> > > >
> > > > I found this useful when debugging large example failures as it tells me where
> > > > I should be looking.  No hard requirement but just figured I'd ask if we should.
> > >
> > > If it's useful for that, then perhaps it would be worth making it a
> > > DEBUG_COUNTER instead of a --param, for easy bisection.
> > 
> > Or even better, make a debug counter that would skip the IV edge and
> > choose the "next".
> > 
> 
> Ah, I'd never heard of debug counters. They look very useful!
> 
> Did you mean everytime the counter is reached it picks the n-th successor?
> 
> So If the counter is hit twice it picks the 3rd exit?

  if (!dbg_cnt (...))
    do not take this exit, try next

which means it might even fail to find an exit.


> Thanks,
> Tamar
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread
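The gating Richard describes can be sketched generically: a counter decides per query whether the current candidate is taken, so bisecting on the counter value isolates the decision that misbehaves. A standalone sketch under the simplifying assumption that the first `skip` queries answer false; the real dbg_cnt machinery (-fdbg-cnt) uses named counters with threshold ranges, and this struct and choose_exit are invented names, not GCC API:

```cpp
#include <vector>

// Simplified stand-in for a debug counter: the first 'skip' queries return
// false (candidate rejected), every later query returns true.
struct debug_counter
{
  unsigned skip;
  unsigned count = 0;
  bool query () { return count++ >= skip; }
};

// Pick the first candidate exit the counter allows.  With skip == 0 the
// normal choice is made; each increment of skip moves to the next
// candidate, and a large enough skip selects no exit at all -- matching
// the note that the scheme "might even fail to find an exit".
int
choose_exit (const std::vector<int> &exits, debug_counter &cnt)
{
  for (int e : exits)
    if (cnt.query ())
      return e;
  return -1;  // counter rejected every candidate
}
```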

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11  7:09                   ` Tamar Christina
@ 2023-12-11  9:36                     ` Richard Biener
  2023-12-11 23:12                       ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-11  9:36 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > >
> > > > Hmm, but we're visiting them then?  I wonder how you get along
> > > > without doing adjustmens on the uses if you consider
> > > >
> > > >     _1 = a < b;
> > > >     _2 = c != d;
> > > >     _3 = _1 | _2;
> > > >     if (_3 != 0)
> > > >       exit loop;
> > > >
> > > > thus a combined condition like
> > > >
> > > >     if (a < b || c != d)
> > > >
> > > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > > mask uses and thus possibly adjust them.
> > > >
> > > > What bad happens if you drop 'analyze_only'?  We're not really
> > > > rewriting anything there.
> > >
> > > You mean drop it only in the above? We then fail to update the type for
> > > the gcond.  So in certain circumstances like with
> > >
> > > int a, c, d;
> > > short b;
> > >
> > > int
> > > main ()
> > > {
> > >   int e[1];
> > >   for (; b < 2; b++)
> > >     {
> > >       a = 0;
> > >       if (b == 28378)
> > >         a = e[b];
> > >       if (!(d || b))
> > >         for (; c;)
> > >           ;
> > >     }
> > >   return 0;
> > > }
> > >
> > > Unless we walk the statements regardless of whether they come from inside the
> > loop or not.
> > 
> > What do you mean by "fail to update the type for the gcond"?  If
> > I understood correctly the 'analyze_only' short-cuts some
> > checks, it doesn't add some?
> > 
> > But it's hard to follow what's actually done for a gcond ...
> > 
> 
> Yes, I had realized I had misunderstood what this pattern was doing, and once
> I had made the first wrong change it snowballed.
> 
> This is an updated patch where the only modification made is to check_bool_pattern
> to also return the type of the overall expression even if we are going to handle the
> conditional through an optab expansion.  I'm piggybacking on the fact that this function
> has seen enough of the operands to be able to tell the precision needed when vectorizing.
> 
> This is needed because in the cases where the condition to the gcond was
> already a bool, the precision would be 1 bit; to find the actual mask we have
> to dig through the operands, which this function already does.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(check_bool_pattern, vect_recog_bool_pattern): Support gconds type
> 	analysis.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
>     true if bool VAR can and should be optimized that way.  Assume it shouldn't
>     in case it's a result of a comparison which can be directly vectorized into
>     a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  If VECTYPE is nonnull then it will be set to the common type of the
> +   operations making up the comparisons.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
> +		    tree *vectype)
>  {
>    tree rhs1;
>    enum tree_code rhs_code;
> @@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>    switch (rhs_code)
>      {
>      case SSA_NAME:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      CASE_CONVERT:
>        if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>  	return false;
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      case BIT_NOT_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>  	return false;
>        break;
>  
>      case BIT_AND_EXPR:
>      case BIT_IOR_EXPR:
>      case BIT_XOR_EXPR:
> -      if (! check_bool_pattern (rhs1, vinfo, stmts)
> -	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
> +      if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
> +	  || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
> +				   vectype))
>  	return false;
>        break;
>  
> @@ -5272,6 +5276,8 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
>  	  if (comp_vectype == NULL_TREE)
>  	    return false;
>  
> +	  if (vectype)
> +	    *vectype = comp_vectype;
>  	  tree mask_type = get_mask_type_for_scalar_type (vinfo,
>  							  TREE_TYPE (rhs1));
>  	  if (mask_type
> @@ -5608,13 +5614,28 @@ vect_recog_bool_pattern (vec_info *vinfo,
>    enum tree_code rhs_code;
>    tree var, lhs, rhs, vectype;
>    gimple *pattern_stmt;
> -
> -  if (!is_gimple_assign (last_stmt))
> +  gcond *cond = NULL;
> +  if (!is_gimple_assign (last_stmt)
> +      && !(cond = dyn_cast <gcond *> (last_stmt)))
>      return NULL;

I still think the code will be much easier to follow if you add

     if (gcond *cond = dyn_cast <gcond *> (last_stmt))
       {
         thread to all branches
         return;
       }

     if (!is_gimple_assign (last_stmt))
       return NULL;

     .. original code unchanged ..

you can then also choose better names for the local variables.

> -  var = gimple_assign_rhs1 (last_stmt);
> -  lhs = gimple_assign_lhs (last_stmt);
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (is_gimple_assign (last_stmt))
> +    {
> +      var = gimple_assign_rhs1 (last_stmt);
> +      lhs = gimple_assign_lhs (last_stmt);
> +      rhs_code = gimple_assign_rhs_code (last_stmt);
> +    }
> +  else if (loop_vinfo && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    {
> +      /* Unless this is a loop vectorization with multiple exits, don't
> +	 bother analyzing the gcond as we don't support SLP today.  */
> +      lhs = gimple_cond_lhs (last_stmt);
> +      var = gimple_cond_lhs (last_stmt);
> +      rhs_code = gimple_cond_code (last_stmt);
> +    }
> +  else
> +    return NULL;
>  
>    if (rhs_code == VIEW_CONVERT_EXPR)
>      var = TREE_OPERAND (var, 0);
> @@ -5632,7 +5653,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  	return NULL;
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
>  	{
>  	  rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				   TREE_TYPE (lhs), stmt_vinfo);
> @@ -5680,7 +5701,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>  
>        return pattern_stmt;
>      }
> -  else if (rhs_code == COND_EXPR
> +  else if ((rhs_code == COND_EXPR || cond)
>  	   && TREE_CODE (var) == SSA_NAME)
>      {
>        vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
> @@ -5700,18 +5721,33 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      tree comp_type = NULL_TREE;
> +      if (check_bool_pattern (var, vinfo, bool_stmts, &comp_type))
>  	var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
> -      else if (integer_type_for_mask (var, vinfo))
> +      else if (!cond && integer_type_for_mask (var, vinfo))
> +	return NULL;
> +      else if (cond && !comp_type)
>  	return NULL;
>  
> -      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> -      pattern_stmt 
> -	= gimple_build_assign (lhs, COND_EXPR,
> -			       build2 (NE_EXPR, boolean_type_node,
> -				       var, build_int_cst (TREE_TYPE (var), 0)),
> -			       gimple_assign_rhs2 (last_stmt),
> -			       gimple_assign_rhs3 (last_stmt));
> +      if (!cond)
> +	{
> +	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +	  pattern_stmt
> +	    = gimple_build_assign (lhs, COND_EXPR,
> +				   build2 (NE_EXPR, boolean_type_node, var,
> +					   build_int_cst (TREE_TYPE (var), 0)),
> +				   gimple_assign_rhs2 (last_stmt),
> +				   gimple_assign_rhs3 (last_stmt));
> +	}
> +      else
> +	{
> +	  pattern_stmt
> +	    = gimple_build_cond (NE_EXPR,
> +				 var, build_int_cst (TREE_TYPE (var), 0),
> +				 gimple_cond_true_label (cond),
> +				 gimple_cond_false_label (cond));

the labels are always NULL, so just use NULL_TREE for them.

> +	  vectype = truth_type_for (comp_type);

so this leaves the producer of the mask in the GIMPLE_COND and we
vectorize the GIMPLE_COND as

  mask_1 = ...;
  if (mask_1 != {-1,-1...})
    ..

?  In principle only the mask producer needs a vector type and that
adjusted by bool handling, the branch itself doesn't need any
STMT_VINFO_VECTYPE.

As said I believe if you recognize a GIMPLE_COND pattern for conds
that aren't bool != 0 producing the mask stmt this should be picked
up by bool handling correctly already.

Also as said piggy-backing on the COND_EXPR handling in this function
which has the condition split out into a separate stmt(!) might not
completely handle things correctly and you are likely missing
the tcc_comparison handling of the embedded compare.

> +	}
>        *type_out = vectype;
>        vect_pattern_detected ("vect_recog_bool_pattern", last_stmt);
>  
> @@ -5725,7 +5761,7 @@ vect_recog_bool_pattern (vec_info *vinfo,
>        if (!vectype || !VECTOR_MODE_P (TYPE_MODE (vectype)))
>  	return NULL;
>  
> -      if (check_bool_pattern (var, vinfo, bool_stmts))
> +      if (check_bool_pattern (var, vinfo, bool_stmts, NULL))
>  	rhs = adjust_bool_stmts (vinfo, bool_stmts,
>  				 TREE_TYPE (vectype), stmt_vinfo);
>        else
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..d0878250f6fb9de4d6e6a39d16956ca147be4b80 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,198 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  gcc_assert (vectype);
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  */
> +      if (masked_loop_p)
> +	for (auto stmt : stmts)
> +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> +						mask, stmt, &cond_gsi));
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

You didn't fix any of the code above it seems, it's still wrong.

Richard.

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +   else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13145,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13170,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13332,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14528,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14555,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11  9:36                     ` Richard Biener
@ 2023-12-11 23:12                       ` Tamar Christina
  2023-12-12 10:10                         ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-11 23:12 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 19900 bytes --]

> > +	  vectype = truth_type_for (comp_type);
> 
> so this leaves the producer of the mask in the GIMPLE_COND and we
> vectorize the GIMPLE_COND as
> 
>   mask_1 = ...;
>   if (mask_1 != {-1,-1...})
>     ..
> 
> ?  In principle only the mask producer needs a vector type and that
> adjusted by bool handling, the branch itself doesn't need any
> STMT_VINFO_VECTYPE.
> 
> As said I believe if you recognize a GIMPLE_COND pattern for conds
> that aren't bool != 0 producing the mask stmt this should be picked
> up by bool handling correctly already.
> 
> Also as said piggy-backing on the COND_EXPR handling in this function
> which has the condition split out into a separate stmt(!) might not
> completely handle things correctly and you are likely missing
> the tcc_comparison handling of the embedded compare.
> 

Ok, I've stopped piggy-backing on the COND_EXPR handling and created
vect_recog_gcond_pattern.  As you said in the previous email I've also
stopped setting the vectype for the gcond and instead use the type of the
operand.

Note that because the pattern doesn't apply if the condition was already an
NE_EXPR, I do need the extra truth_type_for for that case.  Consider e.g.

a = b > 4;
if (a != 0)

Here the producer of the mask is already outside of the cond but will not
trigger boolean recognition.  That means that while the integral type is
correct it won't be a boolean one, and vectorizable_comparison expects a
boolean vector.  Alternatively, we could remove that assert?  But that seems
worse.
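As a scalar illustration of that shape (the array name, values, and function name here are mine, not from the patch):

```c
#define N 8
static int b[N] = { 1, 2, 3, 9, 5, 6, 7, 8 };

/* The comparison result is produced as a separate boolean statement,
   so the branch itself already has the 'a != 0' form that
   vect_recog_gcond_pattern skips; the producer 'b[i] > 4' still needs
   to end up with a vector boolean type.  */
int first_above_four (void)
{
  for (int i = 0; i < N; i++)
    {
      _Bool a = b[i] > 4;
      if (a != 0)
        return i;
    }
  return -1;
}
```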

Additionally, in the previous email you mentioned an "adjusted Boolean
statement".

I'm guessing you were referring to generating a COND_EXPR from the gcond so
that vect_recog_bool_pattern detects it?  The problem with that is that the
result gets folded to x & 1 and doesn't trigger.  It also then blocks
vectorization.  So instead I've not forced it.

> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  */
> > +      if (masked_loop_p)
> > +	for (auto stmt : stmts)
> > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > +						mask, stmt, &cond_gsi));
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> You didn't fix any of the code above it seems, it's still wrong.
> 

Apologies, I hadn't realized that the last argument to vect_get_loop_mask was the index.

Should be fixed now. Is this closer to what you wanted?
The individual ops are now masked with separate masks. (See testcase when N=865).

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
	(vect_recog_gcond_pattern): New.
	(vect_vect_recog_func_ptrs): Use it.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. the GIMPLE_COND statement.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR && zerop (rhs))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  /* Build a scalar type for the boolean result that when vectorized matches the
+     vector type of the result in size and number of elements.  */
+  unsigned prec
+    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
+			   TYPE_VECTOR_SUBPARTS (vecitype));
+
+  scalar_type
+    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
+
+  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "use not simple.\n");
+	return false;
+    }
+
+  stmt_vec_info op0_info = vinfo->lookup_def (op0);
+  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
+  gcc_assert (vectype);
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num,  but since we inspect the exact results of vectorization
+	 we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar,  but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have a single bit precision and we need the vector boolean to be
+	     a representation of the integer mask.  So set the correct integer
+	     type and convert to a boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean
+	 type now that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

[-- Attachment #2: rb17969.patch --]
[-- Type: application/octet-stream, Size: 15296 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. cast of a bool to
+		 an integer type.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR && zerop (rhs))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  /* Build a scalar type for the boolean result that when vectorized matches
+     the vector type of the result in size and number of elements.  */
+  unsigned prec
+    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
+			   TYPE_VECTOR_SUBPARTS (vecitype));
+
+  scalar_type
+    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
+
+  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype_op0 = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype_op0))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  stmt_vec_info op0_info = vinfo->lookup_def (op0);
+  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
+  gcc_assert (vectype);
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target doesn't support flag setting vector "
+			       "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "can't vectorize early exit because the "
+			       "target does not support boolean vector OR for "
+			       "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  tree mask = NULL_TREE;
+  if (masked_loop_p)
+    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num, but since we inspect the exact results of vectorization
+	 we don't need to and instead can just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    new_temp = stmts[0];
+
+  gcc_assert (new_temp);
+
+  tree cond = new_temp;
+  /* If we have multiple statements after reduction we should check all the
+     lanes and treat it as a full vector.  */
+  if (masked_loop_p)
+    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
+			     &cond_gsi);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *>(stmt);
+  /* When vectorizing we assume that if the branch edge is taken that we're
+     exiting the loop.  This is not however always the case as the compiler will
+     rewrite conditions to always be a comparison against 0.  To do this it
+     sometimes flips the edges.  This is fine for scalar, but for vector we
+     then have to flip the test, as we're still assuming that if you take the
+     branch edge that we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
+
     }
 
   if (node)
@@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      gcond *cond = NULL;
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
 	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else if ((cond = dyn_cast <gcond *> (stmt)))
+	{
+	  /* We can't convert the scalar type to boolean yet, since booleans
+	     have a single bit precision and we need the vector boolean to be
+	     a representation of the integer mask.  So set the correct integer
+	     type and convert to a boolean vector once we have a vectype.  */
+	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
+	}
       else
 	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 
@@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 			     "get vectype for scalar type: %T\n", scalar_type);
 	}
       vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+
       if (!vectype)
 	return opt_result::failure_at (stmt,
 				       "not vectorized:"
 				       " unsupported data-type %T\n",
 				       scalar_type);
 
+      /* If we were a gcond, convert the resulting type to a vector boolean
+	 type now that we have the correct integer mask type.  */
+      if (cond)
+	vectype = truth_type_for (vectype);
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
     }

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-11 23:12                       ` Tamar Christina
@ 2023-12-12 10:10                         ` Richard Biener
  2023-12-12 10:27                           ` Tamar Christina
  2023-12-12 10:59                           ` Richard Sandiford
  0 siblings, 2 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-12 10:10 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw, richard.sandiford

On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > +	  vectype = truth_type_for (comp_type);
> > 
> > so this leaves the producer of the mask in the GIMPLE_COND and we
> > vectorize the GIMPLE_COND as
> > 
> >   mask_1 = ...;
> >   if (mask_1 != {-1,-1...})
> >     ..
> > 
> > ?  In principle only the mask producer needs a vector type and that
> > adjusted by bool handling, the branch itself doesn't need any
> > STMT_VINFO_VECTYPE.
> > 
> > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > that aren't bool != 0 producing the mask stmt this should be picked
> > up by bool handling correctly already.
> > 
> > Also as said piggy-backing on the COND_EXPR handling in this function
> > which has the condition split out into a separate stmt(!) might not
> > completely handle things correctly and you are likely missing
> > the tcc_comparison handling of the embedded compare.
> > 
> 
> Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> vect_recog_gcond_pattern.  As you said in the previous email I've also
> stopped setting the vectype for the gcond and instead use the type of the
> operand.
> 
> Note that because the pattern doesn't apply if you were already an NE_EXPR
> I do need the extra truth_type_for for that case.  Because in the case of e.g.
> 
> a = b > 4;
> if (a != 0)
> 
> The producer of the mask is already outside of the cond but will not trigger
> boolean recognition.

It should trigger because we have a mask use of 'a'; I always forget
where we do that - it might be where we compute mask precision stuff
or it might be bool pattern recognition itself ...

That said, a GIMPLE_COND (be it pattern or not) should be recognized
as mask use.

>  That means that while the integral type is correct it
> won't be a boolean one and vectorizable_comparison expects a boolean
> vector.  Alternatively, we can remove that assert?  But that seems worse.
> 
> Additionally in the previous email you mention "adjusted Boolean statement".
> 
> I'm guessing you were referring to generating a COND_EXPR from the gcond.
> So vect_recog_bool_pattern detects it?  The problem with that is that this gets folded
> to x & 1 and doesn't trigger.  It also then blocks vectorization.  So instead I've
> not forced it.

Not sure what you are referring to, but no - we shouldn't generate a
COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for
_data_ uses of masks (if we need a 'bool' data type for storing).
We then get mask != 0 ? true : false;

> > > +  /* Determine if we need to reduce the final value.  */
> > > +  if (stmts.length () > 1)
> > > +    {
> > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > +	 possible.  */
> > > +      auto_vec<tree> workset (stmts.length ());
> > > +
> > > +      /* Mask the statements as we queue them up.  */
> > > +      if (masked_loop_p)
> > > +	for (auto stmt : stmts)
> > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > +						mask, stmt, &cond_gsi));
> > > +      else
> > > +	workset.splice (stmts);
> > > +
> > > +      while (workset.length () > 1)
> > > +	{
> > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > +	  tree arg0 = workset.pop ();
> > > +	  tree arg1 = workset.pop ();
> > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > +				       &cond_gsi);
> > > +	  workset.quick_insert (0, new_temp);
> > > +	}
> > > +    }
> > > +  else
> > > +    new_temp = stmts[0];
> > > +
> > > +  gcc_assert (new_temp);
> > > +
> > > +  tree cond = new_temp;
> > > +  /* If we have multiple statements after reduction we should check all the
> > > +     lanes and treat it as a full vector.  */
> > > +  if (masked_loop_p)
> > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > +			     &cond_gsi);
> > 
> > You didn't fix any of the code above it seems, it's still wrong.
> > 
> 
> Apologies, I hadn't realized that the last argument to get_loop_mask was the index.
> 
> Should be fixed now. Is this closer to what you wanted?
> The individual ops are now masked with separate masks. (See testcase when N=865).
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> 	(vect_recog_gcond_pattern): New.
> 	(vect_vect_recog_func_ptrs): Use it.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. cast of a bool to
> +		 an integer type.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR && zerop (rhs))

I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
an integer != 0 would not be an appropriate mask.  I guess two
relevant testcases would have an early exit like

   if (here[i] != 0)
     break;

once with a 'bool here[]' and once with a 'int here[]'.

> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  /* Build a scalar type for the boolean result that when vectorized matches the
> +     vector type of the result in size and number of elements.  */
> +  unsigned prec
> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> +			   TYPE_VECTOR_SUBPARTS (vecitype));
> +
> +  scalar_type
> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> +
> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);

That looks awfully complicated.  I guess one complication is that
we compute mask_precision & friends before this pattern gets
recognized.  See vect_determine_mask_precision and its handling
of tcc_comparison; see also integer_type_for_mask.  For comparisons
properly handled during pattern recog the vector type is determined
in vect_get_vector_types_for_stmt via

  else if (vect_use_mask_type_p (stmt_info))
    {
      unsigned int precision = stmt_info->mask_precision;
      scalar_type = build_nonstandard_integer_type (precision, 1);
      vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
group_size);
      if (!vectype)
        return opt_result::failure_at (stmt, "not vectorized: unsupported"
                                       " data-type %T\n", scalar_type);

Richard, do you have any advice here?  I suppose vect_determine_precisions
needs to handle the gcond case with bool != 0 somehow and for the
extra mask producer we add here we have to emulate what it would have 
done, right?

> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype_op0 = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype_op0))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			   "use not simple.\n");
> +	return false;
> +    }
> +
> +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> +  gcc_assert (vectype);
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target doesn't support flag setting vector "
> +			       "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			       "can't vectorize early exit because the "
> +			       "target does not support boolean vector OR for "
> +			       "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  tree mask = NULL_TREE;
> +  if (masked_loop_p)
> +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num,  but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    new_temp = stmts[0];
> +
> +  gcc_assert (new_temp);
> +
> +  tree cond = new_temp;
> +  /* If we have multiple statements after reduction we should check all the
> +     lanes and treat it as a full vector.  */
> +  if (masked_loop_p)
> +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			     &cond_gsi);

This is still wrong: you are applying mask[0] to the IOR-reduced result.
As suggested, do that in the else { new_temp = stmts[0] } clause instead
(or simply elide the optimization for the single-vector case).

> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      gcond *cond = NULL;
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
>  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else if ((cond = dyn_cast <gcond *> (stmt)))
> +	{
> +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> +	     single bit precision and we need the vector boolean to be a
> +	     representation of the integer mask.  So set the correct integer type and
> +	     convert to boolean vector once we have a vectype.  */
> +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));

You should get into the vect_use_mask_type_p (stmt_info) path for
early exit conditions (see above with regard to mask_precision).

> +	}
>        else
>  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
>  
> @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>  			     "get vectype for scalar type: %T\n", scalar_type);
>  	}
>        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> +
>        if (!vectype)
>  	return opt_result::failure_at (stmt,
>  				       "not vectorized:"
>  				       " unsupported data-type %T\n",
>  				       scalar_type);
>  
> +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> +	 that we have the correct integer mask type.  */
> +      if (cond)
> +	vectype = truth_type_for (vectype);
> +

which makes this moot.

Richard.

>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
>      }
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:10                         ` Richard Biener
@ 2023-12-12 10:27                           ` Tamar Christina
  2023-12-12 10:59                           ` Richard Sandiford
  1 sibling, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-12-12 10:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw, Richard Sandiford

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Tuesday, December 12, 2023 10:10 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jlaw@ventanamicro.com;
> Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Mon, 11 Dec 2023, Tamar Christina wrote:
> 
> > > > +	  vectype = truth_type_for (comp_type);
> > >
> > > so this leaves the producer of the mask in the GIMPLE_COND and we
> > > vectorize the GIMPLE_COND as
> > >
> > >   mask_1 = ...;
> > >   if (mask_1 != {-1,-1...})
> > >     ..
> > >
> > > ?  In principle only the mask producer needs a vector type and that
> > > adjusted by bool handling, the branch itself doesn't need any
> > > STMT_VINFO_VECTYPE.
> > >
> > > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > > that aren't bool != 0 producing the mask stmt this should be picked
> > > up by bool handling correctly already.
> > >
> > > Also as said piggy-backing on the COND_EXPR handling in this function
> > > which has the condition split out into a separate stmt(!) might not
> > > completely handle things correctly and you are likely missing
> > > the tcc_comparison handling of the embedded compare.
> > >
> >
> > Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> > vect_recog_gcond_pattern.  As you said in the previous email I've also
> > stopped setting the vectype for the gcond and instead use the type of the
> > operand.
> >
> > Note that because the pattern doesn't apply if you were already an NE_EXPR
> > I do need the extra truth_type_for for that case, because in the case of e.g.
> >
> > a = b > 4;
> > if (a != 0)
> >
> > the producer of the mask is already outside of the cond but will not trigger
> > Boolean recognition.
> 
> It should trigger because we have a mask use of 'a', I always forget
> where we do that - it might be where we compute mask precision stuff
> or it might be bool pattern recognition itself ...
> 
> That said, a GIMPLE_COND (be it pattern or not) should be recognized
> as mask use.
> 
> >  That means that while the integral type is correct it
> > won't be a Boolean one, and vectorizable_comparison expects a Boolean
> > vector.  Alternatively, we can remove that assert?  But that seems worse.
> >
> > Additionally in the previous email you mention "adjusted Boolean statement".
> >
> > I'm guessing you were referring to generating a COND_EXPR from the gcond.
> > So vect_recog_bool_pattern detects it?  The problem with that is that this
> > gets folded to x & 1 and doesn't trigger.  It also then blocks vectorization.
> > So I've not forced it.
> 
> Not sure what you are referring to, but no - we shouldn't generate a
> COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for
> _data_ uses of masks (if we need a 'bool' data type for storing).
> We then get mask != 0 ? true : false;
> 

Thought so... but there happens to be a function called adjust_bool_stmts which
I thought you wanted me to call.  This is where the confusion came from: I couldn't
tell whether "adjusted Boolean statement" meant just the newly modified one or
one from adjust_bool_stmts.  But the latter didn't make much sense, hence
the question above.

> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +	 possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +
> > > > +      /* Mask the statements as we queue them up.  */
> > > > +      if (masked_loop_p)
> > > > +	for (auto stmt : stmts)
> > > > +	  workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > > +						mask, stmt, &cond_gsi));
> > > > +      else
> > > > +	workset.splice (stmts);
> > > > +
> > > > +      while (workset.length () > 1)
> > > > +	{
> > > > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > > +	  tree arg0 = workset.pop ();
> > > > +	  tree arg1 = workset.pop ();
> > > > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +				       &cond_gsi);
> > > > +	  workset.quick_insert (0, new_temp);
> > > > +	}
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  /* If we have multiple statements after reduction we should check all the
> > > > +     lanes and treat it as a full vector.  */
> > > > +  if (masked_loop_p)
> > > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +			     &cond_gsi);
> > >
> > > You didn't fix any of the code above it seems, it's still wrong.
> > >
> >
> > Apologies, I hadn't realized that the last argument to get_loop_mask was the
> index.
> >
> > Should be fixed now. Is this closer to what you wanted?
> > The individual ops are now masked with separate masks. (See testcase when
> N=865).
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> > 	(vect_recog_gcond_pattern): New.
> > 	(vect_vect_recog_func_ptrs): Use it.
> > 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> > 	lhs.
> > 	(vectorizable_early_exit): New.
> > 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> > 	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.dg/vect/vect-early-break_88.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..b64becd588973f5860119
> 6bfcb15afbe4bab60f2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 5
> > +#endif
> > +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> > +unsigned vect_b[N] = { 0 };
> > +
> > +__attribute__ ((noinline, noipa))
> > +unsigned test4(double x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +extern void abort ();
> > +
> > +int main ()
> > +{
> > +  if (test4 (7.0) != 0)
> > +    abort ();
> > +
> > +  if (vect_b[2] != 0 && vect_b[1] == 0)
> > +    abort ();
> > +}
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df
> 577c08adffa44e71b 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple
> *pattern_stmt,
> >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >      {
> >        gcc_assert (!vectype
> > +		  || is_a <gcond *> (pattern_stmt)
> >  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >  		      == vect_use_mask_type_p (orig_stmt_info)));
> >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
> >    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
> >  }
> >
> > +/* Function vect_recog_gcond_pattern
> > +
> > +   Try to find pattern like following:
> > +
> > +     if (a op b)
> > +
> > +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> > +
> > +     mask = a op b
> > +     if (mask != 0)
> > +
> > +   and set the mask type on MASK.
> > +
> > +   Input:
> > +
> > +   * STMT_VINFO: The stmt at the end from which the pattern
> > +		 search begins, i.e. cast of a bool to
> > +		 an integer type.
> > +
> > +   Output:
> > +
> > +   * TYPE_OUT: The type of the output of this pattern.
> > +
> > +   * Return value: A new stmt that will be used to replace the pattern.  */
> > +
> > +static gimple *
> > +vect_recog_gcond_pattern (vec_info *vinfo,
> > +			 stmt_vec_info stmt_vinfo, tree *type_out)
> > +{
> > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > +  gcond* cond = NULL;
> > +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> > +    return NULL;
> > +
> > +  auto lhs = gimple_cond_lhs (cond);
> > +  auto rhs = gimple_cond_rhs (cond);
> > +  auto code = gimple_cond_code (cond);
> > +
> > +  tree scalar_type = TREE_TYPE (lhs);
> > +  if (VECTOR_TYPE_P (scalar_type))
> > +    return NULL;
> > +
> > +  if (code == NE_EXPR && zerop (rhs))
> 
> I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> an integer != 0 would not be an appropriate mask.  I guess two
> relevant testcases would have an early exit like
> 
>    if (here[i] != 0)
>      break;
> 
> once with a 'bool here[]' and once with a 'int here[]'.
> 
> > +    return NULL;
> > +
> > +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  /* Build a scalar type for the boolean result that when vectorized matches the
> > +     vector type of the result in size and number of elements.  */
> > +  unsigned prec
> > +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> > +			   TYPE_VECTOR_SUBPARTS (vecitype));
> > +
> > +  scalar_type
> > +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> > +
> > +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  tree vectype = truth_type_for (vecitype);
> 
> That looks awfully complicated.  I guess one complication is that
> we compute mask_precision & friends before this pattern gets
> recognized.  See vect_determine_mask_precision and its handling
> of tcc_comparison, see also integer_type_for_mask.  For comparisons
> properly handled during pattern recog the vector type is determined
> in vect_get_vector_types_for_stmt via
> 
>   else if (vect_use_mask_type_p (stmt_info))
>     {
>       unsigned int precision = stmt_info->mask_precision;
>       scalar_type = build_nonstandard_integer_type (precision, 1);
>       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> group_size);
>       if (!vectype)
>         return opt_result::failure_at (stmt, "not vectorized: unsupported"
>                                        " data-type %T\n", scalar_type);
> 
> Richard, do you have any advice here?  I suppose vect_determine_precisions
> needs to handle the gcond case with bool != 0 somehow and for the
> extra mask producer we add here we have to emulate what it would have
> done, right?
> 

There seem to be an awful lot of places that determine types and precision 😊
It's quite hard to figure out which part is used where... and the Boolean handling
seems to be especially complicated.

> > +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> > +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> > +
> > +  gimple *pattern_stmt
> > +    = gimple_build_cond (NE_EXPR, new_lhs,
> > +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> > +			 NULL_TREE, NULL_TREE);
> > +  *type_out = vectype;
> > +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> > +  return pattern_stmt;
> > +}
> > +
> >  /* Function vect_recog_bool_pattern
> >
> >     Try to find pattern like following:
> > @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] =
> {
> >    { vect_recog_divmod_pattern, "divmod" },
> >    { vect_recog_mult_pattern, "mult" },
> >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> > +  { vect_recog_gcond_pattern, "gcond" },
> >    { vect_recog_bool_pattern, "bool" },
> >    /* This must come before mask conversion, and includes the parts
> >       of mask conversion that are needed for gather and scatter
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index
> 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea
> 2e00b4450023f9 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> >
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >    /* Transform.  */
> >
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > +  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);
> >
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >  		     rhs1, &vec_oprnds0, vectype,
> > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree
> vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> >
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +	new_temp = make_ssa_name (mask);
> > +      else
> > +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >  	{
> >  	  new_stmt = gimple_build_assign (new_temp, code,
> > @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> >
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> > +
> > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > +
> > +  tree vectype_op0 = NULL_TREE;
> > +  slp_tree slp_op0;
> > +  tree op0;
> > +  enum vect_def_type dt0;
> > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > +			   &vectype_op0))
> > +    {
> > +      if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			   "use not simple.\n");
> > +	return false;
> > +    }
> > +
> > +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> > +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> > +  gcc_assert (vectype);
> > +
> > +  machine_mode mode = TYPE_MODE (vectype);
> > +  int ncopies;
> > +
> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +
> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target doesn't support flag setting vector "
> > +			       "comparisons.\n");
> > +	  return false;
> > +	}
> > +
> > +      if (ncopies > 1
> > +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +	{
> > +	  if (dump_enabled_p ())
> > +	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +			       "can't vectorize early exit because the "
> > +			       "target does not support boolean vector OR for "
> > +			       "type %T.\n", vectype);
> > +	  return false;
> > +	}
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				      vec_stmt, slp_node, cost_vec))
> > +	return false;
> > +
> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +	{
> > +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > +					      OPTIMIZE_FOR_SPEED))
> > +	    return false;
> > +	  else
> > +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > +	}
> > +
> > +
> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +				  vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  basic_block cond_bb = gimple_bb (stmt);
> > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > +
> > +  auto_vec<tree> stmts;
> > +
> > +  tree mask = NULL_TREE;
> > +  if (masked_loop_p)
> > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > +
> > +  if (slp_node)
> > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.reserve_exact (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +	stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > +
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +	 possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  Normally we loop over
> > +	 vec_num,  but since we inspect the exact results of vectorization
> > +	 we don't need to and instead can just use the stmts themselves.  */
> > +      if (masked_loop_p)
> > +	for (unsigned i = 0; i < stmts.length (); i++)
> > +	  {
> > +	    tree stmt_mask
> > +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> > +				    i);
> > +	    stmt_mask
> > +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> > +				  stmts[i], &cond_gsi);
> > +	    workset.quick_push (stmt_mask);
> > +	  }
> > +      else
> > +	workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +	{
> > +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +	  tree arg0 = workset.pop ();
> > +	  tree arg1 = workset.pop ();
> > +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +				       &cond_gsi);
> > +	  workset.quick_insert (0, new_temp);
> > +	}
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +			     &cond_gsi);
> 
> This is still wrong: you are applying mask[0] to the IOR-reduced result.
> As suggested, do that in the else { new_temp = stmts[0] } clause instead
> (or simply elide the optimization for the single-vector case)

PEBKAC... I had looked at it and thought it didn't seem right (why would
mask[0] be used for both the individual elements and the final result?) but left it ☹

I'll wait for Richard's thoughts on the precision handling before re-spinning.

Thanks,
Tamar
> 
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > +     codegen so we must replace the original insn.  */
> > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > +  /* When vectorizing we assume that if the branch edge is taken that we're
> > +     exiting the loop.  This is not however always the case as the compiler will
> > +     rewrite conditions to always be a comparison against 0.  To do this it
> > +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> > +     then have to flip the test, as we're still assuming that if you take the
> > +     branch edge that we found the exit condition.  */
> > +  auto new_code = NE_EXPR;
> > +  tree cst = build_zero_cst (vectype);
> > +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> > +    {
> > +      new_code = EQ_EXPR;
> > +      cst = build_minus_one_cst (vectype);
> > +    }
> > +
> > +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >  				  stmt_info, NULL, node)
> >  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -				   stmt_info, NULL, node, cost_vec));
> > +				   stmt_info, NULL, node, cost_vec)
> > +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +				      cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >  					 NULL, NULL, node, cost_vec)
> >  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >  					  cost_vec)
> > -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +					  cost_vec));
> > +
> >      }
> >
> >    if (node)
> > @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> >
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +				      slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >  	{
> > @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >      }
> >    else
> >      {
> > +      gcond *cond = NULL;
> >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> >  	scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >  	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > +	{
> > +	  /* We can't convert the scalar type to boolean yet, since booleans have a
> > +	     single bit precision and we need the vector boolean to be a
> > +	     representation of the integer mask.  So set the correct integer type and
> > +	     convert to boolean vector once we have a vectype.  */
> > +	  scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> 
> You should get into the vect_use_mask_type_p (stmt_info) path for
> early exit conditions (see above with regard to mask_precision).
> 
> > +	}
> >        else
> >  	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> >
> > @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >  			     "get vectype for scalar type: %T\n", scalar_type);
> >  	}
> >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > +
> >        if (!vectype)
> >  	return opt_result::failure_at (stmt,
> >  				       "not vectorized:"
> >  				       " unsupported data-type %T\n",
> >  				       scalar_type);
> >
> > +      /* If we were a gcond, convert the resulting type to a vector boolean type
> now
> > +	 that we have the correct integer mask type.  */
> > +      if (cond)
> > +	vectype = truth_type_for (vectype);
> > +
> 
> which makes this moot.
> 
> Richard.
> 
> >        if (dump_enabled_p ())
> >  	dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> >      }
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:10                         ` Richard Biener
  2023-12-12 10:27                           ` Tamar Christina
@ 2023-12-12 10:59                           ` Richard Sandiford
  2023-12-12 11:30                             ` Richard Biener
  1 sibling, 1 reply; 200+ messages in thread
From: Richard Sandiford @ 2023-12-12 10:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Tamar Christina, gcc-patches, nd, jlaw

Richard Biener <rguenther@suse.de> writes:
> On Mon, 11 Dec 2023, Tamar Christina wrote:
>> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>>  }
>>  
>> +/* Function vect_recog_gcond_pattern
>> +
>> +   Try to find pattern like following:
>> +
>> +     if (a op b)
>> +
>> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
>> +
>> +     mask = a op b
>> +     if (mask != 0)
>> +
>> +   and set the mask type on MASK.
>> +
>> +   Input:
>> +
>> +   * STMT_VINFO: The stmt at the end from which the pattern
>> +		 search begins, i.e. cast of a bool to
>> +		 an integer type.
>> +
>> +   Output:
>> +
>> +   * TYPE_OUT: The type of the output of this pattern.
>> +
>> +   * Return value: A new stmt that will be used to replace the pattern.  */
>> +
>> +static gimple *
>> +vect_recog_gcond_pattern (vec_info *vinfo,
>> +			 stmt_vec_info stmt_vinfo, tree *type_out)
>> +{
>> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
>> +  gcond* cond = NULL;
>> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
>> +    return NULL;
>> +
>> +  auto lhs = gimple_cond_lhs (cond);
>> +  auto rhs = gimple_cond_rhs (cond);
>> +  auto code = gimple_cond_code (cond);
>> +
>> +  tree scalar_type = TREE_TYPE (lhs);
>> +  if (VECTOR_TYPE_P (scalar_type))
>> +    return NULL;
>> +
>> +  if (code == NE_EXPR && zerop (rhs))
>
> I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> an integer != 0 would not be an appropriate mask.  I guess two
> relevant testcases would have an early exit like
>
>    if (here[i] != 0)
>      break;
>
> once with a 'bool here[]' and once with a 'int here[]'.
>
>> +    return NULL;
>> +
>> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
>> +  if (vecitype == NULL_TREE)
>> +    return NULL;
>> +
>> +  /* Build a scalar type for the boolean result that when vectorized matches the
>> +     vector type of the result in size and number of elements.  */
>> +  unsigned prec
>> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
>> +			   TYPE_VECTOR_SUBPARTS (vecitype));
>> +
>> +  scalar_type
>> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
>> +
>> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
>> +  if (vecitype == NULL_TREE)
>> +    return NULL;
>> +
>> +  tree vectype = truth_type_for (vecitype);
>
> That looks awfully complicated.  I guess one complication is that
> we compute mask_precision & friends before this pattern gets
> recognized.  See vect_determine_mask_precision and its handling
> of tcc_comparison, see also integer_type_for_mask.  For comparisons
> properly handled during pattern recog the vector type is determined
> in vect_get_vector_types_for_stmt via
>
>   else if (vect_use_mask_type_p (stmt_info))
>     {
>       unsigned int precision = stmt_info->mask_precision;
>       scalar_type = build_nonstandard_integer_type (precision, 1);
>       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
> group_size);
>       if (!vectype)
>         return opt_result::failure_at (stmt, "not vectorized: unsupported"
>                                        " data-type %T\n", scalar_type);
>
> Richard, do you have any advice here?  I suppose vect_determine_precisions
> needs to handle the gcond case with bool != 0 somehow and for the
> extra mask producer we add here we have to emulate what it would have 
> done, right?

How about handling gconds directly in vect_determine_mask_precision?
In a sense it's not needed, since gconds are always roots, and so we
could calculate their precision on the fly instead.  But handling it in
vect_determine_mask_precision feels like it should reduce the number
of special cases.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 10:59                           ` Richard Sandiford
@ 2023-12-12 11:30                             ` Richard Biener
  2023-12-13 14:13                               ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-12 11:30 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tamar Christina, gcc-patches, nd, jlaw

On Tue, 12 Dec 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 11 Dec 2023, Tamar Christina wrote:
> >> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
> >>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
> >>  }
> >>  
> >> +/* Function vect_recog_gcond_pattern
> >> +
> >> +   Try to find pattern like following:
> >> +
> >> +     if (a op b)
> >> +
> >> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> >> +
> >> +     mask = a op b
> >> +     if (mask != 0)
> >> +
> >> +   and set the mask type on MASK.
> >> +
> >> +   Input:
> >> +
> >> +   * STMT_VINFO: The stmt at the end from which the pattern
> >> +		 search begins, i.e. cast of a bool to
> >> +		 an integer type.
> >> +
> >> +   Output:
> >> +
> >> +   * TYPE_OUT: The type of the output of this pattern.
> >> +
> >> +   * Return value: A new stmt that will be used to replace the pattern.  */
> >> +
> >> +static gimple *
> >> +vect_recog_gcond_pattern (vec_info *vinfo,
> >> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> >> +{
> >> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> >> +  gcond* cond = NULL;
> >> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> >> +    return NULL;
> >> +
> >> +  auto lhs = gimple_cond_lhs (cond);
> >> +  auto rhs = gimple_cond_rhs (cond);
> >> +  auto code = gimple_cond_code (cond);
> >> +
> >> +  tree scalar_type = TREE_TYPE (lhs);
> >> +  if (VECTOR_TYPE_P (scalar_type))
> >> +    return NULL;
> >> +
> >> +  if (code == NE_EXPR && zerop (rhs))
> >
> > I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> > an integer != 0 would not be an appropriate mask.  I guess two
> > relevant testcases would have an early exit like
> >
> >    if (here[i] != 0)
> >      break;
> >
> > once with a 'bool here[]' and once with a 'int here[]'.
> >
> >> +    return NULL;
> >> +
> >> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> >> +  if (vecitype == NULL_TREE)
> >> +    return NULL;
> >> +
> >> +  /* Build a scalar type for the boolean result that when vectorized matches the
> >> +     vector type of the result in size and number of elements.  */
> >> +  unsigned prec
> >> +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> >> +			   TYPE_VECTOR_SUBPARTS (vecitype));
> >> +
> >> +  scalar_type
> >> +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> >> +
> >> +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> >> +  if (vecitype == NULL_TREE)
> >> +    return NULL;
> >> +
> >> +  tree vectype = truth_type_for (vecitype);
> >
> > That looks awfully complicated.  I guess one complication is that
> > we compute mask_precision & friends before this pattern gets
> > recognized.  See vect_determine_mask_precision and its handling
> > of tcc_comparison, see also integer_type_for_mask.  For comparisons
> > properly handled during pattern recog the vector type is determined
> > in vect_get_vector_types_for_stmt via
> >
> >   else if (vect_use_mask_type_p (stmt_info))
> >     {
> >       unsigned int precision = stmt_info->mask_precision;
> >       scalar_type = build_nonstandard_integer_type (precision, 1);
> >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, 
> > group_size);
> >       if (!vectype)
> >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> >                                        " data-type %T\n", scalar_type);
> >
> > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > needs to handle the gcond case with bool != 0 somehow and for the
> > extra mask producer we add here we have to emulate what it would have 
> > done, right?
> 
> How about handling gconds directly in vect_determine_mask_precision?
> In a sense it's not needed, since gconds are always roots, and so we
> could calculate their precision on the fly instead.  But handling it in
> vect_determine_mask_precision feels like it should reduce the number
> of special cases.

Yeah, that sounds worth trying.

Richard.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-12 11:30                             ` Richard Biener
@ 2023-12-13 14:13                               ` Tamar Christina
  2023-12-14 13:12                                 ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-13 14:13 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 25396 bytes --]

> > >   else if (vect_use_mask_type_p (stmt_info))
> > >     {
> > >       unsigned int precision = stmt_info->mask_precision;
> > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > group_size);
> > >       if (!vectype)
> > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > >                                        " data-type %T\n", scalar_type);
> > >
> > > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > > needs to handle the gcond case with bool != 0 somehow and for the
> > > extra mask producer we add here we have to emulate what it would have
> > > done, right?
> >
> > How about handling gconds directly in vect_determine_mask_precision?
> > In a sense it's not needed, since gconds are always roots, and so we
> > could calculate their precision on the fly instead.  But handling it in
> > vect_determine_mask_precision feels like it should reduce the number
> > of special cases.
> 
> Yeah, that sounds worth trying.
> 
> Richard.

So here's a respin with this suggestion and the other issues fixed.
Note that the testcases still need to be updated with the right stanzas.

The patch is much smaller.  I still have a small change to
vect_get_vector_types_for_stmt in case we get there on a gcond where
vect_recog_gcond_pattern couldn't apply due to the target missing an
appropriate vectype.  The change just gracefully rejects the gcond.

Since patterns cannot apply to the same root twice I've had to also do
the split of the condition out of the gcond in bitfield lowering.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar
gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond.
	(vect_recog_bitfield_ref_pattern): Update to split out bool.
	(vect_recog_gcond_pattern): New.
	(possible_vector_mask_operation_p): Support gcond.
	(vect_determine_mask_precision): Likewise.
	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_get_vector_types_for_stmt): Reject gcond if not lowered by
	vect_recog_gcond_pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_84.c: New test.
	* gcc.dg/vect/vect-early-break_85.c: New test.
	* gcc.dg/vect/vect-early-break_86.c: New test.
	* gcc.dg/vect/vect-early-break_87.c: New test.
	* gcc.dg/vect/vect-early-break_88.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
new file mode 100644
index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 17
+#endif
+bool vect_a[N] = { false, false, true, false, false, false,
+                   false, false, false, false, false, false,
+                   false, false, false, false, false };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(bool x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] == x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (true) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
new file mode 100644
index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+int vect_a[N] = { 5, 4, 8, 4, 6 };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(int x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
new file mode 100644
index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 1; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 0)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
new file mode 100644
index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 0; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 2)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   if (!lhs)
     {
+      if (!vectype)
+	return NULL;
+
       append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      vectype = truth_type_for (vectype);
+
+      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
       gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
       tree cond_cst = gimple_cond_rhs (cond_stmt);
+      gimple *new_stmt
+	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
+			       gimple_get_lhs (pattern_stmt),
+			       fold_convert (container_type, cond_cst));
+      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype, container_type);
       pattern_stmt
-	= gimple_build_cond (gimple_cond_code (cond_stmt),
-			     gimple_get_lhs (pattern_stmt),
-			     fold_convert (ret_type, cond_cst),
-			     gimple_cond_true_label (cond_stmt),
-			     gimple_cond_false_label (cond_stmt));
+	= gimple_build_cond (NE_EXPR, new_lhs,
+			     build_zero_cst (TREE_TYPE (new_lhs)),
+			     NULL_TREE, NULL_TREE);
     }
 
   *type_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find pattern like following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt at the end from which the pattern
+		 search begins, i.e. cast of a bool to
+		 an integer type.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond* cond = NULL;
+  if (!(cond = dyn_cast <gcond *> (last_stmt)))
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR
+      && zerop (rhs)
+      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6581,15 +6657,26 @@ static bool
 possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 {
   tree lhs = gimple_get_lhs (stmt_info->stmt);
+  tree_code code = ERROR_MARK;
+  gassign *assign = NULL;
+  gcond *cond = NULL;
+
+  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
+    code = gimple_assign_rhs_code (assign);
+  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
+    {
+      lhs = gimple_cond_lhs (cond);
+      code = gimple_cond_code (cond);
+    }
+
   if (!lhs
       || TREE_CODE (lhs) != SSA_NAME
       || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
     return false;
 
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  if (code != ERROR_MARK)
     {
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
-      switch (rhs_code)
+      switch (code)
 	{
 	CASE_CONVERT:
 	case SSA_NAME:
@@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 	  return true;
 
 	default:
-	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
+	  return TREE_CODE_CLASS (code) == tcc_comparison;
 	}
     }
   else if (is_a <gphi *> (stmt_info->stmt))
@@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
      The number of operations are equal, but M16 would have given
      a shorter dependency chain and allowed more ILP.  */
   unsigned int precision = ~0U;
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+
+  /* If the statement compares two values that shouldn't use vector masks,
+     try comparing the values as normal scalars instead.  */
+  tree_code code = ERROR_MARK;
+  tree op0_type;
+  unsigned int nops = -1;
+  unsigned int ops_start = 0;
+
+  if (gassign *assign = dyn_cast <gassign *> (stmt))
+    {
+      code = gimple_assign_rhs_code (assign);
+      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+      nops = gimple_num_ops (assign);
+      ops_start = 1;
+    }
+  else if (gcond *cond = dyn_cast <gcond *> (stmt))
+    {
+      code = gimple_cond_code (cond);
+      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
+      nops = 2;
+      ops_start = 0;
+    }
+
+  if (code != ERROR_MARK)
     {
-      unsigned int nops = gimple_num_ops (assign);
-      for (unsigned int i = 1; i < nops; ++i)
+      for (unsigned int i = ops_start; i < nops; ++i)
 	{
-	  tree rhs = gimple_op (assign, i);
+	  tree rhs = gimple_op (stmt, i);
 	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
 	    continue;
 
@@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
 	    }
 	}
 
-      /* If the statement compares two values that shouldn't use vector masks,
-	 try comparing the values as normal scalars instead.  */
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
       if (precision == ~0U
-	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	  && TREE_CODE_CLASS (code) == tcc_comparison)
 	{
-	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
 	  scalar_mode mode;
 	  tree vectype, mask_type;
-	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
-	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
-	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
-	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
+	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
+	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
+	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
+	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
 	    precision = GET_MODE_BITSIZE (mode);
 	}
     }
@@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)

[-- Attachment #2: rb17969 (2).patch --]
[-- Type: application/octet-stream, Size: 22031 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
new file mode 100644
index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#include <stdbool.h>
+
+#ifndef N
+#define N 17
+#endif
+bool vect_a[N] = { false, false, true, false, false, false,
+                   false, false, false, false, false, false,
+                   false, false, false, false, false };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(bool x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] == x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (true) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
new file mode 100644
index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
@@ -0,0 +1,35 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+int vect_a[N] = { 5, 4, 8, 4, 6 };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(int x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     return 1;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7) != 1)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
new file mode 100644
index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 1; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 0)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
new file mode 100644
index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
@@ -0,0 +1,21 @@
+/* { dg-additional-options "-std=gnu89" } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+extern void exit (int);
+
+__attribute__((noinline, noipa))
+int f(x) {
+  int i;
+  for (i = 0; i < 8 && (x & 1) == 0; i++)
+    x >>= 1;
+  return i;
+}
+main() {
+  if (f(4) != 2)
+    abort();
+  exit(0);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
new file mode 100644
index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
@@ -0,0 +1,36 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+#ifndef N
+#define N 5
+#endif
+float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
+unsigned vect_b[N] = { 0 };
+
+__attribute__ ((noinline, noipa))
+unsigned test4(double x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
+
+extern void abort ();
+
+int main ()
+{
+  if (test4 (7.0) != 0)
+    abort ();
+
+  if (vect_b[2] != 0 && vect_b[1] == 0)
+    abort ();
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
+		  || is_a <gcond *> (pattern_stmt)
 		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
 		      == vect_use_mask_type_p (orig_stmt_info)));
       STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 
   if (!lhs)
     {
+      if (!vectype)
+	return NULL;
+
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      vectype = truth_type_for (vectype);
+
+      /* Since patterns cannot currently apply twice to the same root,
+	 split the comparison out of the gcond here and test the boolean
+	 result of the split statement against zero in the new gcond.  */
+      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
       gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
       tree cond_cst = gimple_cond_rhs (cond_stmt);
+      gimple *new_stmt
+	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
+			       gimple_get_lhs (pattern_stmt),
+			       fold_convert (container_type, cond_cst));
+      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype,
+			      container_type);
       pattern_stmt
-	= gimple_build_cond (gimple_cond_code (cond_stmt),
-			     gimple_get_lhs (pattern_stmt),
-			     fold_convert (ret_type, cond_cst),
-			     gimple_cond_true_label (cond_stmt),
-			     gimple_cond_false_label (cond_stmt));
+	= gimple_build_cond (NE_EXPR, new_lhs,
+			     build_zero_cst (TREE_TYPE (new_lhs)),
+			     NULL_TREE, NULL_TREE);
     }
 
   *type_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
   return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
 }
 
+/* Function vect_recog_gcond_pattern
+
+   Try to find a pattern like the following:
+
+     if (a op b)
+
+   where operator 'op' is not != and convert it to an adjusted boolean pattern
+
+     mask = a op b
+     if (mask != 0)
+
+   and set the mask type on MASK.
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins,
+		 i.e. the gcond.
+
+   Output:
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.  */
+
+static gimple *
+vect_recog_gcond_pattern (vec_info *vinfo,
+			 stmt_vec_info stmt_vinfo, tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+  gcond *cond = dyn_cast <gcond *> (last_stmt);
+  if (!cond)
+    return NULL;
+
+  auto lhs = gimple_cond_lhs (cond);
+  auto rhs = gimple_cond_rhs (cond);
+  auto code = gimple_cond_code (cond);
+
+  tree scalar_type = TREE_TYPE (lhs);
+  if (VECTOR_TYPE_P (scalar_type))
+    return NULL;
+
+  if (code == NE_EXPR
+      && zerop (rhs)
+      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
+    return NULL;
+
+  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
+  if (vecitype == NULL_TREE)
+    return NULL;
+
+  tree vectype = truth_type_for (vecitype);
+
+  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
+  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
+  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
+
+  gimple *pattern_stmt
+    = gimple_build_cond (NE_EXPR, new_lhs,
+			 build_int_cst (TREE_TYPE (new_lhs), 0),
+			 NULL_TREE, NULL_TREE);
+  *type_out = vectype;
+  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
+  return pattern_stmt;
+}
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -6581,15 +6657,26 @@ static bool
 possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 {
   tree lhs = gimple_get_lhs (stmt_info->stmt);
+  tree_code code = ERROR_MARK;
+  gassign *assign = NULL;
+  gcond *cond = NULL;
+
+  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
+    code = gimple_assign_rhs_code (assign);
+  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
+    {
+      lhs = gimple_cond_lhs (cond);
+      code = gimple_cond_code (cond);
+    }
+
   if (!lhs
       || TREE_CODE (lhs) != SSA_NAME
       || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
     return false;
 
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  if (code != ERROR_MARK)
     {
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
-      switch (rhs_code)
+      switch (code)
 	{
 	CASE_CONVERT:
 	case SSA_NAME:
@@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
 	  return true;
 
 	default:
-	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
+	  return TREE_CODE_CLASS (code) == tcc_comparison;
 	}
     }
   else if (is_a <gphi *> (stmt_info->stmt))
@@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
      The number of operations are equal, but M16 would have given
      a shorter dependency chain and allowed more ILP.  */
   unsigned int precision = ~0U;
-  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+
+  /* If the statement compares two values that shouldn't use vector masks,
+     try comparing the values as normal scalars instead.  */
+  tree_code code = ERROR_MARK;
+  tree op0_type;
+  unsigned int nops = -1;
+  unsigned int ops_start = 0;
+
+  if (gassign *assign = dyn_cast <gassign *> (stmt))
+    {
+      code = gimple_assign_rhs_code (assign);
+      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+      nops = gimple_num_ops (assign);
+      ops_start = 1;
+    }
+  else if (gcond *cond = dyn_cast <gcond *> (stmt))
+    {
+      code = gimple_cond_code (cond);
+      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
+      nops = 2;
+      ops_start = 0;
+    }
+
+  if (code != ERROR_MARK)
     {
-      unsigned int nops = gimple_num_ops (assign);
-      for (unsigned int i = 1; i < nops; ++i)
+      for (unsigned int i = ops_start; i < nops; ++i)
 	{
-	  tree rhs = gimple_op (assign, i);
+	  tree rhs = gimple_op (stmt, i);
 	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
 	    continue;
 
@@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
 	    }
 	}
 
-      /* If the statement compares two values that shouldn't use vector masks,
-	 try comparing the values as normal scalars instead.  */
-      tree_code rhs_code = gimple_assign_rhs_code (assign);
       if (precision == ~0U
-	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	  && TREE_CODE_CLASS (code) == tcc_comparison)
 	{
-	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
 	  scalar_mode mode;
 	  tree vectype, mask_type;
-	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
-	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
-	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
-	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
+	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
+	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
+	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
+	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
 	    precision = GET_MODE_BITSIZE (mode);
 	}
     }
@@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
+  { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
   /* This must come before mask conversion, and includes the parts
      of mask conversion that are needed for gather and scatter
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   tree mask_type;
-  tree mask;
+  tree mask = NULL_TREE;
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
@@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
   /* Transform.  */
 
   /* Handle def.  */
-  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
-  mask = vect_create_destination_var (lhs, mask_type);
+  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
+  if (lhs)
+    mask = vect_create_destination_var (lhs, mask_type);
 
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
 		     rhs1, &vec_oprnds0, vectype,
@@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
       gimple *new_stmt;
       vec_rhs2 = vec_oprnds1[i];
 
-      new_temp = make_ssa_name (mask);
+      if (lhs)
+	new_temp = make_ssa_name (mask);
+      else
+	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
       if (bitop1 == NOP_EXPR)
 	{
 	  new_stmt = gimple_build_assign (new_temp, code,
@@ -12723,6 +12727,207 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
+			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+    return false;
+
+  DUMP_VECT_SCOPE ("vectorizable_early_exit");
+
+  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
+
+  tree vectype = NULL_TREE;
+  slp_tree slp_op0;
+  tree op0;
+  enum vect_def_type dt0;
+  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
+			   &vectype))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return false;
+    }
+
+  if (!vectype)
+    return false;
+
+  machine_mode mode = TYPE_MODE (vectype);
+  int ncopies;
+
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = vect_get_num_copies (loop_vinfo, vectype);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+    {
+      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target doesn't support flag setting vector "
+			     "comparisons.\n");
+	  return false;
+	}
+
+      if (ncopies > 1
+	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "can't vectorize early exit because the "
+			     "target does not support boolean vector OR for "
+			     "type %T.\n", vectype);
+	  return false;
+	}
+
+      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				      vec_stmt, slp_node, cost_vec))
+	return false;
+
+      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+	{
+	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
+					      OPTIMIZE_FOR_SPEED))
+	    return false;
+	  else
+	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
+	}
+
+      return true;
+    }
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+				  vec_stmt, slp_node, cost_vec))
+    gcc_unreachable ();
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  basic_block cond_bb = gimple_bb (stmt);
+  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
+
+  auto_vec<tree> stmts;
+
+  if (slp_node)
+    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
+  else
+    {
+      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
+      stmts.reserve_exact (vec_stmts.length ());
+      for (auto stmt : vec_stmts)
+	stmts.quick_push (gimple_assign_lhs (stmt));
+    }
+
+  /* Determine if we need to reduce the final value.  */
+  if (stmts.length () > 1)
+    {
+      /* We build the reductions in a way to maintain as much parallelism as
+	 possible.  */
+      auto_vec<tree> workset (stmts.length ());
+
+      /* Mask the statements as we queue them up.  Normally we loop over
+	 vec_num, but since we inspect the exact results of vectorization
+	 we don't need to and can instead just use the stmts themselves.  */
+      if (masked_loop_p)
+	for (unsigned i = 0; i < stmts.length (); i++)
+	  {
+	    tree stmt_mask
+	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
+				    i);
+	    stmt_mask
+	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
+				  stmts[i], &cond_gsi);
+	    workset.quick_push (stmt_mask);
+	  }
+      else
+	workset.splice (stmts);
+
+      while (workset.length () > 1)
+	{
+	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
+	  tree arg0 = workset.pop ();
+	  tree arg1 = workset.pop ();
+	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
+	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
+				       &cond_gsi);
+	  workset.quick_insert (0, new_temp);
+	}
+    }
+  else
+    {
+      new_temp = stmts[0];
+      if (masked_loop_p)
+	{
+	  tree mask
+	    = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
+	  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
+				       new_temp, &cond_gsi);
+	}
+    }
+
+  gcc_assert (new_temp);
+
+  /* Now build the new conditional.  Pattern gimple_conds get dropped during
+     codegen so we must replace the original insn.  */
+  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
+  gcond *cond_stmt = as_a <gcond *> (stmt);
+  /* When vectorizing we assume that if the branch edge is taken we're
+     exiting the loop.  This is, however, not always the case: the compiler
+     will rewrite conditions to always be a comparison against 0, and to do
+     so it sometimes flips the edges.  This is fine for scalar code, but for
+     vector code we then have to flip the test, as we still assume that
+     taking the branch edge means we found the exit condition.  */
+  auto new_code = NE_EXPR;
+  tree cst = build_zero_cst (vectype);
+  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
+			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
+    {
+      new_code = EQ_EXPR;
+      cst = build_minus_one_cst (vectype);
+    }
+
+  gimple_cond_set_condition (cond_stmt, new_code, new_temp, cst);
+  update_stmt (stmt);
+
+  if (slp_node)
+    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
+  else
+    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
+
+  if (!slp_node)
+    *vec_stmt = stmt;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
    can handle all live statements in the node.  Otherwise return true
    if STMT_INFO is not live or if vectorizable_live_operation can handle it.
@@ -12949,7 +13154,9 @@ vect_analyze_stmt (vec_info *vinfo,
 	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
 				  stmt_info, NULL, node)
 	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
-				   stmt_info, NULL, node, cost_vec));
+				   stmt_info, NULL, node, cost_vec)
+	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+				      cost_vec));
   else
     {
       if (bb_vinfo)
@@ -12972,7 +13179,10 @@ vect_analyze_stmt (vec_info *vinfo,
 					 NULL, NULL, node, cost_vec)
 	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
 					  cost_vec)
-	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
+	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
+	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
+					  cost_vec));
     }
 
   if (node)
@@ -13131,6 +13341,12 @@ vect_transform_stmt (vec_info *vinfo,
       gcc_assert (done);
       break;
 
+    case loop_exit_ctrl_vec_info_type:
+      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
+				      slp_node, NULL);
+      gcc_assert (done);
+      break;
+
     default:
       if (!STMT_VINFO_LIVE_P (stmt_info))
 	{
@@ -14321,6 +14537,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   else
     {
+      /* If we got here with a gcond it means that the target had no
+	 available vector mode for the scalar type.  We can't vectorize
+	 so abort.  */
+      if (is_a <gcond *> (stmt))
+	return opt_result::failure_at (stmt,
+				       "not vectorized:"
+				       " unsupported data-type for gcond %T\n",
+				       scalar_type);
+
       if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
 	scalar_type = TREE_TYPE (DR_REF (dr));
       else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-13 14:13                               ` Tamar Christina
@ 2023-12-14 13:12                                 ` Richard Biener
  2023-12-14 18:44                                   ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-14 13:12 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Sandiford, gcc-patches, nd, jlaw

On Wed, 13 Dec 2023, Tamar Christina wrote:

> > > >   else if (vect_use_mask_type_p (stmt_info))
> > > >     {
> > > >       unsigned int precision = stmt_info->mask_precision;
> > > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > group_size);
> > > >       if (!vectype)
> > > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > >                                        " data-type %T\n", scalar_type);
> > > >
> > > > Richard, do you have any advice here?  I suppose vect_determine_precisions
> > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > extra mask producer we add here we have to emulate what it would have
> > > > done, right?
> > >
> > > How about handling gconds directly in vect_determine_mask_precision?
> > > In a sense it's not needed, since gconds are always roots, and so we
> > > could calculate their precision on the fly instead.  But handling it in
> > > vect_determine_mask_precision feels like it should reduce the number
> > > of special cases.
> > 
> > Yeah, that sounds worth trying.
> > 
> > Richard.
> 
> So here's a respin with this suggestion and the other issues fixed.
> Note that the testcases still need to be updated with the right stanzas.
> 
> The patch is much smaller, I still have a small change to
> vect_get_vector_types_for_stmt in case we get there on a gcond where
> vect_recog_gcond_pattern couldn't apply due to the target missing an
> appropriate vectype.  The change only gracefully rejects the gcond.
> 
> Since patterns cannot apply to the same root twice I've had to also do
> the split of the condition out of the gcond in bitfield lowering.

Bah.  Guess we want to fix that (next stage1).  Can you please add
a comment to the split out done in vect_recog_bitfield_ref_pattern?

> Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no issues.
> 
> Ok for master?

OK with the above change.

Thanks,
Richard.

> Thanks,
> Tamar
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond.
> 	(vect_recog_bitfield_ref_pattern): Update to split out bool.
> 	(vect_recog_gcond_pattern): New.
> 	(possible_vector_mask_operation_p): Support gcond.
> 	(vect_determine_mask_precision): Likewise.
> 	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> 	lhs.
> 	(vectorizable_early_exit): New.
> 	(vect_analyze_stmt, vect_transform_stmt): Use it.
> 	(vect_get_vector_types_for_stmt): Reject gcond if not lowered by
> 	vect_recog_gcond_pattern.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_84.c: New test.
> 	* gcc.dg/vect/vect-early-break_85.c: New test.
> 	* gcc.dg/vect/vect-early-break_86.c: New test.
> 	* gcc.dg/vect/vect-early-break_87.c: New test.
> 	* gcc.dg/vect/vect-early-break_88.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0622339491d333b07c2ce895785b5216713097a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_84.c
> @@ -0,0 +1,39 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#include <stdbool.h>
> +
> +#ifndef N
> +#define N 17
> +#endif
> +bool vect_a[N] = { false, false, true, false, false, false,
> +                   false, false, false, false, false, false,
> +                   false, false, false, false, false };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(bool x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] == x)
> +     return 1;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (true) != 1)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..39b3d9bad8681a2d15d7fc7de86bdd3ce0f0bd4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_85.c
> @@ -0,0 +1,35 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +int vect_a[N] = { 5, 4, 8, 4, 6 };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(int x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     return 1;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7) != 1)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..66eb570f4028bca4b631329d7af50c646d3c0cb3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_86.c
> @@ -0,0 +1,21 @@
> +/* { dg-additional-options "-std=gnu89" } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +extern void exit (int);
> +
> +__attribute__((noinline, noipa))
> +int f(x) {
> +  int i;
> +  for (i = 0; i < 8 && (x & 1) == 1; i++)
> +    x >>= 1;
> +  return i;
> +}
> +main() {
> +  if (f(4) != 0)
> +    abort();
> +  exit(0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..67be67da0583ba7feda3bed09c42fa735da9b98e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_87.c
> @@ -0,0 +1,21 @@
> +/* { dg-additional-options "-std=gnu89" } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +extern void exit (int);
> +
> +__attribute__((noinline, noipa))
> +int f(x) {
> +  int i;
> +  for (i = 0; i < 8 && (x & 1) == 0; i++)
> +    x >>= 1;
> +  return i;
> +}
> +main() {
> +  if (f(4) != 2)
> +    abort();
> +  exit(0);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> @@ -0,0 +1,36 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +#ifndef N
> +#define N 5
> +#endif
> +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> +unsigned vect_b[N] = { 0 };
> +
> +__attribute__ ((noinline, noipa))
> +unsigned test4(double x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> +
> +extern void abort ();
> +
> +int main ()
> +{
> +  if (test4 (7.0) != 0)
> +    abort ();
> +
> +  if (vect_b[2] != 0 && vect_b[1] == 0)
> +    abort ();
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..f6ce27a7c45aa6ce72c402987958ee395c045a14 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
>    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>      {
>        gcc_assert (!vectype
> +		  || is_a <gcond *> (pattern_stmt)
>  		  || (VECTOR_BOOLEAN_TYPE_P (vectype)
>  		      == vect_use_mask_type_p (orig_stmt_info)));
>        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -2786,15 +2787,24 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>  
>    if (!lhs)
>      {
> +      if (!vectype)
> +	return NULL;
> +
>        append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> +      vectype = truth_type_for (vectype);
> +
> +      tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
>        gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
>        tree cond_cst = gimple_cond_rhs (cond_stmt);
> +      gimple *new_stmt
> +	= gimple_build_assign (new_lhs, gimple_cond_code (cond_stmt),
> +			       gimple_get_lhs (pattern_stmt),
> +			       fold_convert (container_type, cond_cst));
> +      append_pattern_def_seq (vinfo, stmt_info, new_stmt, vectype, container_type);
>        pattern_stmt
> -	= gimple_build_cond (gimple_cond_code (cond_stmt),
> -			     gimple_get_lhs (pattern_stmt),
> -			     fold_convert (ret_type, cond_cst),
> -			     gimple_cond_true_label (cond_stmt),
> -			     gimple_cond_false_label (cond_stmt));
> +	= gimple_build_cond (NE_EXPR, new_lhs,
> +			     build_zero_cst (TREE_TYPE (new_lhs)),
> +			     NULL_TREE, NULL_TREE);
>      }
>  
>    *type_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -5553,6 +5563,72 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>  }
>  
> +/* Function vect_recog_gcond_pattern
> +
> +   Try to find pattern like following:
> +
> +     if (a op b)
> +
> +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> +
> +     mask = a op b
> +     if (mask != 0)
> +
> +   and set the mask type on MASK.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt at the end from which the pattern
> +		 search begins, i.e. the gcond at the end of
> +		 the basic block.
> +
> +   Output:
> +
> +   * TYPE_OUT: The type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the pattern.  */
> +
> +static gimple *
> +vect_recog_gcond_pattern (vec_info *vinfo,
> +			 stmt_vec_info stmt_vinfo, tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +  gcond* cond = NULL;
> +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> +    return NULL;
> +
> +  auto lhs = gimple_cond_lhs (cond);
> +  auto rhs = gimple_cond_rhs (cond);
> +  auto code = gimple_cond_code (cond);
> +
> +  tree scalar_type = TREE_TYPE (lhs);
> +  if (VECTOR_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  if (code == NE_EXPR
> +      && zerop (rhs)
> +      && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type))
> +    return NULL;
> +
> +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> +  if (vecitype == NULL_TREE)
> +    return NULL;
> +
> +  tree vectype = truth_type_for (vecitype);
> +
> +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> +
> +  gimple *pattern_stmt
> +    = gimple_build_cond (NE_EXPR, new_lhs,
> +			 build_int_cst (TREE_TYPE (new_lhs), 0),
> +			 NULL_TREE, NULL_TREE);
> +  *type_out = vectype;
> +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> +  return pattern_stmt;
> +}
> +
>  /* Function vect_recog_bool_pattern
>  
>     Try to find pattern like following:
> @@ -6581,15 +6657,26 @@ static bool
>  possible_vector_mask_operation_p (stmt_vec_info stmt_info)
>  {
>    tree lhs = gimple_get_lhs (stmt_info->stmt);
> +  tree_code code = ERROR_MARK;
> +  gassign *assign = NULL;
> +  gcond *cond = NULL;
> +
> +  if ((assign = dyn_cast <gassign *> (stmt_info->stmt)))
> +    code = gimple_assign_rhs_code (assign);
> +  else if ((cond = dyn_cast <gcond *> (stmt_info->stmt)))
> +    {
> +      lhs = gimple_cond_lhs (cond);
> +      code = gimple_cond_code (cond);
> +    }
> +
>    if (!lhs
>        || TREE_CODE (lhs) != SSA_NAME
>        || !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
>      return false;
>  
> -  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
> +  if (code != ERROR_MARK)
>      {
> -      tree_code rhs_code = gimple_assign_rhs_code (assign);
> -      switch (rhs_code)
> +      switch (code)
>  	{
>  	CASE_CONVERT:
>  	case SSA_NAME:
> @@ -6600,7 +6687,7 @@ possible_vector_mask_operation_p (stmt_vec_info stmt_info)
>  	  return true;
>  
>  	default:
> -	  return TREE_CODE_CLASS (rhs_code) == tcc_comparison;
> +	  return TREE_CODE_CLASS (code) == tcc_comparison;
>  	}
>      }
>    else if (is_a <gphi *> (stmt_info->stmt))
> @@ -6647,12 +6734,35 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
>       The number of operations are equal, but M16 would have given
>       a shorter dependency chain and allowed more ILP.  */
>    unsigned int precision = ~0U;
> -  if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +
> +  /* If the statement compares two values that shouldn't use vector masks,
> +     try comparing the values as normal scalars instead.  */
> +  tree_code code = ERROR_MARK;
> +  tree op0_type;
> +  unsigned int nops = -1;
> +  unsigned int ops_start = 0;
> +
> +  if (gassign *assign = dyn_cast <gassign *> (stmt))
> +    {
> +      code = gimple_assign_rhs_code (assign);
> +      op0_type = TREE_TYPE (gimple_assign_rhs1 (assign));
> +      nops = gimple_num_ops (assign);
> +      ops_start = 1;
> +    }
> +  else if (gcond *cond = dyn_cast <gcond *> (stmt))
> +    {
> +      code = gimple_cond_code (cond);
> +      op0_type = TREE_TYPE (gimple_cond_lhs (cond));
> +      nops = 2;
> +      ops_start = 0;
> +    }
> +
> +  if (code != ERROR_MARK)
>      {
> -      unsigned int nops = gimple_num_ops (assign);
> -      for (unsigned int i = 1; i < nops; ++i)
> +      for (unsigned int i = ops_start; i < nops; ++i)
>  	{
> -	  tree rhs = gimple_op (assign, i);
> +	  tree rhs = gimple_op (stmt, i);
>  	  if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs)))
>  	    continue;
>  
> @@ -6669,19 +6779,15 @@ vect_determine_mask_precision (vec_info *vinfo, stmt_vec_info stmt_info)
>  	    }
>  	}
>  
> -      /* If the statement compares two values that shouldn't use vector masks,
> -	 try comparing the values as normal scalars instead.  */
> -      tree_code rhs_code = gimple_assign_rhs_code (assign);
>        if (precision == ~0U
> -	  && TREE_CODE_CLASS (rhs_code) == tcc_comparison)
> +	  && TREE_CODE_CLASS (code) == tcc_comparison)
>  	{
> -	  tree rhs1_type = TREE_TYPE (gimple_assign_rhs1 (assign));
>  	  scalar_mode mode;
>  	  tree vectype, mask_type;
> -	  if (is_a <scalar_mode> (TYPE_MODE (rhs1_type), &mode)
> -	      && (vectype = get_vectype_for_scalar_type (vinfo, rhs1_type))
> -	      && (mask_type = get_mask_type_for_scalar_type (vinfo, rhs1_type))
> -	      && expand_vec_cmp_expr_p (vectype, mask_type, rhs_code))
> +	  if (is_a <scalar_mode> (TYPE_MODE (op0_type), &mode)
> +	      && (vectype = get_vectype_for_scalar_type (vinfo, op0_type))
> +	      && (mask_type = get_mask_type_for_scalar_type (vinfo, op0_type))
> +	      && expand_vec_cmp_expr_p (vectype, mask_type, code))
>  	    precision = GET_MODE_BITSIZE (mode);
>  	}
>      }
> @@ -6860,6 +6966,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> +  { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
>    /* This must come before mask conversion, and includes the parts
>       of mask conversion that are needed for gather and scatter
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 582c5e678fad802d6e76300fe3c939b9f2978f17..766450cd85b55ce4dfd45878c5dc44cd09c68681 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    vec<tree> vec_oprnds0 = vNULL;
>    vec<tree> vec_oprnds1 = vNULL;
>    tree mask_type;
> -  tree mask;
> +  tree mask = NULL_TREE;
>  
>    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
>      return false;
> @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>    /* Transform.  */
>  
>    /* Handle def.  */
> -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> -  mask = vect_create_destination_var (lhs, mask_type);
> +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> +  if (lhs)
> +    mask = vect_create_destination_var (lhs, mask_type);
>  
>    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
>  		     rhs1, &vec_oprnds0, vectype,
> @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
>        gimple *new_stmt;
>        vec_rhs2 = vec_oprnds1[i];
>  
> -      new_temp = make_ssa_name (mask);
> +      if (lhs)
> +	new_temp = make_ssa_name (mask);
> +      else
> +	new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
>        if (bitop1 == NOP_EXPR)
>  	{
>  	  new_stmt = gimple_build_assign (new_temp, code,
> @@ -12723,6 +12727,207 @@ vectorizable_comparison (vec_info *vinfo,
>    return true;
>  }
>  
> +/* Check to see if the current early break given in STMT_INFO is valid for
> +   vectorization.  */
> +
> +static bool
> +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> +			 gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> +  if (!loop_vinfo
> +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> +    return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +    return false;
> +
> +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> +
> +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> +
> +  tree vectype = NULL_TREE;
> +  slp_tree slp_op0;
> +  tree op0;
> +  enum vect_def_type dt0;
> +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> +			   &vectype))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "use not simple.\n");
> +      return false;
> +    }
> +
> +  if (!vectype)
> +    return false;
> +
> +  machine_mode mode = TYPE_MODE (vectype);
> +  int ncopies;
> +
> +  if (slp_node)
> +    ncopies = 1;
> +  else
> +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> +
> +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +
> +  /* Analyze only.  */
> +  if (!vec_stmt)
> +    {
> +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target doesn't support flag setting vector "
> +			     "comparisons.\n");
> +	  return false;
> +	}
> +
> +      if (ncopies > 1
> +	  && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "can't vectorize early exit because the "
> +			     "target does not support boolean vector OR for "
> +			     "type %T.\n", vectype);
> +	  return false;
> +	}
> +
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				      vec_stmt, slp_node, cost_vec))
> +	return false;
> +
> +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> +	{
> +	  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> +					      OPTIMIZE_FOR_SPEED))
> +	    return false;
> +	  else
> +	    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> +	}
> +
> +
> +      return true;
> +    }
> +
> +  /* Transform.  */
> +
> +  tree new_temp = NULL_TREE;
> +  gimple *new_stmt = NULL;
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> +
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +				  vec_stmt, slp_node, cost_vec))
> +    gcc_unreachable ();
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  basic_block cond_bb = gimple_bb (stmt);
> +  gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb);
> +
> +  auto_vec<tree> stmts;
> +
> +  if (slp_node)
> +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> +  else
> +    {
> +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> +      stmts.reserve_exact (vec_stmts.length ());
> +      for (auto stmt : vec_stmts)
> +	stmts.quick_push (gimple_assign_lhs (stmt));
> +    }
> +
> +  /* Determine if we need to reduce the final value.  */
> +  if (stmts.length () > 1)
> +    {
> +      /* We build the reductions in a way to maintain as much parallelism as
> +	 possible.  */
> +      auto_vec<tree> workset (stmts.length ());
> +
> +      /* Mask the statements as we queue them up.  Normally we loop over
> +	 vec_num,  but since we inspect the exact results of vectorization
> +	 we don't need to and instead can just use the stmts themselves.  */
> +      if (masked_loop_p)
> +	for (unsigned i = 0; i < stmts.length (); i++)
> +	  {
> +	    tree stmt_mask
> +	      = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> +				    i);
> +	    stmt_mask
> +	      = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> +				  stmts[i], &cond_gsi);
> +	    workset.quick_push (stmt_mask);
> +	  }
> +      else
> +	workset.splice (stmts);
> +
> +      while (workset.length () > 1)
> +	{
> +	  new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> +	  tree arg0 = workset.pop ();
> +	  tree arg1 = workset.pop ();
> +	  new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> +	  vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> +				       &cond_gsi);
> +	  workset.quick_insert (0, new_temp);
> +	}
> +    }
> +  else
> +    {
> +      new_temp = stmts[0];
> +      if (masked_loop_p)
> +	{
> +	  tree mask
> +	    = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> +	  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> +				       new_temp, &cond_gsi);
> +	}
> +    }
> +
> +  gcc_assert (new_temp);
> +
> +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> +     codegen so we must replace the original insn.  */
> +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> +  gcond *cond_stmt = as_a <gcond *>(stmt);
> +  /* When vectorizing we assume that if the branch edge is taken that we're
> +     exiting the loop.  This is not however always the case as the compiler will
> +     rewrite conditions to always be a comparison against 0.  To do this it
> +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> +     then have to flip the test, as we're still assuming that if you take the
> +     branch edge that we found the exit condition.  */
> +  auto new_code = NE_EXPR;
> +  tree cst = build_zero_cst (vectype);
> +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> +			     BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> +    {
> +      new_code = EQ_EXPR;
> +      cst = build_minus_one_cst (vectype);
> +    }
> +
> +  gimple_cond_set_condition (cond_stmt, new_code, new_temp, cst);
> +  update_stmt (stmt);
> +
> +  if (slp_node)
> +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> +  else
> +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> +
> +  if (!slp_node)
> +    *vec_stmt = stmt;
> +
> +  return true;
> +}
> +
>  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
>     can handle all live statements in the node.  Otherwise return true
>     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> @@ -12949,7 +13154,9 @@ vect_analyze_stmt (vec_info *vinfo,
>  	  || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
>  				  stmt_info, NULL, node)
>  	  || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> -				   stmt_info, NULL, node, cost_vec));
> +				   stmt_info, NULL, node, cost_vec)
> +	  || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +				      cost_vec));
>    else
>      {
>        if (bb_vinfo)
> @@ -12972,7 +13179,10 @@ vect_analyze_stmt (vec_info *vinfo,
>  					 NULL, NULL, node, cost_vec)
>  	      || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
>  					  cost_vec)
> -	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> +	      || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> +	      || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> +					  cost_vec));
> +
>      }
>  
>    if (node)
> @@ -13131,6 +13341,12 @@ vect_transform_stmt (vec_info *vinfo,
>        gcc_assert (done);
>        break;
>  
> +    case loop_exit_ctrl_vec_info_type:
> +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> +				      slp_node, NULL);
> +      gcc_assert (done);
> +      break;
> +
>      default:
>        if (!STMT_VINFO_LIVE_P (stmt_info))
>  	{
> @@ -14321,6 +14537,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
>      }
>    else
>      {
> +      /* If we got here with a gcond it means that the target had no available vector
> +	 mode for the scalar type.  We can't vectorize so abort.  */
> +      if (is_a <gcond *> (stmt))
> +	return opt_result::failure_at (stmt,
> +				       "not vectorized:"
> +				       " unsupported data-type for gcond %T\n",
> +				       scalar_type);
> +
>        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
>  	scalar_type = TREE_TYPE (DR_REF (dr));
>        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> 
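The workset loop in vectorizable_early_exit above reduces the per-copy exit masks pairwise rather than linearly, keeping the OR-dependency chain shallow. A minimal Python model of that loop (the function name is illustrative, not from the patch):

```python
def reduce_exit_masks(masks):
    """Pairwise OR-reduction mirroring the workset loop in
    vectorizable_early_exit: pop two operands off the back, OR them,
    and queue the result at the front, so results of earlier
    combinations are consumed last and independent pairs can be
    evaluated in parallel."""
    workset = list(masks)
    while len(workset) > 1:
        arg0 = workset.pop()
        arg1 = workset.pop()
        workset.insert(0, arg0 | arg1)
    return workset[0]
```

For four masks [m0, m1, m2, m3] this computes (m3 | m2) and (m1 | m0) independently before combining them, giving logarithmic rather than linear dependency depth.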

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-12-07  0:56         ` Richard Sandiford
@ 2023-12-14 18:40           ` Tamar Christina
  2023-12-14 19:34             ` Richard Sandiford
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-14 18:40 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 5184 bytes --]

> I see you've changed it from:
> 
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> 
> to:
> 
> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
> +				  operands[3]));
> 
> Was that to fix a specific problem?  The original looked OK to me
> for that part (it was the vector comparison that I was asking about).
> 

No, it was to be more consistent with the Arm and MVE patch.

Note that I may update the tests to disable scheduling.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    {
+      tmp = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
+    }
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
> +  rtx val = gen_reg_rtx (DImode);
> +  emit_move_insn (val, gen_lowpart (DImode, tmp));
> +
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}

[-- Attachment #2: rb17509.patch --]
[-- Type: application/octet-stream, Size: 4063 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+        (if_then_else
+          (match_operator 0 "aarch64_equality_operator"
+            [(match_operand:VDQ_I 1 "register_operand")
+             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+          (label_ref (match_operand 3 ""))
+          (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+     so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    {
+      tmp = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
+    }
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      rtx reduc = gen_lowpart (V4SImode, tmp);
+      rtx res = gen_reg_rtx (V4SImode);
+      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+    }
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+**	...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code
  2023-12-14 13:12                                 ` Richard Biener
@ 2023-12-14 18:44                                   ` Tamar Christina
  0 siblings, 0 replies; 200+ messages in thread
From: Tamar Christina @ 2023-12-14 18:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Sandiford, gcc-patches, nd, jlaw

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, December 14, 2023 1:13 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; jlaw@ventanamicro.com
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Wed, 13 Dec 2023, Tamar Christina wrote:
> 
> > > > >   else if (vect_use_mask_type_p (stmt_info))
> > > > >     {
> > > > >       unsigned int precision = stmt_info->mask_precision;
> > > > >       scalar_type = build_nonstandard_integer_type (precision, 1);
> > > > >       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
> > > > > group_size);
> > > > >       if (!vectype)
> > > > >         return opt_result::failure_at (stmt, "not vectorized: unsupported"
> > > > >                                        " data-type %T\n", scalar_type);
> > > > >
> > > > > Richard, do you have any advice here?  I suppose
> vect_determine_precisions
> > > > > needs to handle the gcond case with bool != 0 somehow and for the
> > > > > extra mask producer we add here we have to emulate what it would have
> > > > > done, right?
> > > >
> > > > How about handling gconds directly in vect_determine_mask_precision?
> > > > In a sense it's not needed, since gconds are always roots, and so we
> > > > could calculate their precision on the fly instead.  But handling it in
> > > > vect_determine_mask_precision feels like it should reduce the number
> > > > of special cases.
> > >
> > > Yeah, that sounds worth trying.
> > >
> > > Richard.
> >
> > So here's a respin with this suggestion and the other issues fixed.
> > Note that the testcases still need to be updated with the right stanzas.
> >
> > The patch is much smaller, I still have a small change to
> > vect_get_vector_types_for_stmt  in case we get there on a gcond where
> > vect_recog_gcond_pattern couldn't apply due to the target missing an
> > appropriate vectype.  The change only gracefully rejects the gcond.
> >
> > Since patterns cannot apply to the same root twice I've had to also do
> > the split of the condition out of the gcond in bitfield lowering.
> 
> Bah.  Guess we want to fix that (next stage1).  Can you please add
> a comment to the split out done in vect_recog_bitfield_ref_pattern?

Done.

> 
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and
> no issues.
> >
> > Ok for master?
> 
> OK with the above change.
> 

Thanks!

That leaves one patch. I'll have that for you Tuesday morning.  Currently going over it
to see if I can clean it up more (a day or two of distance usually helps) to minimize respins.

I'll then also send the final testsuite patches.

Thanks for all the reviews!

Cheers,
Tamar


* Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD
  2023-12-14 18:40           ` Tamar Christina
@ 2023-12-14 19:34             ` Richard Sandiford
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Sandiford @ 2023-12-14 19:34 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> I see you've changed it from:
>> 
>> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
>> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
>> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
>> 
>> to:
>> 
>> +  emit_jump_insn (gen_cbranchdi4 (operands[0], val, CONST0_RTX (DImode),
>> +				  operands[3]));
>> 
>> Was that to fix a specific problem?  The original looked OK to me
>> for that part (it was the vector comparison that I was asking about).
>> 
>
> No, it was to be more consistent with the Arm and MVE patch.  
>
> Note that I may update the tests to disable scheduling.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/vect-early-break-cbranch.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index c6f2d5828373f2a5272b9d1227bfe34365f9fd09..309ec9535294d6e9cdc530f71d9fe38bb916c966 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3911,6 +3911,45 @@ (define_expand "vcond_mask_<mode><v_int_equiv>"
>    DONE;
>  })
>  
> +;; Patterns comparing two vectors and conditionally jump
> +
> +(define_expand "cbranch<mode>4"
> +  [(set (pc)
> +        (if_then_else
> +          (match_operator 0 "aarch64_equality_operator"
> +            [(match_operand:VDQ_I 1 "register_operand")
> +             (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
> +          (label_ref (match_operand 3 ""))
> +          (pc)))]
> +  "TARGET_SIMD"
> +{
> +  auto code = GET_CODE (operands[0]);
> +  rtx tmp = operands[1];
> +
> +  /* If comparing against a non-zero vector we have to do a comparison first

...an EOR first

(or XOR)

OK with that change, thanks.

Richard

> +     so we can have a != 0 comparison with the result.  */
> +  if (operands[2] != CONST0_RTX (<MODE>mode))
> +    {
> +      tmp = gen_reg_rtx (<MODE>mode);
> +      emit_insn (gen_xor<mode>3 (tmp, operands[1], operands[2]));
> +    }
> +
> +  /* For 64-bit vectors we need no reductions.  */
> +  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
> +    {
> +      /* Always reduce using a V4SI.  */
> +      rtx reduc = gen_lowpart (V4SImode, tmp);
> +      rtx res = gen_reg_rtx (V4SImode);
> +      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
> +      emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
> +    }
> +
> +  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
> +  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
> +  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
> +  DONE;
> +})
> +
>  ;; Patterns comparing two vectors to produce a mask.
>  
>  (define_expand "vec_cmp<mode><mode>"
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
> @@ -0,0 +1,124 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#define N 640
> +int a[N] = {0};
> +int b[N] = {0};
> +
> +
> +/*
> +** f1:
> +**	...
> +**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f1 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] > 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f2:
> +**	...
> +**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f2 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] >= 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f3:
> +**	...
> +**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f3 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] == 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f4:
> +**	...
> +**	cmtst	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f4 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] != 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f5:
> +**	...
> +**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f5 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] < 0)
> +	break;
> +    }
> +}
> +
> +/*
> +** f6:
> +**	...
> +**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
> +**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +**	fmov	x[0-9]+, d[0-9]+
> +**	cbnz	x[0-9]+, \.L[0-9]+
> +**	...
> +*/
> +void f6 ()
> +{
> +  for (int i = 0; i < N; i++)
> +    {
> +      b[i] += a[i];
> +      if (a[i] <= 0)
> +	break;
> +    }
> +}


* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-11-07 14:23       ` Richard Biener
@ 2023-12-19 10:11         ` Tamar Christina
  2023-12-19 14:05           ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-19 10:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw


> > > > +      /* Save destination as we go, BB are visited in order and the last one
> > > > +	is where statements should be moved to.  */
> > > > +      if (!dest_bb)
> > > > +	dest_bb = gimple_bb (c);
> > > > +      else
> > > > +	{
> > > > +	  basic_block curr_bb = gimple_bb (c);
> > > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > > +	    dest_bb = curr_bb;
> > > > +	}
> > > > +    }
> > > > +
> > > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > >
> > > no edge is the fallthru edge out of a condition, so this always selects
> > > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > > think you instead want
> > >
> > >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > > >dest;
> > >
> > > more nicely written, of course.
> > >
> > > > +  gcc_assert (dest_bb);
> > > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > >
> > > Sorting the vector of early breaks as we gather them might be nicer than this -
> > > you'd then simply use the first or last.
> > >

I opted not to do the sorting since I don't really need a full order between the exits here
and only need to find the last one.  A sort would be more expensive than the linear
check here.  But I also couldn't think of a good sort key since all you have is dominates yes/no.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_57.c: New test.
	* gcc.dg/vect/vect-early-break_79.c: New test.
	* gcc.dg/vect/vect-early-break_80.c: New test.
	* gcc.dg/vect/vect-early-break_81.c: New test.
	* gcc.dg/vect/vect-early-break_83.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..8e9e780e01fd349b30da1f0a762c0306ec257ff7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,377 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that if we have
+   multiple exits we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place where to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns True if
+   possible, otherwise False.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected, aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable
+	      from one or more cond conditions.  If this set overlaps with CHAIN
+	      then FIXED takes precedence.  This deals with non-single use
+	      cases.
+     - BASES: List of all load data references found during traversal.  */
+  hash_set<tree> chain, fixed;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* First determine the list of statements that we can't move because they
+	 are required for the early break vectorization itself.  */
+      auto_vec <gimple *> workset;
+      workset.safe_push (c);
+      do {
+	gimple *op = workset.pop ();
+	if (visited.add (op)
+	    || is_a <gphi *> (op)
+	    || is_gimple_debug (op))
+	  continue;
+
+	if (gimple_has_lhs (op))
+	  fixed.add (gimple_get_lhs (op));
+
+	stmt_vec_info def_info = loop_vinfo->lookup_stmt (op);
+	if (!def_info)
+	  continue;
+
+	gimple *def_stmt = STMT_VINFO_STMT (def_info);
+	FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
+	  {
+	    tree use = USE_FROM_PTR (use_p);
+	    if (TREE_CODE (use) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (use))
+	      continue;
+
+	    if (gimple *g = SSA_NAME_DEF_STMT (use))
+	      workset.safe_push (g);
+	  }
+      } while (!workset.is_empty ());
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  tree dest = NULL_TREE;
+	  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+	     use the LHS, if not, assume that the first argument of a call is
+	     the value being defined.  e.g. MASKED_LOAD etc.  */
+	  if (gimple_has_lhs (stmt))
+	    dest = gimple_get_lhs (stmt);
+	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	    dest = gimple_arg (call, 0);
+
+	  bool move = chain.contains (dest);
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  if (!stmt_vinfo)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported. Unknown"
+				 " statement: %G", stmt);
+	      return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+	    }
+
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (dr_ref)
+	    {
+	      /* We currently only support statically allocated objects due to
+		 not having first-faulting loads support or peeling for
+		 alignment support.  Compute the size of the referenced object
+		 (it could be dynamically allocated).  */
+	      tree obj = DR_BASE_ADDRESS (dr_ref);
+	      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "early breaks only supported on statically"
+				     " allocated objects.\n");
+		  return opt_result::failure_at (c,
+				     "can't safely apply code motion to "
+				     "dependencies of %G to vectorize "
+				     "the early exit.\n", c);
+		}
+
+	      tree refop = TREE_OPERAND (obj, 0);
+	      tree refbase = get_base_address (refop);
+	      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+		  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+		{
+		  if (dump_enabled_p ())
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "early breaks only supported on"
+				       " statically allocated objects.\n");
+		  return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+		}
+
+	      /* Check if vector accesses to the object will be within
+		 bounds.  */
+	      tree stype = TREE_TYPE (DECL_SIZE (refbase));
+	      tree access = fold_build2 (PLUS_EXPR, stype, DR_OFFSET (dr_ref),
+					 DR_INIT (dr_ref));
+	      tree final_adj
+		= fold_build2 (MULT_EXPR, stype, LOOP_VINFO_NITERS (loop_vinfo),
+			       DR_STEP (dr_ref));
+
+	      /* Must be a constant, or assume the loop will be versioned or
+		 niters bounded by VF so accesses are within range.  */
+	      if (TREE_CODE (access) == INTEGER_CST
+		  && TREE_CODE (final_adj) == INTEGER_CST)
+		{
+		  access = fold_build2 (PLUS_EXPR, stype, access, final_adj);
+		  wide_int size = wi::to_wide (DECL_SIZE (refbase));
+		  wide_int off = wi::to_wide (access);
+		  if (wi::ge_p (off, size, UNSIGNED))
+		    {
+		      if (dump_enabled_p ())
+			dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+					 "early breaks not supported:"
+					 " vectorization would read beyond size"
+					 " of object %T.\n", obj);
+		      return opt_result::failure_at (c,
+					 "can't safely apply code motion to "
+					 "dependencies of %G to vectorize "
+					 "the early exit.\n", c);
+		    }
+		}
+
+	      if (DR_IS_READ (dr_ref))
+		bases.safe_push (dr_ref);
+	      else if (DR_IS_WRITE (dr_ref))
+		{
+		  /* We are moving writes down in the CFG.  To be sure that this
+		     is valid after vectorization we have to check all the loads
+		     we are hoisting the stores past to see if any of them may
+		     alias or are the same object.
+
+		     Same objects will not be an issue because unless the store
+		     is marked volatile the value can be forwarded.  If the
+		     store is marked volatile we don't vectorize the loop
+		     anyway.
+
+		     That leaves the check for aliasing.  We don't really need
+		     to care about the stores aliasing with each other since the
+		     stores are moved in order so the effects are still observed
+		     correctly.  This leaves the check for WAR dependencies
+		     which we would be introducing here if the DR can alias.
+		     The check is quadratic in loads/stores but I have not found
+		     a better API to do this.  I believe all loads and stores
+		     must be checked.  We also must check them when we
+		     encountered the store, since we don't care about loads past
+		     the store.  */
+
+		  for (auto dr_read : bases)
+		    if (dr_may_alias_p (dr_read, dr_ref, loop_nest))
+		      {
+			if (dump_enabled_p ())
+			    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					     vect_location,
+					     "early breaks not supported: "
+					     "overlapping loads and stores "
+					     "found before the break "
+					     "statement.\n");
+
+			return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		      }
+
+		  /* Any write starts a new chain.  */
+		  move = true;
+		}
+	    }
+
+	  /* If a statement is live and escapes the loop through usage in the
+	     loop epilogue then we can't move it since we need to maintain its
+	     reachability through all exits.  */
+	  bool skip = false;
+	  if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	      && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	    {
+	      imm_use_iterator imm_iter;
+	      use_operand_p use_p;
+	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+		{
+		  basic_block bb = gimple_bb (USE_STMT (use_p));
+		  skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+		  if (skip)
+		    break;
+		}
+	    }
+
+	  /* If we found the defining statement of something that's part of
+	     the chain then expand the chain with the new SSA_VARs being
+	     used.  */
+	  if (!skip && move)
+	    {
+	      use_operand_p use_p;
+	      ssa_op_iter iter;
+	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		{
+		  tree op = USE_FROM_PTR (use_p);
+		  gcc_assert (TREE_CODE (op) == SSA_NAME);
+		  if (fixed.contains (dest))
+		    {
+		      move = false;
+		      fixed.add (op);
+		    }
+		  else
+		    chain.add (op);
+		}
+
+	      if (dump_enabled_p ())
+		{
+		  if (move)
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "found chain %G", stmt);
+		  else
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "ignored chain %G, not single use", stmt);
+		}
+	    }
+
+	  if (move)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      /* If we've moved a VDEF, extract the defining MEM and update
+		 usages of it.   */
+	      tree vdef;
+	      /* This statement is to be moved.  */
+	      if ((vdef = gimple_vdef (stmt)))
+		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (
+		    stmt);
+	    }
+
+	  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go, BBs are visited in order and the last one
+	 is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +1028,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..0a90d2860b8d037b72fd41d4240804aa390467ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/* When vectorizing early break statements, instructions that happen before
+   the early break in the current BB need to be moved to after the early
+   break.  This function deals with that and assumes that any validity
+   checks have already been performed.
+
+   While moving the instructions it also corrects the VUSEs of any statements
+   it encounters along the way.  The statements are moved to the destination
+   block recorded in LOOP_VINFO_EARLY_BRK_DEST_BB.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if the statement is still required for vectorization
+	 or has been elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to do for early-break vectorization
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..3c9478a3dc8750c71e0bf2a36a5b0815afc3fd94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize, we need to peel
+     the loop for the break-finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in the order in
+     which they need to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSEs need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple *> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
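For reference, here is a standalone sketch of the early-break loop shape this series targets (a hypothetical example; the function and array names are not taken from the patch or testsuite). The conditional exit sits between other statements, which is exactly what makes the store motion implemented above necessary:

```c
#define N 1024
static int a[N];

/* Hypothetical search loop with an early break: the store to a[i]
   follows the conditional exit, so when vectorized the stores must
   not be allowed to commit for lanes past the breaking iteration.  */
static int
first_match (int x)
{
  for (int i = 0; i < N; i++)
    {
      if (a[i] == x)	/* early break */
	return i;
      a[i] = x;		/* statement after the condition */
    }
  return -1;
}
```

With a compiler containing this series and a target providing the vector cbranch optab, such a loop becomes a candidate for early-break vectorization; without the series it stays scalar.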

[-- Attachment #2: rb17963.patch --]
[-- Type: application/octet-stream, Size: 24616 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+
+ }
+ return ret;
+}
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..8e9e780e01fd349b30da1f0a762c0306ec257ff7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,377 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that, if we
+   have multiple exits, we are able to safely move stores such that they
+   become safe for vectorization.  The function also calculates where the
+   instructions should be moved to and computes the new vUSE chain.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* - CHAIN: Currently detected sequence of instructions that need to be moved
+	      if we are to vectorize this early break.
+     - FIXED: Sequences of SSA_NAMEs that must not be moved; they are reachable
+	      from one or more cond conditions.  If this set overlaps with CHAIN
+	      then FIXED takes precedence.  This deals with non-single use
+	      cases.
+     - BASES: List of all load data references found during traversal.  */
+  hash_set<tree> chain, fixed;
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* First determine the list of statements that we can't move because they
+	 are required for the early break vectorization itself.  */
+      auto_vec <gimple *> workset;
+      workset.safe_push (c);
+      do {
+	gimple *op = workset.pop ();
+	if (visited.add (op)
+	    || is_a <gphi *> (op)
+	    || is_gimple_debug (op))
+	  continue;
+
+	if (gimple_has_lhs (op))
+	  fixed.add (gimple_get_lhs (op));
+
+	stmt_vec_info def_info = loop_vinfo->lookup_stmt (op);
+	if (!def_info)
+	  continue;
+
+	gimple *def_stmt = STMT_VINFO_STMT (def_info);
+	FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
+	  {
+	    tree use = USE_FROM_PTR (use_p);
+	    if (TREE_CODE (use) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (use))
+	      continue;
+
+	    if (gimple *g = SSA_NAME_DEF_STMT (use))
+	      workset.safe_push (g);
+	  }
+      } while (!workset.is_empty ());
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  tree dest = NULL_TREE;
+	  /* Try to find the SSA_NAME being defined.  For statements with an
+	     LHS use the LHS; if not, assume that the first argument of a call
+	     is the value being defined, e.g. MASKED_LOAD etc.  */
+	  if (gimple_has_lhs (stmt))
+	    dest = gimple_get_lhs (stmt);
+	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+	    dest = gimple_arg (call, 0);
+
+	  bool move = chain.contains (dest);
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  if (!stmt_vinfo)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported. Unknown"
+				 " statement: %G", stmt);
+	      return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+	    }
+
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (dr_ref)
+	    {
+	      /* We currently only support statically allocated objects due to
+		 not having first-faulting load support or peeling for
+		 alignment support.  Compute the size of the referenced object
+		 (it could be dynamically allocated).  */
+	      tree obj = DR_BASE_ADDRESS (dr_ref);
+	      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "early breaks only supported on statically"
+				     " allocated objects.\n");
+		  return opt_result::failure_at (c,
+				     "can't safely apply code motion to "
+				     "dependencies of %G to vectorize "
+				     "the early exit.\n", c);
+		}
+
+	      tree refop = TREE_OPERAND (obj, 0);
+	      tree refbase = get_base_address (refop);
+	      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+		  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "early breaks only supported on"
+				     " statically allocated objects.\n");
+		  return opt_result::failure_at (c,
+				       "can't safely apply code motion to "
+				       "dependencies of %G to vectorize "
+				       "the early exit.\n", c);
+		}
+
+	      /* Check if vector accesses to the object will be within
+		 bounds.  */
+	      tree stype = TREE_TYPE (DECL_SIZE (refbase));
+	      tree access = fold_build2 (PLUS_EXPR, stype, DR_OFFSET (dr_ref),
+					 DR_INIT (dr_ref));
+	      tree final_adj
+		= fold_build2 (MULT_EXPR, stype, LOOP_VINFO_NITERS (loop_vinfo),
+			       DR_STEP (dr_ref));
+
+	      /* Must be a constant, or assume the loop will be versioned or
+		 niters bounded by VF so accesses are within range.  */
+	      if (TREE_CODE (access) == INTEGER_CST
+		  && TREE_CODE (final_adj) == INTEGER_CST)
+		{
+		  access = fold_build2 (PLUS_EXPR, stype, access, final_adj);
+		  wide_int size = wi::to_wide (DECL_SIZE (refbase));
+		  wide_int off = wi::to_wide (access);
+		  if (wi::ge_p (off, size, UNSIGNED))
+		    {
+		      if (dump_enabled_p ())
+			dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+					 "early breaks not supported:"
+					 " vectorization would read beyond size"
+					 " of object %T.\n", obj);
+		      return opt_result::failure_at (c,
+					 "can't safely apply code motion to "
+					 "dependencies of %G to vectorize "
+					 "the early exit.\n", c);
+		    }
+		}
+
+	      if (DR_IS_READ (dr_ref))
+		bases.safe_push (dr_ref);
+	      else if (DR_IS_WRITE (dr_ref))
+		{
+		  /* We are moving writes down in the CFG.  To be sure that this
+		     is valid after vectorization we have to check all the loads
+		     we are hoisting the stores past to see if any of them may
+		     alias or are the same object.
+
+		     Same objects will not be an issue because unless the store
+		     is marked volatile the value can be forwarded.  If the
+		     store is marked volatile we don't vectorize the loop
+		     anyway.
+
+		     That leaves the check for aliasing.  We don't really need
+		     to care about the stores aliasing with each other since the
+		     stores are moved in order so the effects are still observed
+		     correctly.  This leaves the check for WAR dependencies
+		     which we would be introducing here if the DR can alias.
+		     The check is quadratic in loads/stores but I have not found
+		     a better API to do this.  I believe all loads and stores
+		     must be checked.  We also must check them when we
+		     encountered the store, since we don't care about loads past
+		     the store.  */
+
+		  for (auto dr_read : bases)
+		    if (dr_may_alias_p (dr_read, dr_ref, loop_nest))
+		      {
+			if (dump_enabled_p ())
+			  dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+					   vect_location,
+					   "early breaks not supported: "
+					   "overlapping loads and stores "
+					   "found before the break "
+					   "statement.\n");
+
+			return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		      }
+
+		  /* Any write starts a new chain.  */
+		  move = true;
+		}
+	    }
+
+	  /* If a statement is live and escapes the loop through usage in the
+	     loop epilogue then we can't move it since we need to maintain its
+	     reachability through all exits.  */
+	  bool skip = false;
+	  if (STMT_VINFO_LIVE_P (stmt_vinfo)
+	      && !(dr_ref && DR_IS_WRITE (dr_ref)))
+	    {
+	      imm_use_iterator imm_iter;
+	      use_operand_p use_p;
+	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
+		{
+		  basic_block bb = gimple_bb (USE_STMT (use_p));
+		  skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
+		  if (skip)
+		    break;
+		}
+	    }
+
+	  /* If we found the defining statement of something that's part of
+	     the chain then expand the chain with the new SSA_VARs being
+	     used.  */
+	  if (!skip && move)
+	    {
+	      use_operand_p use_p;
+	      ssa_op_iter iter;
+	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+		{
+		  tree op = USE_FROM_PTR (use_p);
+		  gcc_assert (TREE_CODE (op) == SSA_NAME);
+		  if (fixed.contains (dest))
+		    {
+		      move = false;
+		      fixed.add (op);
+		    }
+		  else
+		    chain.add (op);
+		}
+
+	      if (dump_enabled_p ())
+		{
+		  if (move)
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "found chain %G", stmt);
+		  else
+		    dump_printf_loc (MSG_NOTE, vect_location,
+				     "ignored chain %G, not single use", stmt);
+		}
+	    }
+
+	  if (move)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      /* This statement is to be moved; if it has a VDEF, record it
+		 so that the VUSE chain of the memory it defines can be
+		 updated after the statements have been moved.  */
+	      tree vdef;
+	      if ((vdef = gimple_vdef (stmt)))
+		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (
+		    stmt);
+	    }
+
+	  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go; BBs are visited in order and the last
+	 one is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  gcc_assert (dest_bb);
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +1028,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
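The bounds check in vect_analyze_early_break_dependences above reduces to plain integer arithmetic once the trees are constants. Here is a hedged standalone model of that test (plain integers stand in for the DR_* trees and wide_int values; this is not the GCC API):

```c
#include <stdbool.h>

/* Model of the in-bounds test: the access range computed as
   DR_OFFSET + DR_INIT + NITERS * DR_STEP is rejected when it reaches
   or passes DECL_SIZE of the object, mirroring the wi::ge_p
   comparison in the patch.  */
static bool
access_within_bounds (long dr_offset, long dr_init, long niters,
		      long dr_step, unsigned long decl_size)
{
  unsigned long off
    = (unsigned long) (dr_offset + dr_init + niters * dr_step);
  return off < decl_size;
}
```

For a 128-byte object and a 4-byte step, 31 iterations pass while 32 trip the rejection; this is the check that rules out loops such as vect-early-break_79.c above, where a 1024-iteration loop walks 32-element arrays.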
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..0a90d2860b8d037b72fd41d4240804aa390467ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/* When vectorizing early break statements, instructions that happen before
+   the early break in the current BB need to be moved to after the early
+   break.  This function deals with that and assumes that any validity
+   checks have already been performed.
+
+   While moving the instructions it also corrects the VUSEs of any statements
+   it encounters along the way.  The statements are moved to the destination
+   block recorded in LOOP_VINFO_EARLY_BRK_DEST_BB.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
+    {
+      /* Check to see if the statement is still required for vectorization
+	 or has been elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to do for early-break vectorization
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..3c9478a3dc8750c71e0bf2a36a5b0815afc3fd94 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize, we need to peel
+     the loop for the break-finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of statements needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in the order in
+     which they need to be moved.  */
+  auto_vec<gimple *> early_break_conflict;
+
+  /* The final basic block to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSEs need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple *> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
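To make the NOTE in vect_analyze_early_break_dependences concrete — RAW dependencies before the break are rejected while the WAR shape is allowed — here is a standalone illustration (a sketch; scalar variables stand in for the a[i] accesses in the comment's example):

```c
/* Rejected shape: store then load of the same object (RAW).  Sinking
   the store below the load -- which is what moving stores past the
   early break would do -- changes the value the load observes.  */
static int
raw_in_order (void)
{
  int a = 1, c;
  a = 8;	/* a[i] = 8  */
  c = a;	/* c = a[i]  */
  return c;
}

/* The same statements with the store sunk: the load now sees the
   stale value, so scalar semantics would change.  */
static int
raw_store_sunk (void)
{
  int a = 1, c;
  c = a;	/* c = a[i]  */
  a = 8;	/* a[i] = 8, now after the read */
  (void) a;
  return c;
}
```

The two variants return 8 and 1 respectively, so the reordering is only valid when the load and store cannot alias; the WAR form in the comment already has the load first and is unaffected by sinking the store.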

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-12-19 10:11         ` Tamar Christina
@ 2023-12-19 14:05           ` Richard Biener
  2023-12-20 10:51             ` Tamar Christina
  0 siblings, 1 reply; 200+ messages in thread
From: Richard Biener @ 2023-12-19 14:05 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Tue, 19 Dec 2023, Tamar Christina wrote:

> > > > > +      /* Save destination as we go, BB are visited in order and the last one
> > > > > +	is where statements should be moved to.  */
> > > > > +      if (!dest_bb)
> > > > > +	dest_bb = gimple_bb (c);
> > > > > +      else
> > > > > +	{
> > > > > +	  basic_block curr_bb = gimple_bb (c);
> > > > > +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> > > > > +	    dest_bb = curr_bb;
> > > > > +	}
> > > > > +    }
> > > > > +
> > > > > +  dest_bb = FALLTHRU_EDGE (dest_bb)->dest;
> > > >
> > > > no edge is the fallthru edge out of a condition, so this always selects
> > > > EDGE_SUCC (dest_bb, 1) which cannot be correct (well, guess you're lucky).  I
> > > > think you instead want
> > > >
> > > >   dest_bb = EDGE_SUCC (dest_bb, 0)->dest->loop_father == dest_bb-
> > > > >loop_father ? EDGE_SUCC (dest_bb, 0)->dest : EDGE_SUCC (dest_bb, 1)-
> > > > >dest;
> > > >
> > > > more nicely written, of course.
> > > >
> > > > > +  gcc_assert (dest_bb);
> > > > > +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> > > >
> > > > Sorting the vector of early breaks as we gather them might be nicer than this -
> > > > you'd then simply use the first or last.
> > > >
> 
> I opted not to do the sorting since I don't really need a full order between the exits here
> and only need to find the last one.  A sort would be more expensive than the linear
> check here.  But I also couldn't think of a good sort key since all you have is a dominance yes/no.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): Use it.
> 	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> 	(class _loop_vec_info): Add early_breaks, early_break_conflict,
> 	early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_57.c: New test.
> 	* gcc.dg/vect/vect-early-break_79.c: New test.
> 	* gcc.dg/vect/vect-early-break_80.c: New test.
> 	* gcc.dg/vect/vect-early-break_81.c: New test.
> 	* gcc.dg/vect/vect-early-break_83.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,6 +5,7 @@
>  /* { dg-additional-options "-Ofast" } */
>  
>  /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>  
>  void abort ();
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#undef N
> +#define N 32
> +
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < 1024; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> @@ -0,0 +1,43 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +
> +int x;
> +__attribute__ ((noinline, noipa))
> +void foo (int *a, int *b)
> +{
> +  int local_x = x;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      if (i + local_x == 13)
> +        break;
> +      a[i] = 2 * b[i];
> +    }
> +}
> +
> +int main ()
> +{
> +  int a[1024] = {0};
> +  int b[1024] = {0};
> +
> +  for (int i = 0; i < 1024; i++)
> +    b[i] = i;
> +
> +  x = -512;
> +  foo (a, b);
> +
> +  if (a[524] != 1048)
> +    abort ();
> +
> +  if (a[525] != 0)
> +    abort ();
> +
> +  if (a[1023] != 0)
> +    abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +void abort ();
> +
> +unsigned short sa[32];
> +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[32];
> +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +int main2 (int n)
> +{
> +  int i;
> +  for (i = 0; i < n - 3; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   volatile complex double z = vect_b[i];
> +   vect_b[i] = x + i + z;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..8e9e780e01fd349b30da1f0a762c0306ec257ff7 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,377 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* Function vect_analyze_early_break_dependences.
> +
> +   Examine all the data references in the loop and make sure that if we have
> +   multiple exits we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates the place to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.
> +
> +   This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence. Returns True if
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implementation is very conservative. Any overlapping loads/stores
> +     that take place before the early break statement get rejected, aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is allowed, which is the common case.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  /* - CHAIN: Currently detected sequence of instructions that need to be moved
> +	      if we are to vectorize this early break.
> +     - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable
> +	      from one or more cond conditions.  If this set overlaps with CHAIN
> +	      then FIXED takes precedence.  This deals with non-single use
> +	      cases.
> +     - BASES: List of all load data references found during traversal.  */
> +  hash_set<tree> chain, fixed;
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +
> +  hash_set <gimple *> visited;
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  class loop *loop_nest = loop_outer (loop);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple_stmt_iterator gsi = gsi_for_stmt (c);
> +
> +      /* First determine the list of statements that we can't move because they
> +	 are required for the early break vectorization itself.  */
> +      auto_vec <gimple *> workset;
> +      workset.safe_push (c);
> +      do {
> +	gimple *op = workset.pop ();
> +	if (visited.add (op)
> +	    || is_a <gphi *> (op)
> +	    || is_gimple_debug (op))
> +	  continue;
> +
> +	if (gimple_has_lhs (op))
> +	  fixed.add (gimple_get_lhs (op));

so this adds the LHS of stmts not in the loop - wouldn't it be
easier to add the operand itself ... (X)

> +	stmt_vec_info def_info = loop_vinfo->lookup_stmt (op);
> +	if (!def_info)
> +	  continue;
> +
> +	gimple *def_stmt = STMT_VINFO_STMT (def_info);

that's actually 'op', no?

> +	FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
> +	  {
> +	    tree use = USE_FROM_PTR (use_p);
> +	    if (TREE_CODE (use) != SSA_NAME || SSA_NAME_IS_DEFAULT_DEF (use))
> +	      continue;

(X) ... here?  Or if we only care about in-loop defs add that
after the !def_info check.

> +	    if (gimple *g = SSA_NAME_DEF_STMT (use))
> +	      workset.safe_push (g);
> +	  }
> +      } while (!workset.is_empty ());
> +
> +      /* Now analyze all the remaining statements and try to determine which
> +	 instructions are allowed/needed to be moved.  */
> +      while (!gsi_end_p (gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  gsi_prev (&gsi);
> +	  if (!gimple_has_ops (stmt)
> +	      || is_gimple_debug (stmt))
> +	    continue;
> +
> +	  tree dest = NULL_TREE;
> +	  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
> +	     use the LHS, if not, assume that the first argument of a call is
> +	     the value being defined.  e.g. MASKED_LOAD etc.  */
> +	  if (gimple_has_lhs (stmt))
> +	    dest = gimple_get_lhs (stmt);
> +	  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
> +	    dest = gimple_arg (call, 0);

FOR_EACH_SSA_DEF_OPERAND (...)

(asms can have multiple defs)

> +	  bool move = chain.contains (dest);

move this down to where used first

> +
> +	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +	  if (!stmt_vinfo)
> +	    {

I wonder when this hits?

> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks not supported. Unknown"
> +				 " statement: %G", stmt);
> +	      return opt_result::failure_at (c,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", c);
> +	    }
> +
> +	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +	  if (dr_ref)
> +	    {
> +	      /* We currently only support statically allocated objects due to
> +		 not having first-faulting loads support or peeling for
> +		 alignment support.  Compute the size of the referenced object
> +		 (it could be dynamically allocated).  */
> +	      tree obj = DR_BASE_ADDRESS (dr_ref);
> +	      if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +		{
> +		  if (dump_enabled_p ())
> +		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				     "early breaks only supported on statically"
> +				     " allocated objects.\n");
> +		  return opt_result::failure_at (c,
> +				     "can't safely apply code motion to "
> +				     "dependencies of %G to vectorize "
> +				     "the early exit.\n", c);
> +		}
> +
> +	      tree refop = TREE_OPERAND (obj, 0);
> +	      tree refbase = get_base_address (refop);
> +	      if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +		  || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +		{
> +		  if (dump_enabled_p ())
> +		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				       "early breaks only supported on"
> +				       " statically allocated objects.\n");
> +		  return opt_result::failure_at (c,
> +				       "can't safely apply code motion to "
> +				       "dependencies of %G to vectorize "
> +				       "the early exit.\n", c);
> +		}
> +
> +	      /* Check if vector accesses to the object will be within
> +		 bounds.  */
> +	      tree stype = TREE_TYPE (DECL_SIZE (refbase));
> +	      tree access = fold_build2 (PLUS_EXPR, stype, DR_OFFSET (dr_ref),
> +					 DR_INIT (dr_ref));
> +	      tree final_adj
> +		= fold_build2 (MULT_EXPR, stype, LOOP_VINFO_NITERS (loop_vinfo),
> +			       DR_STEP (dr_ref));
> +
> +	      /* Must be a constant, or assume the loop will be versioned or
> +		 niters bounded by VF so accesses are within range.  */
> +	      if (TREE_CODE (access) == INTEGER_CST
> +		  && TREE_CODE (final_adj) == INTEGER_CST)
> +		{
> +		  access = fold_build2 (PLUS_EXPR, stype, access, final_adj);
> +		  wide_int size = wi::to_wide (DECL_SIZE (refbase));
> +		  wide_int off = wi::to_wide (access);
> +		  if (wi::ge_p (off, size, UNSIGNED))
> +		    {
> +		      if (dump_enabled_p ())
> +			dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +					 "early breaks not supported:"
> +					 " vectorization would read beyond size"
> +					 " of object %T.\n", obj);
> +		      return opt_result::failure_at (c,
> +					 "can't safely apply code motion to "
> +					 "dependencies of %G to vectorize "
> +					 "the early exit.\n", c);
> +		    }
> +		}

missing

       else
         return opt_result::failure_at (....);

because you couldn't prove the access is not out-of-bounds.

I think you want to do this totally different, looking at DR_REF
instead and use code like if-conversions ref_within_array_bound
(you possibly can use that literally here).
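The bound test under discussion reduces to simple arithmetic.  A minimal stand-alone model of the check the hunk performs (plain C, names invented here; it is deliberately conservative, like the patch, and is not the GCC helper itself):

```c
/* An access to a statically sized object is accepted only when
   OFFSET + INIT + NITERS * STEP stays strictly below the object's
   size in bytes; otherwise it is rejected as possibly out of bounds.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool
access_within_bound (size_t obj_size, size_t offset, size_t init,
                     size_t niters, size_t step)
{
  return offset + init + niters * step < obj_size;
}
```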


> +
> +	      if (DR_IS_READ (dr_ref))
> +		bases.safe_push (dr_ref);
> +	      else if (DR_IS_WRITE (dr_ref))
> +		{
> +		  /* We are moving writes down in the CFG.  To be sure that this
> +		     is valid after vectorization we have to check all the loads
> +		     we are hoisting the stores past to see if any of them may
> +		     alias or are the same object.
> +
> +		     Same objects will not be an issue because unless the store
> +		     is marked volatile the value can be forwarded.  If the
> +		     store is marked volatile we don't vectorize the loop
> +		     anyway.
> +
> +		     That leaves the check for aliasing.  We don't really need
> +		     to care about the stores aliasing with each other since the
> +		     stores are moved in order so the effects are still observed
> +		     correctly.  This leaves the check for WAR dependencies
> +		     which we would be introducing here if the DR can alias.
> +		     The check is quadratic in loads/stores but I have not found
> +		     a better API to do this.  I believe all loads and stores
> +		     must be checked.  We also must check them when we
> +		     encountered the store, since we don't care about loads past
> +		     the store.  */
> +
> +		  for (auto dr_read : bases)
> +		    if (dr_may_alias_p (dr_read, dr_ref, loop_nest))

I think you need to swap dr_read and dr_ref operands, since you
are walking stmts backwards and thus all reads from 'bases' are
after the write.
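The hazard being pointed out is easiest to see at the scalar level: sinking a store past an aliasing load turns a read-after-write into a write-after-read, and the load then observes a stale value.  A minimal illustration (plain C, not GCC code):

```c
#include <assert.h>

/* Program order: store, then aliasing load -- the load must see 5.  */
static int
store_then_load (int *a)
{
  a[0] = 5;
  return a[0];
}

/* Same statements after the store is sunk past the load: the load now
   reads the stale value, so the motion is only valid when the two
   references cannot alias.  */
static int
load_then_store (int *a)
{
  int r = a[0];
  a[0] = 5;
  return r;
}
```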

> +		      {
> +			if (dump_enabled_p ())
> +			    dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +					     vect_location,
> +					     "early breaks not supported: "
> +					     "overlapping loads and stores "
> +					     "found before the break "
> +					     "statement.\n");
> +
> +			return opt_result::failure_at (stmt,
> +			     "can't safely apply code motion to dependencies"
> +			     " to vectorize the early exit. %G may alias with"
> +			     " %G\n", stmt, dr_read->stmt);
> +		      }
> +
> +		  /* Any write starts a new chain.  */
> +		  move = true;
> +		}
> +	    }
> +
> +	  /* If a statement is live and escapes the loop through usage in the
> +	     loop epilogue then we can't move it since we need to maintain its
> +	     reachability through all exits.  */
> +	  bool skip = false;
> +	  if (STMT_VINFO_LIVE_P (stmt_vinfo)
> +	      && !(dr_ref && DR_IS_WRITE (dr_ref)))

You should be able to assert this?

> +	    {
> +	      imm_use_iterator imm_iter;
> +	      use_operand_p use_p;
> +	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest)
> +		{
> +		  basic_block bb = gimple_bb (USE_STMT (use_p));
> +		  skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +		  if (skip)
> +		    break;
> +		}
> +	    }
> +
> +	  /* If we found the defining statement of something that's part of
> +	     the chain then expand the chain with the new SSA_VARs being
> +	     used.  */
> +	  if (!skip && move)
> +	    {
> +	      use_operand_p use_p;
> +	      ssa_op_iter iter;
> +	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +		{
> +		  tree op = USE_FROM_PTR (use_p);
> +		  gcc_assert (TREE_CODE (op) == SSA_NAME);
> +		  if (fixed.contains (dest))
> +		    {
> +		      move = false;
> +		      fixed.add (op);

This looks odd.  When the LHS (dest) of 'stmt' is fixed, any of its
operands should already be fixed.  And if you perform special handling
of this here with respect to 'chain' then this becomes dependent on
the order of processing of exits.

IIRC I suggested you first fully populate 'fixed' based on _all_
exits and then in a second loop produce 'chain'?

> +		    }
> +		  else
> +		    chain.add (op);
> +		}
> +
> +	      if (dump_enabled_p ())
> +		{
> +		  if (move)
> +		    dump_printf_loc (MSG_NOTE, vect_location,
> +				     "found chain %G", stmt);
> +		  else
> +		    dump_printf_loc (MSG_NOTE, vect_location,
> +				     "ignored chain %G, not single use", stmt);
> +		}
> +	    }
> +
> +	  if (move)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "==> recording stmt %G", stmt);
> +
> +	      /* If we've moved a VDEF, extract the defining MEM and update
> +		 usages of it.   */
> +	      tree vdef;
> +	      /* This statement is to be moved.  */
> +	      if ((vdef = gimple_vdef (stmt)))
> +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (
> +		    stmt);

I'm also unsure why you need 'chain' at all given you have the vector
of stores to be moved?

Thanks,
Richard.

> +	    }
> +
> +	  if (gimple_vuse (stmt) && !gimple_vdef (stmt))
> +	    {
> +	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "marked statement for vUSE update: %G", stmt);
> +	    }
> +	}
> +
> +      /* Save destination as we go, BB are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +
> +      /* Mark the statement as a condition.  */
> +      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
> +    }
> +
> +  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> +  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> +  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
> +  /* We don't allow outer -> inner loop transitions which should have been
> +     trapped already during loop form analysis.  */
> +  gcc_assert (dest_bb->loop_father == loop);
> +
> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> +
> +  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> +    {
> +      /* All uses shall be updated to that of the first load.  Entries are
> +	 stored in reverse order.  */
> +      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
> +      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +	{
> +	  if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "will update use: %T, mem_ref: %G", vuse, g);
> +	}
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +1028,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorize.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fb8d999ee6bfaff551ac06ac2f3aea5354914659..0a90d2860b8d037b72fd41d4240804aa390467ea 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements, instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks have already been performed.
> +
> +    While moving the instructions if it encounters a VUSE or VDEF it then
> +    corrects the VUSES as it moves the statements along.  GDEST is the location
> +    in which to insert the new statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("move_early_exit_stmts");
> +
> +  if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse
> +    = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for load %G", vuse, p);
> +      gimple_set_vuse (p, vuse);
> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
>  
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (dyn_cast <gcond *> (stmt))
>      *relevant = vect_used_in_scope;
>  
>    /* changing memory.  */
> @@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +	case vect_condition_def:
> +	  dump_printf (MSG_NOTE, "control flow\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e4d7ab4567cef3c018b958f98eeff045d3477725..3c9478a3dc8750c71e0bf2a36a5b0815afc3fd94 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_condition_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of statements needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in the order
> +     that they need to be moved.  */
> +  auto_vec<gimple *> early_break_conflict;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-12-19 14:05           ` Richard Biener
@ 2023-12-20 10:51             ` Tamar Christina
  2023-12-20 12:24               ` Richard Biener
  0 siblings, 1 reply; 200+ messages in thread
From: Tamar Christina @ 2023-12-20 10:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, jlaw

[-- Attachment #1: Type: text/plain, Size: 23848 bytes --]

> > +	      /* If we've moved a VDEF, extract the defining MEM and update
> > +		 usages of it.   */
> > +	      tree vdef;
> > +	      /* This statement is to be moved.  */
> > +	      if ((vdef = gimple_vdef (stmt)))
> > +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> (loop_vinfo).safe_push (
> > +		    stmt);
> 
> I'm also unsure why you need 'chain' at all given you have the vector
> of stores to be moved?
> 

Yeah, so originally I wanted to move statements other than stores.  While stores
are needed for correctness, the other statements would be moved so we didn't
extend the live range of intermediate values too much.

This proved difficult, but eventually I got it to work, though as you saw it was meh code.
Instead I guess the better approach is to teach sched1 in GCC 15 to schedule across
branches in loops.

With that in mind, I changed it to move only stores.  Since stores never produce a
value and are sinks, I don't really need fixed or chain.
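At the source level, the store-only scheme amounts to the sketch below (illustrative C, not the patch itself; on iterations where no break triggers the two loops agree, and the breaking iteration itself is redone by the scalar epilog):

```c
#include <assert.h>

#define N 32

static unsigned a1[N], b1[N], a2[N], b2[N];

/* Original loop: a store precedes the early exit test.  */
static void
orig_loop (unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      b1[i] = x + i;
      if (a1[i] > x)
	break;
      a1[i] = x;
    }
}

/* Store sunk below the exit test, as the code-motion pass would do for
   the vector loop; b2 feeds neither the condition nor any aliasing
   load, so non-breaking iterations are unaffected.  */
static void
sunk_loop (unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      if (a2[i] > x)
	break;
      b2[i] = x + i;
      a2[i] = x;
    }
}
```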

So here's a much cleaned up patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-if-conv.cc (ref_within_array_bound): Expose.
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(ref_within_array_bound): New.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_STORES): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-early-break_57.c: Update.
	* gcc.dg/vect/vect-early-break_79.c: New test.
	* gcc.dg/vect/vect-early-break_80.c: New test.
	* gcc.dg/vect/vect-early-break_81.c: New test.
	* gcc.dg/vect/vect-early-break_83.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0bde281c2468d8e7f43afc4fe0f757e221ad5edb..a31e3d5161684878a79817d30a6955c8370444d8 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -844,7 +844,7 @@ idx_within_array_bound (tree ref, tree *idx, void *dta)
 
 /* Return TRUE if ref is a within bound array reference.  */
 
-static bool
+bool
 ref_within_array_bound (gimple *stmt, tree ref)
 {
   class loop *loop = loop_containing_stmt (stmt);
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..85ae75ff2eb12b4299e8b7b91d0cf16e4549d08e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,241 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that, if we have
+   multiple exits, we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* List of all load data references found during traversal.  */
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (!dr_ref)
+	    continue;
+
+	  /* We currently only support statically allocated objects due to
+	     not having first-faulting loads support or peeling for
+	     alignment support.  Compute the size of the referenced object
+	     (it could be dynamically allocated).  */
+	  tree obj = DR_BASE_ADDRESS (dr_ref);
+	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on statically"
+				 " allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  tree refop = TREE_OPERAND (obj, 0);
+	  tree refbase = get_base_address (refop);
+	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on"
+				 " statically allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  /* Check if vector accesses to the object will be within bounds.
+	     The size must be a constant, or we assume the loop will be
+	     versioned or niters bounded by VF so accesses are within range.  */
+	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported: vectorization "
+				 "would %s beyond size of obj.",
+				 DR_IS_READ (dr_ref) ? "read" : "write");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  if (DR_IS_READ (dr_ref))
+	    bases.safe_push (dr_ref);
+	  else if (DR_IS_WRITE (dr_ref))
+	    {
+	      /* We are moving writes down in the CFG.  To be sure that this
+		 is valid after vectorization we have to check all the loads
+		 we are sinking the stores past to see if any of them may
+		 alias or are the same object.
+
+		 Same objects will not be an issue because unless the store
+		 is marked volatile the value can be forwarded.  If the
+		 store is marked volatile we don't vectorize the loop
+		 anyway.
+
+		 That leaves the check for aliasing.  We don't really need
+		 to care about the stores aliasing with each other since the
+		 stores are moved in order so the effects are still observed
+		 correctly.  This leaves the check for WAR dependencies
+		 which we would be introducing here if the DR can alias.
+		 The check is quadratic in loads/stores but I have not found
+		 a better API to do this.  I believe all loads and stores
+		 must be checked.  We also must check them when we
+		 encounter the store, since we don't care about loads past
+		 the store.  */
+
+	      for (auto dr_read : bases)
+		if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
+		  {
+		    if (dump_enabled_p ())
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+				       vect_location,
+				       "early breaks not supported: "
+				       "overlapping loads and stores "
+				       "found before the break "
+				       "statement.\n");
+
+		    return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		  }
+	    }
+
+	  if (gimple_vdef (stmt))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).safe_push (stmt);
+	    }
+	  else if (gimple_vuse (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go, BBs are visited in order and the last one
+	 is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  gcc_assert (dest_bb);
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +892,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..900826567fee36206c0711ea51495602a7a031a1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/* When vectorizing early break statements, instructions that happen before
+   the early break in the current BB need to be moved to after the early
+   break.  This function deals with that and assumes that any validity
+   checks have already been performed.
+
+   While moving the instructions, if it encounters a VUSE or VDEF it then
+   corrects the VUSEs as it moves the statements along.  The statements are
+   inserted at the destination BB recorded during dependence analysis.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to do for early-break vectorization
+     after we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..744cdc86c969a62574be488df4f9c222b68f7994 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of stores needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in order that they
+     need to be moved.  */
+  auto_vec<gimple *> early_break_stores;
+
+  /* The final basic block where to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -2298,6 +2321,9 @@ extern opt_result vect_get_vector_types_for_stmt (vec_info *,
 						  tree *, unsigned int = 0);
 extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
+/* In tree-if-conv.cc.  */
+extern bool ref_within_array_bound (gimple *, tree);
+
 /* In tree-vect-data-refs.cc.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
 extern enum dr_alignment_support vect_supportable_dr_alignment

[-- Attachment #2: rb17963 (1).patch --]
[-- Type: application/octet-stream, Size: 21054 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
@@ -5,6 +5,7 @@
 /* { dg-additional-options "-Ofast" } */
 
 /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
 
 void abort ();
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
new file mode 100644
index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#undef N
+#define N 32
+
+unsigned vect_a[N];
+unsigned vect_b[N];
+  
+unsigned test4(unsigned x)
+{
+ unsigned ret = 0;
+ for (int i = 0; i < 1024; i++)
+ {
+   vect_b[i] = x + i;
+   if (vect_a[i] > x)
+     break;
+   vect_a[i] = x;
+   
+ }
+ return ret;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
new file mode 100644
index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+extern void abort ();
+
+int x;
+__attribute__ ((noinline, noipa))
+void foo (int *a, int *b)
+{
+  int local_x = x;
+  for (int i = 0; i < 1024; ++i)
+    {
+      if (i + local_x == 13)
+        break;
+      a[i] = 2 * b[i];
+    }
+}
+
+int main ()
+{
+  int a[1024] = {0};
+  int b[1024] = {0};
+
+  for (int i = 0; i < 1024; i++)
+    b[i] = i;
+
+  x = -512;
+  foo (a, b);
+
+  if (a[524] != 1048)
+    abort ();
+
+  if (a[525] != 0)
+    abort ();
+
+  if (a[1023] != 0)
+    abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
new file mode 100644
index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
+void abort ();
+
+unsigned short sa[32];
+unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
+  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+unsigned int ia[32];
+unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
+        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+
+int main2 (int n)
+{
+  int i;
+  for (i = 0; i < n - 3; i++)
+    {
+      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
+        abort ();
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
new file mode 100644
index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-additional-options "-Ofast" } */
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+
+#include <complex.h>
+
+#define N 1024
+complex double vect_a[N];
+complex double vect_b[N];
+  
+complex double test4(complex double x)
+{
+ complex double ret = 0;
+ for (int i = 0; i < N; i++)
+ {
+   volatile complex double z = vect_b[i];
+   vect_b[i] = x + i + z;
+   if (vect_a[i] == x)
+     return i;
+   vect_a[i] += x * vect_b[i];
+   
+ }
+ return ret;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0bde281c2468d8e7f43afc4fe0f757e221ad5edb..a31e3d5161684878a79817d30a6955c8370444d8 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -844,7 +844,7 @@ idx_within_array_bound (tree ref, tree *idx, void *dta)
 
 /* Return TRUE if ref is a within bound array reference.  */
 
-static bool
+bool
 ref_within_array_bound (gimple *stmt, tree ref)
 {
   class loop *loop = loop_containing_stmt (stmt);
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..85ae75ff2eb12b4299e8b7b91d0cf16e4549d08e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -613,6 +613,241 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* Function vect_analyze_early_break_dependences.
+
+   Examine all the data references in the loop and make sure that, if we have
+   multiple exits, we are able to safely move stores such that they become
+   safe for vectorization.  The function also calculates the place to move
+   the instructions to and computes what the new vUSE chain should be.
+
+   This works in tandem with the CFG that will be produced by
+   slpeel_tree_duplicate_loop_to_edge_cfg later on.
+
+   This function tries to validate whether early break vectorization
+   is possible for the current instruction sequence.  Returns true if
+   possible, otherwise false.
+
+   Requirements:
+     - Any memory access must be to a fixed size buffer.
+     - There must not be any loads and stores to the same object.
+     - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+     This implementation is very conservative.  Any overlapping loads/stores
+     that take place before the early break statement get rejected aside from
+     WAR dependencies.
+
+     i.e.:
+
+	a[i] = 8
+	c = a[i]
+	if (b[i])
+	  ...
+
+	is not allowed, but
+
+	c = a[i]
+	a[i] = 8
+	if (b[i])
+	  ...
+
+	is, which is the common case.  */
+
+static opt_result
+vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
+
+  /* List of all load data references found during traversal.  */
+  auto_vec<data_reference *> bases;
+  basic_block dest_bb = NULL;
+
+  hash_set <gimple *> visited;
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  class loop *loop_nest = loop_outer (loop);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "loop contains multiple exits, analyzing"
+		     " statement dependencies.\n");
+
+  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
+    {
+      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
+      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
+	continue;
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (c);
+
+      /* Now analyze all the remaining statements and try to determine which
+	 instructions are allowed/needed to be moved.  */
+      while (!gsi_end_p (gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gsi_prev (&gsi);
+	  if (!gimple_has_ops (stmt)
+	      || is_gimple_debug (stmt))
+	    continue;
+
+	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
+	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
+	  if (!dr_ref)
+	    continue;
+
+	  /* We currently only support statically allocated objects due to
+	     not having first-faulting loads support or peeling for
+	     alignment support.  Compute the size of the referenced object
+	     (it could be dynamically allocated).  */
+	  tree obj = DR_BASE_ADDRESS (dr_ref);
+	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on statically"
+				 " allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  tree refop = TREE_OPERAND (obj, 0);
+	  tree refbase = get_base_address (refop);
+	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
+	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks only supported on"
+				 " statically allocated objects.\n");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  /* Check if vector accesses to the object will be within bounds.
+	     The size must be a constant, or we assume the loop will be
+	     versioned or niters bounded by VF so accesses are within range.  */
+	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "early breaks not supported: vectorization "
+				 "would %s beyond size of obj.",
+				 DR_IS_READ (dr_ref) ? "read" : "write");
+	      return opt_result::failure_at (c,
+				 "can't safely apply code motion to "
+				 "dependencies of %G to vectorize "
+				 "the early exit.\n", c);
+	    }
+
+	  if (DR_IS_READ (dr_ref))
+	    bases.safe_push (dr_ref);
+	  else if (DR_IS_WRITE (dr_ref))
+	    {
+	      /* We are moving writes down in the CFG.  To be sure that this
+		 is valid after vectorization we have to check all the loads
+		 we are sinking the stores past to see if any of them may
+		 alias or are the same object.
+
+		 Same objects will not be an issue because unless the store
+		 is marked volatile the value can be forwarded.  If the
+		 store is marked volatile we don't vectorize the loop
+		 anyway.
+
+		 That leaves the check for aliasing.  We don't really need
+		 to care about the stores aliasing with each other since the
+		 stores are moved in order so the effects are still observed
+		 correctly.  This leaves the check for WAR dependencies
+		 which we would be introducing here if the DR can alias.
+		 The check is quadratic in loads/stores but I have not found
+		 a better API to do this.  I believe all loads and stores
+		 must be checked.  We also must check them when we
+		 encounter the store, since we don't care about loads past
+		 the store.  */
+
+	      for (auto dr_read : bases)
+		if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
+		  {
+		    if (dump_enabled_p ())
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+				       vect_location,
+				       "early breaks not supported: "
+				       "overlapping loads and stores "
+				       "found before the break "
+				       "statement.\n");
+
+		    return opt_result::failure_at (stmt,
+			     "can't safely apply code motion to dependencies"
+			     " to vectorize the early exit. %G may alias with"
+			     " %G\n", stmt, dr_read->stmt);
+		  }
+	    }
+
+	  if (gimple_vdef (stmt))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "==> recording stmt %G", stmt);
+
+	      LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).safe_push (stmt);
+	    }
+	  else if (gimple_vuse (stmt))
+	    {
+	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "marked statement for vUSE update: %G", stmt);
+	    }
+	}
+
+      /* Save destination as we go, BBs are visited in order and the last one
+	is where statements should be moved to.  */
+      if (!dest_bb)
+	dest_bb = gimple_bb (c);
+      else
+	{
+	  basic_block curr_bb = gimple_bb (c);
+	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
+	    dest_bb = curr_bb;
+	}
+
+      /* Mark the statement as a condition.  */
+      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
+    }
+
+  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
+  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
+  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
+  /* We don't allow outer -> inner loop transitions which should have been
+     trapped already during loop form analysis.  */
+  gcc_assert (dest_bb->loop_father == loop);
+
+  gcc_assert (dest_bb);
+  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
+
+  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
+    {
+      /* All uses shall be updated to that of the first load.  Entries are
+	 stored in reverse order.  */
+      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
+      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+	{
+	  if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "will update use: %T, mem_ref: %G", vuse, g);
+	}
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "recorded statements to be moved to BB %d\n",
+		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
+
+  return opt_result::success ();
+}
+
 /* Function vect_analyze_data_ref_dependences.
 
    Examine all the data references in the loop, and make sure there do not
@@ -657,6 +892,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
 	  return res;
       }
 
+  /* If we have early break statements in the loop, check to see if they
+     are of a form we can vectorize.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    return vect_analyze_early_break_dependences (loop_vinfo);
+
   return opt_result::success ();
 }
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fb8d999ee6bfaff551ac06ac2f3aea5354914659..900826567fee36206c0711ea51495602a7a031a1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
+    early_breaks (false),
     no_data_dependencies (false),
     has_mask_store (false),
     scalar_loop_scaling (profile_probability::uninitialized ()),
@@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
   epilogue_vinfo->shared->save_datarefs ();
 }
 
+/*  When vectorizing early break statements, instructions that happen before
+    the early break in the current BB need to be moved to after the early
+    break.  This function deals with that and assumes that any validity
+    checks have already been performed.
+
+    While moving the instructions, if it encounters a VUSE or VDEF it then
+    corrects the VUSEs as it moves the statements along.
+    LOOP_VINFO_EARLY_BRK_DEST_BB is the location in which to insert the new
+    statements.  */
+
+static void
+move_early_exit_stmts (loop_vec_info loop_vinfo)
+{
+  DUMP_VECT_SCOPE ("move_early_exit_stmts");
+
+  if (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).is_empty ())
+    return;
+
+  /* Move all stmts that need moving.  */
+  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
+  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
+
+  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo))
+    {
+      /* Check to see if statement is still required for vect or has been
+	 elided.  */
+      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
+      if (!stmt_info)
+	continue;
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
+
+      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
+      gsi_move_before (&stmt_gsi, &dest_gsi);
+      gsi_prev (&dest_gsi);
+    }
+
+  /* Update all the stmts with their new reaching VUSES.  */
+  tree vuse
+    = gimple_vuse (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).last ());
+  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "updating vuse to %T for load %G", vuse, p);
+      gimple_set_vuse (p, vuse);
+      update_stmt (p);
+    }
+}
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
@@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
       vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
     }
 
+  /* Handle any code motion that we need to for early-break vectorization after
+     we've done peeling but just before we start vectorizing.  */
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    move_early_exit_stmts (loop_vinfo);
+
   /* FORNOW: the vectorizer supports only loops which body consist
      of one basic block (header + empty latch). When the vectorizer will
      support more involved loop forms, the order by which the BBs are
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
   *live_p = false;
 
   /* cond stmt other than loop exit cond.  */
-  if (is_ctrl_stmt (stmt_info->stmt)
-      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (dyn_cast <gcond *> (stmt))
     *relevant = vect_used_in_scope;
 
   /* changing memory.  */
@@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
 	case vect_first_order_recurrence:
 	  dump_printf (MSG_NOTE, "first order recurrence\n");
 	  break;
+	case vect_condition_def:
+	  dump_printf (MSG_NOTE, "control flow\n");
+	  break;
 	case vect_unknown_def_type:
 	  dump_printf (MSG_NOTE, "unknown\n");
 	  break;
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d7ab4567cef3c018b958f98eeff045d3477725..744cdc86c969a62574be488df4f9c222b68f7994 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -66,6 +66,7 @@ enum vect_def_type {
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_first_order_recurrence,
+  vect_condition_def,
   vect_unknown_def_type
 };
 
@@ -888,6 +889,10 @@ public:
      we need to peel off iterations at the end to form an epilogue loop.  */
   bool peeling_for_niter;
 
+  /* When the loop has early breaks that we can vectorize we need to peel
+     the loop for the break finding loop.  */
+  bool early_breaks;
+
   /* List of loop additional IV conditionals found in the loop.  */
   auto_vec<gcond *> conds;
 
@@ -942,6 +947,20 @@ public:
   /* The controlling loop IV for the scalar loop being vectorized.  This IV
      controls the natural exits of the loop.  */
   edge scalar_loop_iv_exit;
+
+  /* Used to store the list of stores needing to be moved if doing early
+     break vectorization as they would violate the scalar loop semantics if
+     vectorized in their current location.  These are stored in the order that they
+     need to be moved.  */
+  auto_vec<gimple *> early_break_stores;
+
+  /* The final basic block to move statements to.  In the case of
+     multiple exits this could be pretty far away.  */
+  basic_block early_break_dest_bb;
+
+  /* Statements whose VUSES need updating if early break vectorization is to
+     happen.  */
+  auto_vec<gimple*> early_break_vuses;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -996,6 +1015,10 @@ public:
 #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
 #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
 #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
+#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
+#define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
+#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
+#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
 #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
 #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
 #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
@@ -2298,6 +2321,9 @@ extern opt_result vect_get_vector_types_for_stmt (vec_info *,
 						  tree *, unsigned int = 0);
 extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
+/* In tree-if-conv.cc.  */
+extern bool ref_within_array_bound (gimple *, tree);
+
 /* In tree-vect-data-refs.cc.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
 extern enum dr_alignment_support vect_supportable_dr_alignment

^ permalink raw reply	[flat|nested] 200+ messages in thread

* RE: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks
  2023-12-20 10:51             ` Tamar Christina
@ 2023-12-20 12:24               ` Richard Biener
  0 siblings, 0 replies; 200+ messages in thread
From: Richard Biener @ 2023-12-20 12:24 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Wed, 20 Dec 2023, Tamar Christina wrote:

> > > +	      /* If we've moved a VDEF, extract the defining MEM and update
> > > +		 usages of it.   */
> > > +	      tree vdef;
> > > +	      /* This statement is to be moved.  */
> > > +	      if ((vdef = gimple_vdef (stmt)))
> > > +		LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS
> > (loop_vinfo).safe_push (
> > > +		    stmt);
> > 
> > I'm also unsure why you need 'chain' at all given you have the vector
> > of stores to be moved?
> > 
> 
> Yeah, so originally I wanted to move statements other than stores.  While stores
> are needed for correctness, the other statements would be moved so we didn't extend the
> live range too much for intermediate values.
> 
> This proved difficult but eventually I got it to work, but as you saw it was meh code.
> Instead I guess the better approach is to teach sched1 in GCC 15 to schedule across
> branches in loops.
> 
> With that in mind, I changed it to move only stores.  Since stores never produce a
> value and are sinks, I don't really need fixed nor chain.
> 
> So here's a much cleaned up patch.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.
 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-if-conv.cc (ref_within_array_bound): Expose.
> 	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
> 	(vect_analyze_data_ref_dependences): Use them.
> 	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> 	early_breaks.
> 	(move_early_exit_stmts): New.
> 	(vect_transform_loop): Use it.
> 	* tree-vect-stmts.cc (vect_is_simple_use): Handle vect_condition_def.
> 	* tree-vectorizer.h (enum vect_def_type): Add vect_condition_def.
> 	(ref_within_array_bound): New.
> 	(class _loop_vec_info): Add early_breaks, early_break_stores,
> 	early_break_dest_bb, early_break_vuses.
> 	(LOOP_VINFO_EARLY_BREAKS): New.
> 	(LOOP_VINFO_EARLY_BRK_STORES): New.
> 	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
> 	(LOOP_VINFO_EARLY_BRK_VUSES): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/vect/vect-early-break_57.c: Update.
> 	* gcc.dg/vect/vect-early-break_79.c: New test.
> 	* gcc.dg/vect/vect-early-break_80.c: New test.
> 	* gcc.dg/vect/vect-early-break_81.c: New test.
> 	* gcc.dg/vect/vect-early-break_83.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> index be4a0c7426093059ce37a9f824defb7ae270094d..9a4e795f92b7a8577ac71827f5cb0bd15d88ebe1 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c
> @@ -5,6 +5,7 @@
>  /* { dg-additional-options "-Ofast" } */
>  
>  /* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
>  
>  void abort ();
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a26011ef1ba5aa000692babc90d46621efc2f8b5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_79.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#undef N
> +#define N 32
> +
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < 1024; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ddf504e0c8787ae33a0e98045c1c91f2b9f533a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_80.c
> @@ -0,0 +1,43 @@
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +extern void abort ();
> +
> +int x;
> +__attribute__ ((noinline, noipa))
> +void foo (int *a, int *b)
> +{
> +  int local_x = x;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      if (i + local_x == 13)
> +        break;
> +      a[i] = 2 * b[i];
> +    }
> +}
> +
> +int main ()
> +{
> +  int a[1024] = {0};
> +  int b[1024] = {0};
> +
> +  for (int i = 0; i < 1024; i++)
> +    b[i] = i;
> +
> +  x = -512;
> +  foo (a, b);
> +
> +  if (a[524] != 1048)
> +    abort ();
> +
> +  if (a[525] != 0)
> +    abort ();
> +
> +  if (a[1023] != 0)
> +    abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c38e394ad87863f0702d422cb58018b979c9fba6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_81.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "epilog loop required" "vect" } } */
> +void abort ();
> +
> +unsigned short sa[32];
> +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
> +  16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
> +unsigned int ia[32];
> +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,
> +        0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +
> +int main2 (int n)
> +{
> +  int i;
> +  for (i = 0; i < n - 3; i++)
> +    {
> +      if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i])
> +        abort ();
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..227dcf1b7ab2ace149e692a6aab41cdd5d47d098
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_83.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-additional-options "-Ofast" } */
> +
> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
> +
> +#include <complex.h>
> +
> +#define N 1024
> +complex double vect_a[N];
> +complex double vect_b[N];
> +  
> +complex double test4(complex double x)
> +{
> + complex double ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   volatile complex double z = vect_b[i];
> +   vect_b[i] = x + i + z;
> +   if (vect_a[i] == x)
> +     return i;
> +   vect_a[i] += x * vect_b[i];
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 0bde281c2468d8e7f43afc4fe0f757e221ad5edb..a31e3d5161684878a79817d30a6955c8370444d8 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -844,7 +844,7 @@ idx_within_array_bound (tree ref, tree *idx, void *dta)
>  
>  /* Return TRUE if ref is a within bound array reference.  */
>  
> -static bool
> +bool
>  ref_within_array_bound (gimple *stmt, tree ref)
>  {
>    class loop *loop = loop_containing_stmt (stmt);
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..85ae75ff2eb12b4299e8b7b91d0cf16e4549d08e 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -613,6 +613,241 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
>    return opt_result::success ();
>  }
>  
> +/* Function vect_analyze_early_break_dependences.
> +
> +   Examine all the data references in the loop and make sure that if we have
> +   multiple exits we are able to safely move stores such that they become
> +   safe for vectorization.  The function also calculates the place where to move
> +   the instructions to and computes what the new vUSE chain should be.
> +
> +   This works in tandem with the CFG that will be produced by
> +   slpeel_tree_duplicate_loop_to_edge_cfg later on.
> +
> +   This function tries to validate whether an early break vectorization
> +   is possible for the current instruction sequence.  Returns True if
> +   possible, otherwise False.
> +
> +   Requirements:
> +     - Any memory access must be to a fixed size buffer.
> +     - There must not be any loads and stores to the same object.
> +     - Multiple loads are allowed as long as they don't alias.
> +
> +   NOTE:
> +     This implementation is very conservative.  Any overlapping loads/stores
> +     that take place before the early break statement get rejected aside from
> +     WAR dependencies.
> +
> +     i.e.:
> +
> +	a[i] = 8
> +	c = a[i]
> +	if (b[i])
> +	  ...
> +
> +	is not allowed, but
> +
> +	c = a[i]
> +	a[i] = 8
> +	if (b[i])
> +	  ...
> +
> +	is, which is the common case.  */
> +
> +static opt_result
> +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences");
> +
> +  /* List of all load data references found during traversal.  */
> +  auto_vec<data_reference *> bases;
> +  basic_block dest_bb = NULL;
> +
> +  hash_set <gimple *> visited;
> +  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  class loop *loop_nest = loop_outer (loop);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "loop contains multiple exits, analyzing"
> +		     " statement dependencies.\n");
> +
> +  for (gimple *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +    {
> +      stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c);
> +      if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type)
> +	continue;
> +
> +      gimple_stmt_iterator gsi = gsi_for_stmt (c);
> +
> +      /* Now analyze all the remaining statements and try to determine which
> +	 instructions are allowed/needed to be moved.  */
> +      while (!gsi_end_p (gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  gsi_prev (&gsi);
> +	  if (!gimple_has_ops (stmt)
> +	      || is_gimple_debug (stmt))
> +	    continue;
> +
> +	  stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt);
> +	  auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo);
> +	  if (!dr_ref)
> +	    continue;
> +
> +	  /* We currently only support statically allocated objects due to
> +	     not having first-faulting loads support or peeling for
> +	     alignment support.  Compute the size of the referenced object
> +	     (it could be dynamically allocated).  */
> +	  tree obj = DR_BASE_ADDRESS (dr_ref);
> +	  if (!obj || TREE_CODE (obj) != ADDR_EXPR)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks only supported on statically"
> +				 " allocated objects.\n");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  tree refop = TREE_OPERAND (obj, 0);
> +	  tree refbase = get_base_address (refop);
> +	  if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase)
> +	      || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST)
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks only supported on"
> +				 " statically allocated objects.\n");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  /* Check if vector accesses to the object will be within bounds.
> +	     The object size must be a constant, or we assume the loop will
> +	     be versioned or niters bounded by VF so accesses are within
> +	     range.  */
> +	  if (!ref_within_array_bound (stmt, DR_REF (dr_ref)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "early breaks not supported: vectorization "
> +				 "would %s beyond size of obj.",
> +				 DR_IS_READ (dr_ref) ? "read" : "write");
> +	      return opt_result::failure_at (c,
> +				 "can't safely apply code motion to "
> +				 "dependencies of %G to vectorize "
> +				 "the early exit.\n", c);
> +	    }
> +
> +	  if (DR_IS_READ (dr_ref))
> +	    bases.safe_push (dr_ref);
> +	  else if (DR_IS_WRITE (dr_ref))
> +	    {
> +	      /* We are moving writes down in the CFG.  To be sure that this
> +		 is valid after vectorization we have to check all the loads
> +		 we are sinking the stores past to see if any of them may
> +		 alias or are the same object.
> +
> +		 Same objects will not be an issue because unless the store
> +		 is marked volatile the value can be forwarded.  If the
> +		 store is marked volatile we don't vectorize the loop
> +		 anyway.
> +
> +		 That leaves the check for aliasing.  We don't really need
> +		 to care about the stores aliasing with each other since the
> +		 stores are moved in order so the effects are still observed
> +		 correctly.  This leaves the check for WAR dependencies
> +		 which we would be introducing here if the DR can alias.
> +		 The check is quadratic in loads/stores but I have not found
> +		 a better API to do this.  I believe all loads and stores
> +		 must be checked.  We also must check them when we
> +		 encounter the store, since we don't care about loads past
> +		 the store.  */
> +
> +	      for (auto dr_read : bases)
> +		if (dr_may_alias_p (dr_ref, dr_read, loop_nest))
> +		  {
> +		    if (dump_enabled_p ())
> +		      dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +				       vect_location,
> +				       "early breaks not supported: "
> +				       "overlapping loads and stores "
> +				       "found before the break "
> +				       "statement.\n");
> +
> +		    return opt_result::failure_at (stmt,
> +			     "can't safely apply code motion to dependencies"
> +			     " to vectorize the early exit. %G may alias with"
> +			     " %G\n", stmt, dr_read->stmt);
> +		  }
> +	    }
> +
> +	  if (gimple_vdef (stmt))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "==> recording stmt %G", stmt);
> +
> +	      LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).safe_push (stmt);
> +	    }
> +	  else if (gimple_vuse (stmt))
> +	    {
> +	      LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_insert (0, stmt);
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "marked statement for vUSE update: %G", stmt);
> +	    }
> +	}
> +
> +      /* Save destination as we go, BBs are visited in order and the last one
> +	is where statements should be moved to.  */
> +      if (!dest_bb)
> +	dest_bb = gimple_bb (c);
> +      else
> +	{
> +	  basic_block curr_bb = gimple_bb (c);
> +	  if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb))
> +	    dest_bb = curr_bb;
> +	}
> +
> +      /* Mark the statement as a condition.  */
> +      STMT_VINFO_DEF_TYPE (loop_cond_info) = vect_condition_def;
> +    }
> +
> +  basic_block dest_bb0 = EDGE_SUCC (dest_bb, 0)->dest;
> +  basic_block dest_bb1 = EDGE_SUCC (dest_bb, 1)->dest;
> +  dest_bb = flow_bb_inside_loop_p (loop, dest_bb0) ? dest_bb0 : dest_bb1;
> +  /* We don't allow outer -> inner loop transitions which should have been
> +     trapped already during loop form analysis.  */
> +  gcc_assert (dest_bb->loop_father == loop);
> +
> +  gcc_assert (dest_bb);
> +  LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb;
> +
> +  if (!LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).is_empty ())
> +    {
> +      /* All uses shall be updated to that of the first load.  Entries are
> +	 stored in reverse order.  */
> +      tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).last ());
> +      for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +	{
> +	  if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "will update use: %T, mem_ref: %G", vuse, g);
> +	}
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "recorded statements to be moved to BB %d\n",
> +		     LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index);
> +
> +  return opt_result::success ();
> +}
> +
>  /* Function vect_analyze_data_ref_dependences.
>  
>     Examine all the data references in the loop, and make sure there do not
> @@ -657,6 +892,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
>  	  return res;
>        }
>  
> +  /* If we have early break statements in the loop, check to see if they
> +     are of a form we can vectorize.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    return vect_analyze_early_break_dependences (loop_vinfo);
> +
>    return opt_result::success ();
>  }
>  
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fb8d999ee6bfaff551ac06ac2f3aea5354914659..900826567fee36206c0711ea51495602a7a031a1 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>      partial_load_store_bias (0),
>      peeling_for_gaps (false),
>      peeling_for_niter (false),
> +    early_breaks (false),
>      no_data_dependencies (false),
>      has_mask_store (false),
>      scalar_loop_scaling (profile_probability::uninitialized ()),
> @@ -11548,6 +11549,56 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
>    epilogue_vinfo->shared->save_datarefs ();
>  }
>  
> +/*  When vectorizing early break statements, instructions that happen before
> +    the early break in the current BB need to be moved to after the early
> +    break.  This function deals with that and assumes that any validity
> +    checks have already been performed.
> +
> +    While moving the instructions, if it encounters a VUSE or VDEF it then
> +    corrects the VUSEs as it moves the statements along.
> +    LOOP_VINFO_EARLY_BRK_DEST_BB is the location in which to insert the new
> +    statements.  */
> +
> +static void
> +move_early_exit_stmts (loop_vec_info loop_vinfo)
> +{
> +  DUMP_VECT_SCOPE ("move_early_exit_stmts");
> +
> +  if (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).is_empty ())
> +    return;
> +
> +  /* Move all stmts that need moving.  */
> +  basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo);
> +  gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb);
> +
> +  for (gimple *stmt : LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo))
> +    {
> +      /* Check to see if statement is still required for vect or has been
> +	 elided.  */
> +      auto stmt_info = loop_vinfo->lookup_stmt (stmt);
> +      if (!stmt_info)
> +	continue;
> +
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt);
> +
> +      gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt);
> +      gsi_move_before (&stmt_gsi, &dest_gsi);
> +      gsi_prev (&dest_gsi);
> +    }
> +
> +  /* Update all the stmts with their new reaching VUSES.  */
> +  tree vuse
> +    = gimple_vuse (LOOP_VINFO_EARLY_BRK_STORES (loop_vinfo).last ());
> +  for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, vect_location,
> +			   "updating vuse to %T for load %G", vuse, p);
> +      gimple_set_vuse (p, vuse);
> +      update_stmt (p);
> +    }
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -11697,6 +11748,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
>        vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
>      }
>  
> +  /* Handle any code motion that we need to for early-break vectorization after
> +     we've done peeling but just before we start vectorizing.  */
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +    move_early_exit_stmts (loop_vinfo);
> +
>    /* FORNOW: the vectorizer supports only loops which body consist
>       of one basic block (header + empty latch). When the vectorizer will
>       support more involved loop forms, the order by which the BBs are
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 96e4a6cffadebb43946c5cb7e9849c915da589bc..b3a09c0a804a38e17ef32b6ce13b98b077459fc7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -359,8 +359,8 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>    *live_p = false;
>  
>    /* cond stmt other than loop exit cond.  */
> -  if (is_ctrl_stmt (stmt_info->stmt)
> -      && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type)
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (dyn_cast <gcond *> (stmt))
>      *relevant = vect_used_in_scope;
>  
>    /* changing memory.  */
> @@ -13530,6 +13530,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt,
>  	case vect_first_order_recurrence:
>  	  dump_printf (MSG_NOTE, "first order recurrence\n");
>  	  break;
> +	case vect_condition_def:
> +	  dump_printf (MSG_NOTE, "control flow\n");
> +	  break;
>  	case vect_unknown_def_type:
>  	  dump_printf (MSG_NOTE, "unknown\n");
>  	  break;
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index e4d7ab4567cef3c018b958f98eeff045d3477725..744cdc86c969a62574be488df4f9c222b68f7994 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -66,6 +66,7 @@ enum vect_def_type {
>    vect_double_reduction_def,
>    vect_nested_cycle,
>    vect_first_order_recurrence,
> +  vect_condition_def,
>    vect_unknown_def_type
>  };
>  
> @@ -888,6 +889,10 @@ public:
>       we need to peel off iterations at the end to form an epilogue loop.  */
>    bool peeling_for_niter;
>  
> +  /* When the loop has early breaks that we can vectorize we need to peel
> +     the loop for the break finding loop.  */
> +  bool early_breaks;
> +
>    /* List of loop additional IV conditionals found in the loop.  */
>    auto_vec<gcond *> conds;
>  
> @@ -942,6 +947,20 @@ public:
>    /* The controlling loop IV for the scalar loop being vectorized.  This IV
>       controls the natural exits of the loop.  */
>    edge scalar_loop_iv_exit;
> +
> +  /* Used to store the list of stores needing to be moved if doing early
> +     break vectorization as they would violate the scalar loop semantics if
> +     vectorized in their current location.  These are stored in order that they
> +     need to be moved.  */
> +  auto_vec<gimple *> early_break_stores;
> +
> +  /* The final basic block where to move statements to.  In the case of
> +     multiple exits this could be pretty far away.  */
> +  basic_block early_break_dest_bb;
> +
> +  /* Statements whose VUSES need updating if early break vectorization is to
> +     happen.  */
> +  auto_vec<gimple*> early_break_vuses;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -996,6 +1015,10 @@ public:
>  #define LOOP_VINFO_REDUCTION_CHAINS(L)     (L)->reduction_chains
>  #define LOOP_VINFO_PEELING_FOR_GAPS(L)     (L)->peeling_for_gaps
>  #define LOOP_VINFO_PEELING_FOR_NITER(L)    (L)->peeling_for_niter
> +#define LOOP_VINFO_EARLY_BREAKS(L)         (L)->early_breaks
> +#define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
> +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
> +#define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
>  #define LOOP_VINFO_LOOP_IV_COND(L)         (L)->loop_iv_cond
>  #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
> @@ -2298,6 +2321,9 @@ extern opt_result vect_get_vector_types_for_stmt (vec_info *,
>  						  tree *, unsigned int = 0);
>  extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
>  
> +/* In tree-if-conv.cc.  */
> +extern bool ref_within_array_bound (gimple *, tree);
> +
>  /* In tree-vect-data-refs.cc.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
>  extern enum dr_alignment_support vect_supportable_dr_alignment
> 
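[Not part of the patch — a hedged sketch of the execution model behind the new `early_breaks` flag and the vector cbranch requirement. `vector_style_find`, `VF`, and the data are hypothetical; real codegen uses a vector compare, an OR mask reduction, and a cbranch optab rather than scalar loops.] Conceptually, each vector iteration tests VF lanes at once; when any lane matches, control transfers to a scalar pass over those VF elements to recover the exact iteration — the "break finding loop" the peeling comment refers to:

```c
#include <assert.h>
#include <stdbool.h>

#define N  16
#define VF 4

static int
vector_style_find (const int *a, int key)
{
  for (int i = 0; i + VF <= N; i += VF)
    {
      /* Stands in for a vector compare plus Inclusive-OR mask
	 reduction feeding the vector cbranch.  */
      bool any = false;
      for (int l = 0; l < VF; l++)
	any |= (a[i + l] == key);
      if (any)
	/* cbranch taken: scalar "break finding" pass over the
	   VF lanes recovers the exact exit iteration.  */
	for (int l = 0; l < VF; l++)
	  if (a[i + l] == key)
	    return i + l;
    }
  /* Scalar epilogue for the N % VF tail iterations.  */
  for (int i = N - N % VF; i < N; i++)
    if (a[i] == key)
      return i;
  return -1;
}
```

This also shows why only fixed-size, statically bounded buffers are supported in this initial version: every vector step reads VF elements unconditionally, which is only safe when the whole access is known to stay within bounds (no First-Faulting loads yet).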

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


end of thread, other threads:[~2023-12-20 12:25 UTC | newest]

Thread overview: 200+ messages
2023-06-28 13:40 [PATCH v5 0/19] Support early break/return auto-vectorization Tamar Christina
2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
2023-07-04 11:29   ` Richard Biener
2023-06-28 13:41 ` [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector Tamar Christina
2023-06-29 22:17   ` Jason Merrill
2023-06-30 16:18     ` Tamar Christina
2023-06-30 16:44       ` Jason Merrill
2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
2023-06-28 13:54   ` Tamar Christina
2023-07-04 11:31   ` Richard Biener
2023-06-28 13:43 ` [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Tamar Christina
2023-07-04 11:52   ` Richard Biener
2023-07-04 14:57     ` Jan Hubicka
2023-07-06 14:34       ` Jan Hubicka
2023-07-07  5:59         ` Richard Biener
2023-07-07 12:20           ` Jan Hubicka
2023-07-07 12:27             ` Tamar Christina
2023-07-07 14:10               ` Jan Hubicka
2023-07-10  7:07             ` Richard Biener
2023-07-10  8:33               ` Jan Hubicka
2023-07-10  9:24                 ` Richard Biener
2023-07-10  9:23               ` Jan Hubicka
2023-07-10  9:29                 ` Richard Biener
2023-07-11  9:28                   ` Jan Hubicka
2023-07-11 10:31                     ` Richard Biener
2023-07-11 12:40                       ` Jan Hubicka
2023-07-11 13:04                         ` Richard Biener
2023-06-28 13:43 ` [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds Tamar Christina
2023-07-04 12:05   ` Richard Biener
2023-07-10 15:32     ` Tamar Christina
2023-07-11 11:03       ` Richard Biener
2023-06-28 13:44 ` [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant Tamar Christina
2023-07-04 12:10   ` Richard Biener
2023-07-06 10:37     ` Tamar Christina
2023-07-06 10:51       ` Richard Biener
2023-06-28 13:44 ` [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
2023-07-13 11:32   ` Richard Biener
2023-07-13 11:54     ` Tamar Christina
2023-07-13 12:10       ` Richard Biener
2023-06-28 13:45 ` [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits Tamar Christina
2023-07-13 11:49   ` Richard Biener
2023-07-13 12:03     ` Tamar Christina
2023-07-14  9:09     ` Richard Biener
2023-06-28 13:45 ` [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable Tamar Christina
2023-06-28 13:55   ` [PATCH 9/19] " Tamar Christina
2023-07-13 16:23     ` Richard Biener
2023-06-28 13:46 ` [PATCH 10/19]middle-end: implement vectorizable_early_break Tamar Christina
2023-06-28 13:46 ` [PATCH 11/19]middle-end: implement code motion for early break Tamar Christina
2023-06-28 13:47 ` [PATCH 12/19]middle-end: implement loop peeling and IV updates " Tamar Christina
2023-07-13 17:31   ` Richard Biener
2023-07-13 19:05     ` Tamar Christina
2023-07-14 13:34       ` Richard Biener
2023-07-17 10:56         ` Tamar Christina
2023-07-17 12:48           ` Richard Biener
2023-08-18 11:35         ` Tamar Christina
2023-08-18 12:53           ` Richard Biener
2023-08-18 13:12             ` Tamar Christina
2023-08-18 13:15               ` Richard Biener
2023-10-23 20:21         ` Tamar Christina
2023-06-28 13:47 ` [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-06-28 13:47 ` [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization Tamar Christina
2023-06-28 13:48 ` [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-06-28 13:48 ` [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-06-28 13:48 ` [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-06-28 13:49 ` [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-06-28 13:50 ` [PATCH 19/19]Arm: Add MVE " Tamar Christina
     [not found] ` <MW5PR11MB5908414D8B2AB0580A888ECAA924A@MW5PR11MB5908.namprd11.prod.outlook.com>
2023-06-28 14:49   ` FW: [PATCH v5 0/19] Support early break/return auto-vectorization 钟居哲
2023-06-28 16:00     ` Tamar Christina
2023-11-06  7:36 ` [PATCH v6 0/21]middle-end: " Tamar Christina
2023-11-06  7:37 ` [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests Tamar Christina
2023-11-07  9:46   ` Richard Biener
2023-11-06  7:37 ` [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization Tamar Christina
2023-11-07  9:52   ` Richard Biener
2023-11-16 10:53     ` Richard Biener
2023-11-06  7:37 ` [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Tamar Christina
2023-11-07 10:53   ` Richard Biener
2023-11-07 11:34     ` Tamar Christina
2023-11-07 14:23       ` Richard Biener
2023-12-19 10:11         ` Tamar Christina
2023-12-19 14:05           ` Richard Biener
2023-12-20 10:51             ` Tamar Christina
2023-12-20 12:24               ` Richard Biener
2023-11-06  7:38 ` [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form " Tamar Christina
2023-11-15  0:00   ` Tamar Christina
2023-11-15 12:40     ` Richard Biener
2023-11-20 21:51       ` Tamar Christina
2023-11-24 10:16         ` Tamar Christina
2023-11-24 12:38           ` Richard Biener
2023-11-06  7:38 ` [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch Tamar Christina
2023-11-07 15:04   ` Richard Biener
2023-11-07 23:10     ` Tamar Christina
2023-11-13 20:11     ` Tamar Christina
2023-11-14  7:56       ` Richard Biener
2023-11-14  8:07         ` Tamar Christina
2023-11-14 23:59           ` Tamar Christina
2023-11-15 12:14             ` Richard Biener
2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
2023-11-07 14:54   ` Richard Biener
2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
2023-11-15  0:03   ` Tamar Christina
2023-11-15 13:01     ` Richard Biener
2023-11-15 13:09       ` Tamar Christina
2023-11-15 13:22         ` Richard Biener
2023-11-15 14:14           ` Tamar Christina
2023-11-16 10:40             ` Richard Biener
2023-11-16 11:08               ` Tamar Christina
2023-11-16 11:27                 ` Richard Biener
2023-11-16 12:01                   ` Tamar Christina
2023-11-16 12:30                     ` Richard Biener
2023-11-16 13:22                       ` Tamar Christina
2023-11-16 13:35                         ` Richard Biener
2023-11-16 14:14                           ` Tamar Christina
2023-11-16 14:17                             ` Richard Biener
2023-11-16 15:19                               ` Tamar Christina
2023-11-16 18:41                                 ` Tamar Christina
2023-11-17 10:40                                   ` Tamar Christina
2023-11-17 12:13                                     ` Richard Biener
2023-11-20 21:54                                       ` Tamar Christina
2023-11-24 10:18                                         ` Tamar Christina
2023-11-24 12:41                                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
2023-11-15  0:05   ` Tamar Christina
2023-11-15 13:41     ` Richard Biener
2023-11-15 14:26       ` Tamar Christina
2023-11-16 11:16         ` Richard Biener
2023-11-20 21:57           ` Tamar Christina
2023-11-24 10:20             ` Tamar Christina
2023-11-24 13:23               ` Richard Biener
2023-11-27 22:47                 ` Tamar Christina
2023-11-29 13:28                   ` Richard Biener
2023-11-29 21:22                     ` Tamar Christina
2023-11-30 13:23                       ` Richard Biener
2023-12-06  4:21                         ` Tamar Christina
2023-12-06  9:33                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 14:47     ` Richard Biener
2023-12-06  4:10       ` Tamar Christina
2023-12-06  9:44         ` Richard Biener
2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  8:31   ` Richard Biener
2023-12-06  9:10     ` Tamar Christina
2023-12-06  9:27       ` Richard Biener
2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  4:00     ` Tamar Christina
2023-12-06  8:18   ` Richard Biener
2023-12-06  8:52     ` Tamar Christina
2023-12-06  9:15       ` Richard Biener
2023-12-06  9:29         ` Tamar Christina
2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
2023-11-06 14:44   ` Richard Biener
2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
2023-12-09 10:38   ` Richard Sandiford
2023-12-11  7:38     ` Richard Biener
2023-12-11  8:49       ` Tamar Christina
2023-12-11  9:00         ` Richard Biener
2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-11-28 16:37   ` Richard Sandiford
2023-11-28 17:55     ` Richard Sandiford
2023-12-06 16:25       ` Tamar Christina
2023-12-07  0:56         ` Richard Sandiford
2023-12-14 18:40           ` Tamar Christina
2023-12-14 19:34             ` Richard Sandiford
2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-11-27 12:48   ` Kyrylo Tkachov
2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
2023-11-27 12:47   ` Kyrylo Tkachov
2023-11-06 14:25 ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Richard Biener
2023-11-06 15:17   ` Tamar Christina
2023-11-07  9:42     ` Richard Biener
2023-11-07 10:47       ` Tamar Christina
2023-11-07 13:58         ` Richard Biener
2023-11-27 18:30           ` Richard Sandiford
2023-11-28  8:11             ` Richard Biener
