public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
@ 2015-09-10 15:11 Alan Hayward
  2015-09-10 22:34 ` Bill Schmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Alan Hayward @ 2015-09-10 15:11 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3325 bytes --]

Hi,
This patch (attached) adds support for vectorizing conditional expressions
(PR 65947), for example:

int condition_reduction (int *a, int min_v)
{
  int last = 0;
  for (int i = 0; i < N; i++)
    if (a[i] < min_v)
      last = a[i];
  return last;
}

To do this, the loop is vectorized to create a vector of data results (i.e.
of matching a[i] values). Using an induction variable, an additional
vector is created containing the indexes at which the matches occurred. In
the function epilogue this index vector is reduced to a single maximum
value, which is then used to index into the vector of data results.
When no values are matched in the loop, the index vector contains all
zeroes, which ultimately selects the first entry in the data results vector.
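The scheme can be modelled in scalar C. This is a sketch only: the vectorization factor VF, the 1-based index bias, and the lane layout are illustrative assumptions, not the patch's exact implementation.

```c
#include <assert.h>

#define N 8
#define VF 4  /* assumed vectorization factor, for illustration only */

/* Scalar model of the vectorized scheme: one "data" lane and one "index"
   lane per vector element.  Indexes are stored 1-based so that 0 means
   "no match seen in this lane".  */
static int
condition_reduction_model (int *a, int min_v)
{
  int data[VF] = { 0 };
  int idx[VF] = { 0 };

  for (int i = 0; i < N; i += VF)
    for (int lane = 0; lane < VF; lane++)
      if (a[i + lane] < min_v)
	{
	  data[lane] = a[i + lane];
	  idx[lane] = i + lane + 1;  /* induction value, biased by 1 */
	}

  /* Epilogue: a REDUC_MAX-style reduction over the index lanes finds the
     latest match; its lane then selects the matching data value.  */
  int max_idx = 0;
  for (int lane = 0; lane < VF; lane++)
    if (idx[lane] > max_idx)
      max_idx = idx[lane];

  int last = 0;  /* all-zero indexes: keep the initial value */
  for (int lane = 0; lane < VF; lane++)
    if (idx[lane] == max_idx && max_idx > 0)
      last = data[lane];

  return last;
}
```

In the no-match case every idx lane stays 0, so last keeps its initial value, mirroring the all-zero index vector behaviour described above.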

To vectorize successfully, support is required for REDUC_MAX_EXPR. This is
supported by aarch64 and arm. On x86 and powerpc, GCC will complain that
REDUC_MAX_EXPR is not supported for the required modes, and the
vectorization will fail. On mips it complains that the required vcond
expression is not supported. It is suggested that the relevant backend
experts add the required backend support.

Using a simple testcase based around a large value of N and run on an
aarch64 Juno board, with the patch in use the runtime was reduced to 0.8 of
its original time.

This patch caused binary differences in three SPEC2006 binaries on aarch64:
416.gamess, 435.gromacs and 456.hmmer. Running them on a Juno board
showed no improvement or degradation in runtime.


In the near future I hope to submit a further patch (for PR 66558) which
optimises the case where the result is simply the loop index, for
example:
int condition_reduction (int *a, int min_v)
{
  int last = 0;
  for (int i = 0; i < N; i++)
    if (a[i] < min_v)
      last = i;
  return last;
}
In this case a lot of the new code can be optimized away.
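For that follow-up case the data-results vector becomes redundant: the max-reduced index vector, minus the bias, already is the answer. A hypothetical scalar sketch, where VF and the 1-based bias are again illustrative assumptions:

```c
#include <assert.h>

#define N 8
#define VF 4  /* assumed vectorization factor, for illustration only */

/* When the reduced value is the loop index itself, only the index lanes
   are needed; the data-results vector and the final lane select go away.  */
static int
index_reduction_model (int *a, int min_v)
{
  int idx[VF] = { 0 };  /* 1-based, so 0 means "no match in this lane" */

  for (int i = 0; i < N; i += VF)
    for (int lane = 0; lane < VF; lane++)
      if (a[i + lane] < min_v)
	idx[lane] = i + lane + 1;

  /* Epilogue: a single REDUC_MAX-style reduction over the index lanes.  */
  int max_idx = 0;
  for (int lane = 0; lane < VF; lane++)
    if (idx[lane] > max_idx)
      max_idx = idx[lane];

  /* Undo the bias; fall back to the scalar initial value when nothing
     matched.  */
  return max_idx > 0 ? max_idx - 1 : 0;
}
```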

I have run make check for aarch64, arm and x86 and have seen no regressions.


Changelog:

    2015-08-28  Alan Hayward <alan.hayward@arm.com>

        PR tree-optimization/65947
        * tree-vect-loop.c
        (vect_is_simple_reduction_1): Find condition reductions.
        (vect_model_reduction_cost): Add condition reduction costs.
        (get_initial_def_for_reduction): Add condition reduction initial
        var.
        (vect_create_epilog_for_reduction): Add condition reduction epilog.
        (vectorizable_reduction): Condition reduction support.
        * tree-vect-stmts.c
        (vectorizable_condition): Add vect reduction arg.
        * doc/sourcebuild.texi (Vector-specific attributes): Document
        vect_max_reduc.

    testsuite/Changelog:

        PR tree-optimization/65947
        * lib/target-supports.exp
        (check_effective_target_vect_max_reduc): Add.
        * gcc.dg/vect/pr65947-1.c: New test.
        * gcc.dg/vect/pr65947-2.c: New test.
        * gcc.dg/vect/pr65947-3.c: New test.
        * gcc.dg/vect/pr65947-4.c: New test.
        * gcc.dg/vect/pr65947-5.c: New test.
        * gcc.dg/vect/pr65947-6.c: New test.
        * gcc.dg/vect/pr65947-7.c: New test.
        * gcc.dg/vect/pr65947-8.c: New test.
        * gcc.dg/vect/pr65947-9.c: New test.
        * gcc.dg/vect/pr65947-10.c: New test.
        * gcc.dg/vect/pr65947-11.c: New test.



Thanks,
Alan



[-- Attachment #2: 0001-Support-for-vectorizing-conditional-expressions.patch --]
[-- Type: application/octet-stream, Size: 52309 bytes --]

From 898b4908b32452091da79373129933d5d816cdfc Mon Sep 17 00:00:00 2001
From: Alan Hayward <alan.hayward@arm.com>
Date: Fri, 28 Aug 2015 10:01:15 +0100
Subject: [PATCH] Support for vectorizing conditional expressions

2015-08-28  Alan Hayward <alan.hayward@arm.com>

	PR tree-optimization/65947
	* tree-vect-loop.c
	(vect_is_simple_reduction_1): Find condition reductions.
	(vect_model_reduction_cost): Add condition reduction costs.
	(get_initial_def_for_reduction): Add condition reduction initial var.
	(vect_create_epilog_for_reduction): Add condition reduction epilog.
	(vectorizable_reduction): Condition reduction support.
	* tree-vect-stmts.c
	(vectorizable_condition): Add vect reduction arg
        * doc/sourcebuild.texi (Vector-specific attributes): Document
	vect_max_reduc

    testsuite/Changelog:

	PR tree-optimization/65947
	* lib/target-supports.exp
	(check_effective_target_vect_max_reduc): Add.
	* gcc.dg/vect/pr65947-1.c: New test.
	* gcc.dg/vect/pr65947-2.c: New test.
	* gcc.dg/vect/pr65947-3.c: New test.
	* gcc.dg/vect/pr65947-4.c: New test.
	* gcc.dg/vect/pr65947-5.c: New test.
	* gcc.dg/vect/pr65947-6.c: New test.
	* gcc.dg/vect/pr65947-7.c: New test.
	* gcc.dg/vect/pr65947-8.c: New test.
	* gcc.dg/vect/pr65947-9.c: New test.
	* gcc.dg/vect/pr65947-10.c: New test.
	* gcc.dg/vect/pr65947-11.c: New test.
---
 gcc/doc/sourcebuild.texi               |   3 +
 gcc/testsuite/gcc.dg/vect/pr65947-1.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-10.c |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-11.c |  48 +++
 gcc/testsuite/gcc.dg/vect/pr65947-2.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-3.c  |  50 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-4.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-5.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-6.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-7.c  |  51 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-8.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-9.c  |  42 +++
 gcc/testsuite/lib/target-supports.exp  |  10 +
 gcc/tree-vect-loop.c                   | 515 ++++++++++++++++++++++++++-------
 gcc/tree-vect-stmts.c                  |  44 +--
 gcc/tree-vectorizer.h                  |  11 +-
 16 files changed, 931 insertions(+), 123 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-11.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-9.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 5dc7c81..61de4a5 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1466,6 +1466,9 @@ Target supports conversion from @code{float} to @code{signed int}.
 
 @item vect_floatuint_cvt
 Target supports conversion from @code{float} to @code{unsigned int}.
+
+@item vect_max_reduc
+Target supports max reduction for vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-1.c b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
new file mode 100644
index 0000000..7933f5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Simple condition reduction.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = -1;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, -12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, -3, 4, 5, 6, 7, -8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 19)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-10.c b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
new file mode 100644
index 0000000..9a43a60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Non-integer data types.  */
+
+float
+condition_reduction (float *a, float min_v)
+{
+  float last = 0;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  float a[N] = {
+  11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20,
+  1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6,
+  21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30,
+  31.111, 32.322
+  };
+
+  float ret = condition_reduction (a, 16.7);
+
+  if (ret != (float)10.6)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-11.c b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
new file mode 100644
index 0000000..25064bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Re-use the result of the condition inside the loop.  Will fail to
+   vectorize.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      if (b[i] < min_v)
+	last = i;
+      a[i] = last;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 29)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-2.c b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
new file mode 100644
index 0000000..9c627d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 254
+
+/* Non-simple condition reduction.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v)
+{
+  unsigned char last = 65;
+
+  for (unsigned char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  unsigned char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-3.c b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
new file mode 100644
index 0000000..e115de2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Non-simple condition reduction with additional variable and unsigned
+   types.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+  unsigned int aval;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 13)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-4.c b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
new file mode 100644
index 0000000..76a0567
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with no valid matches at runtime.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 96;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] > min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27
+  };
+
+  int ret = condition_reduction (a, 46);
+
+  /* loop should never have found a value.  */
+  if (ret != N + 96)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-5.c b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
new file mode 100644
index 0000000..360e3b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Condition reduction where loop size is not known at compile time.  Will fail
+   to vectorize.  Version inlined into main loop will vectorize.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v, int count)
+{
+  unsigned char last = 65;
+
+  for (int i = 0; i < count; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  unsigned char ret = condition_reduction (a, 16, N);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-6.c b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
new file mode 100644
index 0000000..4997ef7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 30
+
+/* Condition reduction where loop type is different than the data type.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 65;
+
+  for (char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  int a[N] = {
+  67, 32, 45, 43, 21, -11, 12, 3, 4, 5,
+  6, 76, -32, 56, -32, -1, 4, 5, 6, 99,
+  43, 22, -3, 22, 16, 34, 55, 31, 87, 324
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != -3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
new file mode 100644
index 0000000..c86f1fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 43
+
+/* Condition reduction with comparison is a different type to the data.  Will
+   fail to vectorize.  */
+
+int
+condition_reduction (short *a, int min_v, int *b)
+{
+  int last = N + 65;
+  short aval;
+
+  for (int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  short a[N] = {
+  31, -32, 133, 324, 335, 36, 37, 45, 11, 65,
+  1, -28, 3, 48, 5, -68, 7, 88, 89, 180,
+  121, -122, 123, 124, -125, 126, 127, 128, 129, 130,
+  11, 12, 13, 14, -15, -16, 17, 18, 19, 20,
+  33, 27, 99
+  };
+  int b[N] = {
+  11, -12, -13, 14, 15, 16, 17, 18, 19, 20,
+  21, -22, 23, 24, -25, 26, 27, 28, 29, 30,
+  1, 62, 3, 14, -15, 6, 37, 48, 99, 10,
+  31, -32, 33, 34, -35, 36, 37, 56, 54, 22,
+  73, 2, 87
+  };
+
+  int ret = condition_reduction (a, 16, b);
+
+  if (ret != 27)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 0 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
new file mode 100644
index 0000000..d2d3e44
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with multiple types in the comparison.  Will fail to
+   vectorize.  */
+
+int
+condition_reduction (char *a, int min_v)
+{
+  int last = N + 65;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  char a[N] = {
+  1, 28, 3, 48, 5, 68, 7, -88, 89, 180,
+  121, 122, -123, 124, 12, -12, 12, 67, 84, 122,
+  67, 55, 112, 22, 45, 23, 111
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 12)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 0 "vect" } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
new file mode 100644
index 0000000..d2ffea9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 255
+
+/* Condition reduction with maximum possible loop size.  Will fail to
+   vectorize because the vectorisation requires a slot for default values.  */
+
+char
+condition_reduction (char *a, char min_v)
+{
+  char last = -72;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+char
+main (void)
+{
+  char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 0 "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a465eb1..cf07a56 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6449,3 +6449,13 @@ proc check_effective_target_comdat_group {} {
 	int (*fn) () = foo;
     }]
 }
+
+
+# Return 1 if the target supports max reduction for vectors.
+
+proc check_effective_target_vect_max_reduc { } {
+    if { [istarget aarch64*-*-*] || [istarget arm*-*-*] } {
+	return 1
+    }
+    return 0
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..528c80e 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2331,6 +2331,11 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
      inner loop (def of a3)
      a2 = phi < a3 >
 
+   (4) Detect condition expressions, ie:
+     for (int i = 0; i < N; i++)
+       if (a[i] < val)
+	ret_val = a[i];
+
    If MODIFY is true it tries also to rework the code in-place to enable
    detection of more reduction patterns.  For the time being we rewrite
    "res -= RHS" into "rhs += -RHS" when it seems worthwhile.
@@ -2339,7 +2344,8 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
 static gimple
 vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    bool check_reduction, bool *double_reduc,
-			    bool modify, bool need_wrapping_integral_overflow)
+			    bool modify, bool need_wrapping_integral_overflow,
+			    enum vect_reduction_type *v_reduc_type)
 {
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
@@ -2356,6 +2362,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
   bool phi_def;
 
   *double_reduc = false;
+  *v_reduc_type = TREE_CODE_REDUCTION;
 
   /* If CHECK_REDUCTION is true, we assume inner-most loop vectorization,
      otherwise, we assume outer loop vectorization.  */
@@ -2501,13 +2508,19 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
       && SSA_NAME_DEF_STMT (op1) == phi)
     code = PLUS_EXPR;
 
-  if (check_reduction
-      && (!commutative_tree_code (code) || !associative_tree_code (code)))
+  if (check_reduction)
     {
-      if (dump_enabled_p ())
-        report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: not commutative/associative: ");
-      return NULL;
+      if (code != COND_EXPR
+	  && (!commutative_tree_code (code) || !associative_tree_code (code)))
+	{
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: not commutative/associative: ");
+	  return NULL;
+	}
+
+      if (code == COND_EXPR)
+	*v_reduc_type = COND_REDUCTION;
     }
 
   if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
@@ -2603,47 +2616,50 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
      and therefore vectorizing reductions in the inner-loop during
      outer-loop vectorization is safe.  */
 
-  /* CHECKME: check for !flag_finite_math_only too?  */
-  if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
-      && check_reduction)
+  if (*v_reduc_type != COND_REDUCTION)
     {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fp math optimization: ");
-      return NULL;
-    }
-  else if (INTEGRAL_TYPE_P (type) && check_reduction)
-    {
-      if (!operation_no_trapping_overflow (type, code))
+      /* CHECKME: check for !flag_finite_math_only too?  */
+      if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
+	  && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
 	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow traps): ");
+			"reduction: unsafe fp math optimization: ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow
-	  && !TYPE_OVERFLOW_WRAPS (type)
-	  && operation_can_overflow (code))
+      else if (INTEGRAL_TYPE_P (type) && check_reduction)
+	{
+	  if (!operation_no_trapping_overflow (type, code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow traps): ");
+	      return NULL;
+	    }
+	  if (need_wrapping_integral_overflow
+	      && !TYPE_OVERFLOW_WRAPS (type)
+	      && operation_can_overflow (code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow doesn't wrap): ");
+	      return NULL;
+	    }
+	}
+      else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
-	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow doesn't wrap): ");
+	  report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			  "reduction: unsafe fixed-point math optimization: ");
 	  return NULL;
 	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
-    {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fixed-point math optimization: ");
-      return NULL;
-    }
 
   /* If we detected "res -= x[i]" earlier, rewrite it into
      "res += -x[i]" now.  If this turns out to be useless reassoc
@@ -2719,6 +2735,16 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     {
       if (check_reduction)
         {
+	  if (code == COND_EXPR)
+	    {
+	      /* No current known use where this case would be useful.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_NOTE, def_stmt,
+				"detected reduction: cannot currently swap "
+				"operands for cond_expr");
+	      return NULL;
+	    }
+
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
 	     argument).  */
@@ -2742,7 +2768,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
 
   /* Try to find SLP reduction chain.  */
-  if (check_reduction && vect_is_slp_reduction (loop_info, phi, def_stmt))
+  if (check_reduction && code != COND_EXPR &&
+      vect_is_slp_reduction (loop_info, phi, def_stmt))
     {
       if (dump_enabled_p ())
         report_vect_op (MSG_NOTE, def_stmt,
@@ -2764,11 +2791,13 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 static gimple
 vect_is_simple_reduction (loop_vec_info loop_info, gimple phi,
 			  bool check_reduction, bool *double_reduc,
-			  bool need_wrapping_integral_overflow)
+			  bool need_wrapping_integral_overflow,
+			  enum vect_reduction_type *v_reduc_type)
 {
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, false,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     v_reduc_type);
 }
 
 /* Wrapper around vect_is_simple_reduction_1, which will modify code
@@ -2780,9 +2809,11 @@ vect_force_simple_reduction (loop_vec_info loop_info, gimple phi,
 			     bool check_reduction, bool *double_reduc,
 			     bool need_wrapping_integral_overflow)
 {
+  enum vect_reduction_type v_reduc_type;
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, true,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     &v_reduc_type);
 }
 
 /* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
@@ -3266,7 +3297,8 @@ get_reduction_op (gimple stmt, int reduc_index)
 
 static bool
 vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
-			   int ncopies, int reduc_index)
+			   int ncopies, int reduc_index,
+			   enum vect_reduction_type v_reduc_type)
 {
   int prologue_cost = 0, epilogue_cost = 0;
   enum tree_code code;
@@ -3287,6 +3319,10 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
   else
     target_cost_data = BB_VINFO_TARGET_COST_DATA (STMT_VINFO_BB_VINFO (stmt_info));
 
+  /* Condition reductions generate two reductions in the loop.  */
+  if (v_reduc_type == COND_REDUCTION)
+    ncopies *= 2;
+
   /* Cost of reduction op inside loop.  */
   unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
 					stmt_info, 0, vect_body);
@@ -3316,9 +3352,13 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
 
   code = gimple_assign_rhs_code (orig_stmt);
 
-  /* Add in cost for initial definition.  */
-  prologue_cost += add_stmt_cost (target_cost_data, 1, scalar_to_vec,
-				  stmt_info, 0, vect_prologue);
+  /* Add in cost for initial definition.
+     For cond reduction we have four vectors: initial index, step, initial
+     result of the data reduction, initial value of the index reduction.  */
+  int prologue_stmts = v_reduc_type == COND_REDUCTION ? 4 : 1;
+  prologue_cost += add_stmt_cost (target_cost_data, prologue_stmts,
+				  scalar_to_vec, stmt_info, 0,
+				  vect_prologue);
 
   /* Determine cost of epilogue code.
 
@@ -3329,10 +3369,29 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
     {
       if (reduc_code != ERROR_MARK)
 	{
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
-					  stmt_info, 0, vect_epilogue);
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
-					  stmt_info, 0, vect_epilogue);
+	  if (v_reduc_type == COND_REDUCTION)
+	    {
+	      /* An EQ stmt and an AND stmt.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      vector_stmt, stmt_info, 0,
+					      vect_epilogue);
+	      /* Reduction of the max index and a reduction of the found
+		 values.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	      /* A broadcast of the max value.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      scalar_to_vec, stmt_info, 0,
+					      vect_epilogue);
+	    }
+	  else
+	    {
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
+					      stmt_info, 0, vect_epilogue);
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
+					      stmt_info, 0, vect_epilogue);
+	    }
 	}
       else
 	{
@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
          the final vector of induction results:  */
       exit_phi = NULL;
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
-        {
+	{
 	  gimple use_stmt = USE_STMT (use_p);
 	  if (is_gimple_debug (use_stmt))
 	    continue;
@@ -3774,7 +3833,8 @@ get_initial_def_for_induction (gimple iv_phi)
 
    Input:
    STMT - a stmt that performs a reduction operation in the loop.
-   INIT_VAL - the initial value of the reduction variable
+   INIT_VAL - the initial value of the reduction variable.
+   V_REDUC_TYPE - the type of reduction.
 
    Output:
    ADJUSTMENT_DEF - a tree that holds a value to be added to the final result
@@ -3784,16 +3844,20 @@ get_initial_def_for_induction (gimple iv_phi)
         vector of partial results.
 
    Option1 (adjust in epilog): Initialize the vector as follows:
-     add/bit or/xor:    [0,0,...,0,0]
-     mult/bit and:      [1,1,...,1,1]
-     min/max/cond_expr: [init_val,init_val,..,init_val,init_val]
+     add/bit or/xor:     [0,0,...,0,0]
+     mult/bit and:       [1,1,...,1,1]
+     min/max:		 [init_val,init_val,..,init_val,init_val]
+     nested cond_expr:   [init_val,init_val,..,init_val,init_val]
+     unnested cond_expr: [init_val,0,0,...,0]
    and when necessary (e.g. add/mult case) let the caller know
    that it needs to adjust the result by init_val.
 
    Option2: Initialize the vector as follows:
-     add/bit or/xor:    [init_val,0,0,...,0]
-     mult/bit and:      [init_val,1,1,...,1]
-     min/max/cond_expr: [init_val,init_val,...,init_val]
+     add/bit or/xor:     [init_val,0,0,...,0]
+     mult/bit and:       [init_val,1,1,...,1]
+     min/max:		 [init_val,init_val,...,init_val]
+     nested cond_expr:   [init_val,init_val,...,init_val]
+     unnested cond_expr: [init_val,0,0,...,0]
    and no adjustments are needed.
 
    For example, for the following code:
@@ -3815,7 +3879,8 @@ get_initial_def_for_induction (gimple iv_phi)
 
 tree
 get_initial_def_for_reduction (gimple stmt, tree init_val,
-                               tree *adjustment_def)
+			       tree *adjustment_def,
+			       enum vect_reduction_type v_reduc_type)
 {
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
@@ -3936,18 +4001,39 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
 
         break;
 
+      case COND_EXPR:
+	if (v_reduc_type == COND_REDUCTION)
+	  {
+	    if (adjustment_def)
+	      *adjustment_def = NULL_TREE;
+
+	    /* Create a vector of {init_value, 0, 0, 0...}.  */
+	    vec<constructor_elt, va_gc> *v;
+	    vec_alloc (v, nunits);
+	    CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
+	    if (SCALAR_FLOAT_TYPE_P (scalar_type))
+	      for (i = 1; i < nunits; ++i)
+		CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+					build_real (scalar_type, dconst0));
+	    else
+	      for (i = 1; i < nunits; ++i)
+		CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+					build_int_cst (scalar_type, 0));
+	    init_def = build_constructor (vectype, v);
+	    break;
+	  }
+	/* Fall through.  */
+
       case MIN_EXPR:
       case MAX_EXPR:
-      case COND_EXPR:
-        if (adjustment_def)
+	if (adjustment_def)
           {
-            *adjustment_def = NULL_TREE;
+	    *adjustment_def = NULL_TREE;
             init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
             break;
           }
-
 	init_def = build_vector_from_val (vectype, init_value);
-        break;
+	break;
 
       default:
         gcc_unreachable ();
@@ -3977,6 +4063,9 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
    DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
    SLP_NODE is an SLP node containing a group of reduction statements. The 
      first one in this group is STMT.
+   V_REDUC_TYPE is the type of reduction.
+   INDUCTION_INDEX is the index of the loop for condition reductions. Otherwise
+     it is undefined.
 
    This function:
    1. Creates the reduction def-use cycles: sets the arguments for 
@@ -4022,7 +4111,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
 				  int ncopies, enum tree_code reduc_code,
 				  vec<gimple> reduction_phis,
                                   int reduc_index, bool double_reduc, 
-                                  slp_tree slp_node)
+				  slp_tree slp_node,
+				  enum vect_reduction_type v_reduc_type,
+				  tree induction_index)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   stmt_vec_info prev_phi_info;
@@ -4321,11 +4412,97 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
     }
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
- 
+
+  if (v_reduc_type == COND_REDUCTION)
+    {
+      tree index_vec_type = TREE_TYPE (induction_index);
+      tree index_vec_type_signed = signed_type_for (index_vec_type);
+      tree index_scalar_type = TREE_TYPE (index_vec_type);
+      machine_mode index_vector_mode = TYPE_MODE (index_vec_type);
+
+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
+      gimple max_index_stmt = gimple_build_assign (max_index, REDUC_MAX_EXPR,
+						   induction_index);
+      gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
+
+      /* Vector of {max_index, max_index, max_index,...}.  */
+      tree max_index_vec = make_temp_ssa_name (index_vec_type, NULL, "");
+      tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+						      max_index);
+      gimple max_index_vec_stmt = gimple_build_assign (max_index_vec,
+						       max_index_vec_rhs);
+      gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+
+      /* Compare the max index vector to the vector of found indexes to find
+	 the position of the max value.  This will result in either a single
+	 match or all of the values.  */
+      tree vec_compare = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+      gimple vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+						     induction_index,
+						     max_index_vec);
+      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+
+      /* Convert the vector of data to the same type as the EQ.  */
+      tree vec_data_cast;
+      if (TYPE_UNSIGNED (index_vec_type))
+	{
+	  vec_data_cast = make_temp_ssa_name (index_vec_type_signed, NULL,
+					       "");
+	  tree vec_data_cast_rhs = build1 (VIEW_CONVERT_EXPR,
+					   index_vec_type_signed,
+					   new_phi_result);
+	  gimple vec_data_cast_stmt = gimple_build_assign (vec_data_cast,
+							   VIEW_CONVERT_EXPR,
+							   vec_data_cast_rhs);
+	  gsi_insert_before (&exit_gsi, vec_data_cast_stmt, GSI_SAME_STMT);
+	}
+      else
+	vec_data_cast = new_phi_result;
+
+      /* Where the max index occurred, use the value from the data vector.  */
+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+						 vec_compare, vec_data_cast);
+      gsi_insert_before (&exit_gsi, vec_and_stmt, GSI_SAME_STMT);
+
+      /* Make the matched data values unsigned.  */
+      tree vec_and_cast = make_temp_ssa_name (index_vec_type, NULL, "");
+      tree vec_and_cast_rhs = build1 (VIEW_CONVERT_EXPR, index_vec_type,
+				      vec_and);
+      gimple vec_and_cast_stmt = gimple_build_assign (vec_and_cast,
+						      VIEW_CONVERT_EXPR,
+						      vec_and_cast_rhs);
+      gsi_insert_before (&exit_gsi, vec_and_cast_stmt, GSI_SAME_STMT);
+
+      /* Reduce down to a scalar value.  */
+      tree matched_data_reduc = make_temp_ssa_name (index_scalar_type, NULL,
+						    "");
+      gimple matched_data_reduc_stmt;
+      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, index_vec_type,
+				      optab_default);
+      gcc_assert (optab_handler (ot, index_vector_mode) != CODE_FOR_nothing);
+      matched_data_reduc_stmt = gimple_build_assign (matched_data_reduc,
+						     REDUC_MAX_EXPR,
+						     vec_and_cast);
+      gsi_insert_before (&exit_gsi, matched_data_reduc_stmt, GSI_SAME_STMT);
+
+      /* Convert the reduced value to the result type and set as the
+	 result.  */
+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
+					     matched_data_reduc);
+      epilog_stmt = gimple_build_assign (new_scalar_dest,
+					 matched_data_reduc_cast);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+      gimple_assign_set_lhs (epilog_stmt, new_temp);
+      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+      scalar_results.safe_push (new_temp);
+    }
+
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
-  if (reduc_code != ERROR_MARK && !slp_reduc)
+  else if (reduc_code != ERROR_MARK && !slp_reduc)
     {
       tree tmp;
       tree vec_elem_type;
@@ -4847,6 +5024,15 @@ vect_finalize_reduction:
    and it's STMT_VINFO_RELATED_STMT points to the last stmt in the original
    sequence that had been detected and replaced by the pattern-stmt (STMT).
 
+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+	 last = a[i];
+   This is handled by vectorising the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.
+
    In some cases of reduction patterns, the type of the reduction variable X is
    different than the type of the other arguments of STMT.
    In such cases, the vectype that is used when transforming STMT into a vector
@@ -4922,6 +5108,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
   int vec_num;
   tree def0, def1, tem, op0, op1 = NULL_TREE;
   bool first_p = true;
+  enum vect_reduction_type v_reduc_type = TREE_CODE_REDUCTION;
+  tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
 
   /* In case of reduction chain we switch to the first stmt in the chain, but
      we don't update STMT_INFO, since only the last stmt is marked as reduction
@@ -5092,7 +5280,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
     }
 
   gimple tmp = vect_is_simple_reduction (loop_vinfo, reduc_def_stmt,
-					 !nested_cycle, &dummy, false);
+					 !nested_cycle, &dummy, false,
+					 &v_reduc_type);
   if (orig_stmt)
     gcc_assert (tmp == orig_stmt
 		|| GROUP_FIRST_ELEMENT (vinfo_for_stmt (tmp)) == orig_stmt);
@@ -5117,12 +5306,12 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
 
   if (code == COND_EXPR)
     {
-      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, NULL))
+      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, NULL,
+				   v_reduc_type))
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "unsupported condition in reduction\n");
-
 	  return false;
         }
     }
@@ -5153,7 +5342,7 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
         }
 
       if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
-        {
+	{
           if (dump_enabled_p ())
             dump_printf (MSG_NOTE, "op not supported by target.\n");
 
@@ -5246,49 +5435,71 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
     }
 
   epilog_reduc_code = ERROR_MARK;
-  if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+
+  if (v_reduc_type == TREE_CODE_REDUCTION)
     {
-      reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
+      if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+	{
+	  reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
                                          optab_default);
-      if (!reduc_optab)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no optab for reduction.\n");
-
-          epilog_reduc_code = ERROR_MARK;
-        }
-      else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
-        {
-          optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
-          if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
-            {
-              if (dump_enabled_p ())
-	        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "reduc op not supported by target.\n");
+	  if (!reduc_optab)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no optab for reduction.\n");
 
 	      epilog_reduc_code = ERROR_MARK;
 	    }
-        }
+	  else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+	    {
+	      optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+	      if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "reduc op not supported by target.\n");
+
+		  epilog_reduc_code = ERROR_MARK;
+		}
+	    }
+	}
+      else
+	{
+	  if (!nested_cycle || double_reduc)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no reduc code for scalar code.\n");
+
+	      return false;
+	    }
+	}
     }
   else
     {
-      if (!nested_cycle || double_reduc)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no reduc code for scalar code.\n");
+      int scalar_precision = GET_MODE_PRECISION (TYPE_MODE (scalar_type));
+      cr_index_scalar_type = make_unsigned_type (scalar_precision);
+      cr_index_vector_type = build_vector_type
+	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
 
-          return false;
-        }
+      epilog_reduc_code = REDUC_MAX_EXPR;
+      optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
+				   optab_default);
+      if (optab_handler (optab, TYPE_MODE (cr_index_vector_type)) == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "reduc max op not supported by target.\n");
+	  return false;
+	}
     }
 
-  if (double_reduc && ncopies > 1)
+  if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "multiple types in double reduction\n");
-
+			 "multiple types in double reduction or condition "
+			 "reduction.\n");
       return false;
     }
 
@@ -5312,11 +5523,39 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
         }
     }
 
+  if (v_reduc_type == COND_REDUCTION)
+    {
+      widest_int ni;
+
+      if (! max_loop_iterations (loop, &ni))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop count not known, cannot create cond "
+			     "reduction.\n");
+	  return false;
+	}
+      /* Convert backedges to iterations.  */
+      ni += 1;
+
+      /* The additional index will be the same type as the condition.  Check
+	 that the loop count fits into this type less one (the zero slot is
+	 reserved for when there are no matches).  */
+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
+      if (wi::geu_p (ni, wi::to_widest (max_index)))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop size is greater than data size.\n");
+	  return false;
+	}
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       if (first_p
 	  && !vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies,
-					 reduc_index))
+					 reduc_index, v_reduc_type))
         return false;
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
       return true;
@@ -5327,6 +5566,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
 
+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+
   /* FORNOW: Multiple types are not supported for condition.  */
   if (code == COND_EXPR)
     gcc_assert (ncopies == 1);
@@ -5406,9 +5647,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
       if (code == COND_EXPR)
         {
           gcc_assert (!slp_node);
-          vectorizable_condition (stmt, gsi, vec_stmt, 
-                                  PHI_RESULT (phis[0]), 
-                                  reduc_index, NULL);
+	  vectorizable_condition (stmt, gsi, vec_stmt, PHI_RESULT (phis[0]),
+				  reduc_index, NULL, v_reduc_type);
           /* Multiple types are not supported for condition.  */
           break;
         }
@@ -5528,17 +5768,88 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
       prev_phi_info = vinfo_for_stmt (new_phi);
     }
 
+  tree indx_before_incr, indx_after_incr, cond_name = NULL;
+
   /* Finalize the reduction-phi (set its arguments) and create the
      epilog reduction code.  */
   if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node)
     {
       new_temp = gimple_assign_lhs (*vec_stmt);
       vect_defs[0] = new_temp;
+
+      /* For cond reductions we need to add an additional conditional based on
+	 the loop index.  */
+      if (v_reduc_type == COND_REDUCTION)
+	{
+	  int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+	  int k;
+
+	  gcc_assert (gimple_assign_rhs_code (*vec_stmt) == VEC_COND_EXPR);
+
+	  /* Create a {1,2,3,...} vector.  */
+	  tree *vtemp = XALLOCAVEC (tree, nunits_out);
+	  for (k = 0; k < nunits_out; ++k)
+	    vtemp[k] = build_int_cst (cr_index_scalar_type, k + 1);
+	  tree series_vect = build_vector (cr_index_vector_type, vtemp);
+
+	  /* Create a vector of the step value.  */
+	  tree step = build_int_cst (cr_index_scalar_type, nunits_out);
+	  tree vec_step = build_vector_from_val (cr_index_vector_type, step);
+
+	  /* Create a vector of 0s.  */
+	  tree zero = build_zero_cst (cr_index_scalar_type);
+	  tree vec_zero = build_vector_from_val (cr_index_vector_type, zero);
+
+	  /* Create an induction variable, starting at series_vect, and
+	     incrementing by vec_step.  */
+	  gimple_stmt_iterator incr_gsi;
+	  bool insert_after;
+	  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+	  create_iv (series_vect, vec_step, NULL_TREE, loop, &incr_gsi,
+		     insert_after, &indx_before_incr, &indx_after_incr);
+
+	  /* Create a vector phi node from the VEC_COND_EXPR (see below) and
+	     0s.  */
+	  tree new_phi_tree = make_temp_ssa_name (cr_index_vector_type, NULL, "");
+	  new_phi = create_phi_node (new_phi_tree, loop->header);
+	  set_vinfo_for_stmt (new_phi, new_stmt_vec_info (new_phi, loop_vinfo,
+							  NULL));
+	  add_phi_arg (new_phi, vec_zero, loop_preheader_edge (loop),
+		       UNKNOWN_LOCATION);
+
+	  /* Turn the condition from vec_stmt into an ssa name.  */
+	  gimple index_condition;
+	  gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt);
+	  tree ccompare = gimple_assign_rhs1 (*vec_stmt);
+	  tree ccompare_name = make_temp_ssa_name (TREE_TYPE (ccompare), NULL,
+						   "");
+	  gimple ccompare_stmt = gimple_build_assign (ccompare_name, ccompare);
+	  gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
+	  gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
+	  update_stmt (*vec_stmt);
+
+	  /* Create a conditional, where the condition is the same as from
+	     vec_stmt, then is the induction index, else is the phi.  */
+	  tree cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type, ccompare_name,
+				   indx_before_incr, new_phi_tree);
+	  cond_name = make_temp_ssa_name (cr_index_vector_type, NULL, "");
+	  index_condition = gimple_build_assign (cond_name, cond_expr);
+	  gsi_insert_before (&incr_gsi, index_condition, GSI_SAME_STMT);
+	  stmt_vec_info index_vec_info = new_stmt_vec_info (index_condition,
+							    loop_vinfo, NULL);
+	  STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
+	  set_vinfo_for_stmt (index_condition, index_vec_info);
+
+	  /* Update the phi with the vec cond.  */
+	  add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
+		       UNKNOWN_LOCATION);
+	}
     }
 
   vect_create_epilog_for_reduction (vect_defs, stmt, epilog_copies,
                                     epilog_reduc_code, phis, reduc_index,
-                                    double_reduc, slp_node);
+				    double_reduc, slp_node, v_reduc_type,
+				    cond_name);
 
   return true;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 359e010..7bcf575 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7293,7 +7293,7 @@ vect_is_simple_cond (tree cond, gimple stmt, loop_vec_info loop_vinfo,
 bool
 vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
 			gimple *vec_stmt, tree reduc_def, int reduc_index,
-			slp_tree slp_node)
+			slp_tree slp_node, enum vect_reduction_type v_reduc_type)
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
@@ -7321,21 +7321,24 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   if (reduc_index && STMT_SLP_TYPE (stmt_info))
     return false;
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-    return false;
+  if (v_reduc_type == TREE_CODE_REDUCTION)
+    {
+      if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+	return false;
 
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
-           && reduc_def))
-    return false;
+      if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+	  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	       && reduc_def))
+	return false;
 
-  /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "value used after loop.\n");
-      return false;
+      /* FORNOW: not yet supported.  */
+      if (STMT_VINFO_LIVE_P (stmt_info))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "value used after loop.\n");
+	  return false;
+	}
     }
 
   /* Is vectorizable conditional operation?  */
@@ -7739,7 +7742,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node,
+				     TREE_CODE_REDUCTION));
   else
     {
       if (bb_vinfo)
@@ -7751,7 +7755,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node,
+					 TREE_CODE_REDUCTION));
     }
 
   if (!ok)
@@ -7863,7 +7868,8 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       break;
 
     case condition_vec_info_type:
-      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, slp_node);
+      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, slp_node,
+				     TREE_CODE_REDUCTION);
       gcc_assert (done);
       break;
 
@@ -8262,8 +8268,8 @@ vect_is_simple_use (tree operand, gimple stmt, loop_vec_info loop_vinfo,
   if (TREE_CODE (operand) != SSA_NAME)
     {
       if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "not ssa-name.\n");
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not ssa-name.\n");
       return false;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 95276fa..34f76d4 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -60,6 +60,12 @@ enum vect_def_type {
   vect_unknown_def_type
 };
 
+/* Define type of reduction.  */
+enum vect_reduction_type {
+  TREE_CODE_REDUCTION,
+  COND_REDUCTION
+};
+
 #define VECTORIZABLE_CYCLE_DEF(D) (((D) == vect_reduction_def)           \
                                    || ((D) == vect_double_reduction_def) \
                                    || ((D) == vect_nested_cycle))
@@ -1037,7 +1043,7 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
-                                    tree, int, slp_tree);
+				    tree, int, slp_tree, enum vect_reduction_type);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
 				unsigned int *, unsigned int *,
 				stmt_vector_for_cost *,
@@ -1105,7 +1111,8 @@ extern bool vectorizable_live_operation (gimple, gimple_stmt_iterator *,
 extern bool vectorizable_reduction (gimple, gimple_stmt_iterator *, gimple *,
                                     slp_tree);
 extern bool vectorizable_induction (gimple, gimple_stmt_iterator *, gimple *);
-extern tree get_initial_def_for_reduction (gimple, tree, tree *);
+extern tree get_initial_def_for_reduction
+	(gimple, tree, tree *, enum vect_reduction_type = TREE_CODE_REDUCTION);
 extern int vect_min_worthwhile_factor (enum tree_code);
 extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
 					stmt_vector_for_cost *,
-- 
1.9.3 (Apple Git-50)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-10 15:11 [PATCH] vectorizing conditional expressions (PR tree-optimization/65947) Alan Hayward
@ 2015-09-10 22:34 ` Bill Schmidt
  2015-09-11  9:19   ` Alan Hayward
  2015-09-15 12:10 ` Richard Biener
  2015-09-15 12:12 ` Richard Biener
  2 siblings, 1 reply; 26+ messages in thread
From: Bill Schmidt @ 2015-09-10 22:34 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gcc-patches

Hi Alan,

The cost modeling of the epilogue code seems pretty target-specific ("An
EQ stmt and an AND stmt, reduction of the max index and a reduction of
the found values, a broadcast of the max value," resulting in two
vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
this will not represent the cost accurately, and the cost will indeed be
quite different depending on the mode (logarithmic in the number of
elements).  I think that you need to create a new entry in
vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
each target to calculate the cost appropriately.

(Powerpc doesn't have a max-reduction hardware instruction, but because
the reduction will be only in the epilogue code, it may still be
profitable for us to generate the somewhat expensive reduction sequence
in order to vectorize the loop.  But we definitely want to model it as
costly in and of itself.  Also, the sequence will produce the maximum
value in all positions without a separate broadcast.)

Thanks,
Bill

On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
> 
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = a[i];
>   return last;
> }
> 
> To do this the loop is vectorised to create a vector of data results (ie
> of matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occurred. In the
> function epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
> 
> To vectorize successfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
> 
> Using a simple testcase based around a large number of N and run on an
> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> its original time.
> 
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degradation in runtime.
> 
> 
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = i;
>   return last;
> }
> In this case a lot of the new code can be optimized away.
> 
> I have run check for aarch64, arm and x86 and have seen no regressions.
> 
> 
> Changelog:
> 
>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
> 
>         PR tree-optimization/65947
>         * tree-vect-loop.c
>         (vect_is_simple_reduction_1): Find condition reductions.
>         (vect_model_reduction_cost): Add condition reduction costs.
>         (get_initial_def_for_reduction): Add condition reduction initial
> var.
>         (vect_create_epilog_for_reduction): Add condition reduction epilog.
>         (vectorizable_reduction): Condition reduction support.
>         * tree-vect-stmts.c
>         (vectorizable_condition): Add vect reduction arg
>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>         vect_max_reduc
> 
>     testsuite/Changelog:
> 
>         PR tree-optimization/65947
>         * lib/target-supports.exp
>         (check_effective_target_vect_max_reduc): Add.
>         * gcc.dg/vect/pr65947-1.c: New test.
>         * gcc.dg/vect/pr65947-2.c: New test.
>         * gcc.dg/vect/pr65947-3.c: New test.
>         * gcc.dg/vect/pr65947-4.c: New test.
>         * gcc.dg/vect/pr65947-5.c: New test.
>         * gcc.dg/vect/pr65947-6.c: New test.
>         * gcc.dg/vect/pr65947-7.c: New test.
>         * gcc.dg/vect/pr65947-8.c: New test.
>         * gcc.dg/vect/pr65947-9.c: New test.
>         * gcc.dg/vect/pr65947-10.c: New test.
>         * gcc.dg/vect/pr65947-11.c: New test.
> 
> 
> 
> Thanks,
> Alan
> 
> 



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-10 22:34 ` Bill Schmidt
@ 2015-09-11  9:19   ` Alan Hayward
  2015-09-11 13:23     ` Bill Schmidt
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Hayward @ 2015-09-11  9:19 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches

Hi Bill,

I’d be a bit worried about asking the backend for the cost of a
COND_REDUCTION, as that will rely on the backend understanding the
implementation the vectorizer is using - every time the vectorizer
changed, the backends would need to be updated too. I’m hoping soon to get
together a patch to reduce the stmts produced on the simpler cases, which
would require a different set of costings. I can also imagine further
improvements being added for other special cases over time. Having the
backends understand every variation would be a little cumbersome.

As it stands today, we correctly exit the optimisation if max reduction
isn’t supported in hardware, which is what the cost model is expecting.


If power wanted to use this implementation, then I think it’d probably
need some code in tree-vect-generic.c to implement an emulated max
reduction, which would then require updates to the costs modelling of
anything that uses max reduction (not just cond reduction). All of that is
outside the scope of this patch.


Thanks,
Alan.

On 10/09/2015 23:14, "Bill Schmidt" <wschmidt@linux.vnet.ibm.com> wrote:

>Hi Alan,
>
>The cost modeling of the epilogue code seems pretty target-specific ("An
>EQ stmt and an AND stmt, reduction of the max index and a reduction of
>the found values, a broadcast of the max value," resulting in two
>vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
>this will not represent the cost accurately, and the cost will indeed be
>quite different depending on the mode (logarithmic in the number of
>elements).  I think that you need to create a new entry in
>vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
>each target to calculate the cost appropriately.
>
>(Powerpc doesn't have a max-reduction hardware instruction, but because
>the reduction will be only in the epilogue code, it may still be
>profitable for us to generate the somewhat expensive reduction sequence
>in order to vectorize the loop.  But we definitely want to model it as
>costly in and of itself.  Also, the sequence will produce the maximum
>value in all positions without a separate broadcast.)
>
>Thanks,
>Bill
>
>On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
>> Hi,
>> This patch (attached) adds support for vectorizing conditional
>>expressions
>> (PR 65947), for example:
>> 
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>>     if (a[i] < min_v)
>>       last = a[i];
>>   return last;
>> }
>> 
>> To do this the loop is vectorised to create a vector of data results (ie
>> of matching a[i] values). Using an induction variable, an additional
>> vector is added containing the indexes where the matches occurred. In the
>> function epilogue this is reduced to a single max value and then used to
>> index into the vector of data results.
>> When no values are matched in the loop, the indexes vector will contain
>> all zeroes, eventually matching the first entry in the data results
>>vector.
>> 
>> To vectorize successfully, support is required for REDUC_MAX_EXPR. This
>>is
>> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
>> REDUC_MAX_EXPR is not supported for the required modes, failing the
>> vectorization. On mips it complains that the required vcond expression
>>is
>> not supported. It is suggested the relevant backend experts add the
>> required backend support.
>> 
>> Using a simple testcase based around a large value of N and run on an
>> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
>> its original time.
>> 
>> This patch caused binary differences in three spec2006 binaries on
>>aarch64
>> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
>> showed no improvement or degradation in runtime.
>> 
>> 
>> In the near future I hope to submit a further patch (as PR 66558) which
>> optimises the case where the result is simply the index of the loop, for
>> example:
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>>     if (a[i] < min_v)
>>       last = i;
>>   return last;
>> }
>> In this case a lot of the new code can be optimized away.
>> 
>> I have run check for aarch64, arm and x86 and have seen no regressions.
>> 
>> 
>> Changelog:
>> 
>>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
>> 
>>         PR tree-optimization/65947
>>         * tree-vect-loop.c
>>         (vect_is_simple_reduction_1): Find condition reductions.
>>         (vect_model_reduction_cost): Add condition reduction costs.
>>         (get_initial_def_for_reduction): Add condition reduction initial
>> var.
>>         (vect_create_epilog_for_reduction): Add condition reduction
>>epilog.
>>         (vectorizable_reduction): Condition reduction support.
>>         * tree-vect-stmts.c
>>         (vectorizable_condition): Add vect reduction arg
>>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>>         vect_max_reduc
>> 
>>     testsuite/Changelog:
>> 
>>         PR tree-optimization/65947
>>         * lib/target-supports.exp
>>         (check_effective_target_vect_max_reduc): Add.
>>         * gcc.dg/vect/pr65947-1.c: New test.
>>         * gcc.dg/vect/pr65947-2.c: New test.
>>         * gcc.dg/vect/pr65947-3.c: New test.
>>         * gcc.dg/vect/pr65947-4.c: New test.
>>         * gcc.dg/vect/pr65947-5.c: New test.
>>         * gcc.dg/vect/pr65947-6.c: New test.
>>         * gcc.dg/vect/pr65947-7.c: New test.
>>         * gcc.dg/vect/pr65947-8.c: New test.
>>         * gcc.dg/vect/pr65947-9.c: New test.
>>         * gcc.dg/vect/pr65947-10.c: New test.
>>         * gcc.dg/vect/pr65947-11.c: New test.
>> 
>> 
>> 
>> Thanks,
>> Alan
>> 
>> 
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11  9:19   ` Alan Hayward
@ 2015-09-11 13:23     ` Bill Schmidt
  2015-09-11 13:55       ` Ramana Radhakrishnan
  2015-09-14  9:50       ` Alan Lawrence
  0 siblings, 2 replies; 26+ messages in thread
From: Bill Schmidt @ 2015-09-11 13:23 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gcc-patches

Hi Alan,

I probably wasn't clear enough.  The implementation in the vectorizer is
fine and I'm not asking that to change per target.  What I'm objecting
to is the equivalence between a REDUC_MAX_EXPR and a cost associated
with vec_to_scalar.  This assumes that the back end will implement a
REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
But those back ends should be free to model the cost of the
REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
ARM, this cost will be the same as a vec_to_scalar.  For others, it may
not be; for powerpc, it certainly will not be.

We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
expansion, and therefore it is not correct for us to explode this in
tree-vect-generic.  This would expand the code size without providing
any significant optimization opportunity, and could reduce the ability
to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
gimple vectorizers.

I apologize if my loose use of language confused the issue.  It isn't
the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
that are used by it.

(The costs in powerpc won't be enormous, but they are definitely
mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
instructions, where n is the number of elements in the mode being
vectorized.)
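[Editor's note: the ~2*log2(n) sequence Bill describes can be modelled in
portable C. Each step is a "shift the high half down" plus one elementwise
max; this is an illustrative sketch, not the actual PowerPC code, which
operates on vector registers and ends with the maximum in every lane.]

```c
/* Portable model of a halving max reduction: log2(n) steps, each costing
   one permute-style shift plus one vector max, hence ~2*log2(n)
   instructions on a target without a single-instruction reduction.  */
static int
max_reduce_halving (int v[], int n)  /* n must be a power of two */
{
  for (int half = n / 2; half >= 1; half /= 2)  /* log2(n) steps */
    for (int i = 0; i < half; i++)              /* one vector max */
      if (v[i + half] > v[i])
        v[i] = v[i + half];
  return v[0];
}
```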

A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
that has to be broadcast back to a vector, and the best way to implement
it for us already has the max value in all positions of a vector.  But
that is something we should be able to fix with simplify-rtx in the back
end.

Thanks,
Bill


On Fri, 2015-09-11 at 10:15 +0100, Alan Hayward wrote:
> Hi Bill,
> 
> I’d be a bit worried about asking the backend for the cost of a
> COND_REDUCTION, as that will rely on the backend understanding the
> implementation the vectorizer is using - every time the vectorizer
> changed, the backends would need to be updated too. I’m hoping soon to get
> together a patch to reduce the stmts produced on the simpler cases, which
> would require a different set of costings. I can also imagine further
> improvements being added for other special cases over time. Having the
> backends understand every variation would be a little cumbersome.
> 
> As it stands today, we correctly exit the optimisation if max reduction
> isn’t supported in hardware, which is what the cost model is expecting.
> 
> 
> If power wanted to use this implementation, then I think it’d probably
> need some code in tree-vect-generic.c to implement an emulated max
> reduction, which would then require updates to the cost modelling of
> anything that uses max reduction (not just cond reduction). All of that is
> outside the scope of this patch.
> 
> 
> Thanks,
> Alan.
> 
> On 10/09/2015 23:14, "Bill Schmidt" <wschmidt@linux.vnet.ibm.com> wrote:
> 
> >Hi Alan,
> >
> >The cost modeling of the epilogue code seems pretty target-specific ("An
> >EQ stmt and an AND stmt, reduction of the max index and a reduction of
> >the found values, a broadcast of the max value," resulting in two
> >vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
> >this will not represent the cost accurately, and the cost will indeed be
> >quite different depending on the mode (logarithmic in the number of
> >elements).  I think that you need to create a new entry in
> >vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
> >each target to calculate the cost appropriately.
> >
> >(Powerpc doesn't have a max-reduction hardware instruction, but because
> >the reduction will be only in the epilogue code, it may still be
> >profitable for us to generate the somewhat expensive reduction sequence
> >in order to vectorize the loop.  But we definitely want to model it as
> >costly in and of itself.  Also, the sequence will produce the maximum
> >value in all positions without a separate broadcast.)
> >
> >Thanks,
> >Bill
> >
> >On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
> >> Hi,
> >> This patch (attached) adds support for vectorizing conditional
> >>expressions
> >> (PR 65947), for example:
> >> 
> >> int condition_reduction (int *a, int min_v)
> >> {
> >>   int last = 0;
> >>   for (int i = 0; i < N; i++)
> >>     if (a[i] < min_v)
> >>       last = a[i];
> >>   return last;
> >> }
> >> 
> >> To do this the loop is vectorised to create a vector of data results (ie
> >> of matching a[i] values). Using an induction variable, an additional
> >> vector is added containing the indexes where the matches occurred. In the
> >> function epilogue this is reduced to a single max value and then used to
> >> index into the vector of data results.
> >> When no values are matched in the loop, the indexes vector will contain
> >> all zeroes, eventually matching the first entry in the data results
> >>vector.
> >> 
> >> To vectorize successfully, support is required for REDUC_MAX_EXPR. This
> >>is
> >> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> >> REDUC_MAX_EXPR is not supported for the required modes, failing the
> >> vectorization. On mips it complains that the required vcond expression
> >>is
> >> not supported. It is suggested the relevant backend experts add the
> >> required backend support.
> >> 
> >> Using a simple testcase based around a large value of N and run on an
> >> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> >> its original time.
> >> 
> >> This patch caused binary differences in three spec2006 binaries on
> >>aarch64
> >> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> >> showed no improvement or degradation in runtime.
> >> 
> >> 
> >> In the near future I hope to submit a further patch (as PR 66558) which
> >> optimises the case where the result is simply the index of the loop, for
> >> example:
> >> int condition_reduction (int *a, int min_v)
> >> {
> >>   int last = 0;
> >>   for (int i = 0; i < N; i++)
> >>     if (a[i] < min_v)
> >>       last = i;
> >>   return last;
> >> }
> >> In this case a lot of the new code can be optimized away.
> >> 
> >> I have run check for aarch64, arm and x86 and have seen no regressions.
> >> 
> >> 
> >> Changelog:
> >> 
> >>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
> >> 
> >>         PR tree-optimization/65947
> >>         * tree-vect-loop.c
> >>         (vect_is_simple_reduction_1): Find condition reductions.
> >>         (vect_model_reduction_cost): Add condition reduction costs.
> >>         (get_initial_def_for_reduction): Add condition reduction initial
> >> var.
> >>         (vect_create_epilog_for_reduction): Add condition reduction
> >>epilog.
> >>         (vectorizable_reduction): Condition reduction support.
> >>         * tree-vect-stmts.c
> >>         (vectorizable_condition): Add vect reduction arg
> >>         * doc/sourcebuild.texi (Vector-specific attributes): Document
> >>         vect_max_reduc
> >> 
> >>     testsuite/Changelog:
> >> 
> >>         PR tree-optimization/65947
> >>         * lib/target-supports.exp
> >>         (check_effective_target_vect_max_reduc): Add.
> >>         * gcc.dg/vect/pr65947-1.c: New test.
> >>         * gcc.dg/vect/pr65947-2.c: New test.
> >>         * gcc.dg/vect/pr65947-3.c: New test.
> >>         * gcc.dg/vect/pr65947-4.c: New test.
> >>         * gcc.dg/vect/pr65947-5.c: New test.
> >>         * gcc.dg/vect/pr65947-6.c: New test.
> >>         * gcc.dg/vect/pr65947-7.c: New test.
> >>         * gcc.dg/vect/pr65947-8.c: New test.
> >>         * gcc.dg/vect/pr65947-9.c: New test.
> >>         * gcc.dg/vect/pr65947-10.c: New test.
> >>         * gcc.dg/vect/pr65947-11.c: New test.
> >> 
> >> 
> >> 
> >> Thanks,
> >> Alan
> >> 
> >> 
> >
> >
> 
> 



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 13:23     ` Bill Schmidt
@ 2015-09-11 13:55       ` Ramana Radhakrishnan
  2015-09-11 14:41         ` Richard Sandiford
  2015-09-14  9:50       ` Alan Lawrence
  1 sibling, 1 reply; 26+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-11 13:55 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Alan Hayward, gcc-patches

On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi Alan,
>
> I probably wasn't clear enough.  The implementation in the vectorizer is
> fine and I'm not asking that to change per target.  What I'm objecting
> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> with vec_to_scalar.  This assumes that the back end will implement a
> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> But those back ends should be free to model the cost of the
> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> not be; for powerpc, it certainly will not be.



>
> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> expansion, and therefore it is not correct for us to explode this in
> tree-vect-generic.  This would expand the code size without providing
> any significant optimization opportunity, and could reduce the ability
> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> gimple vectorizers.
>
> I apologize if my loose use of language confused the issue.  It isn't
> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> that are used by it.
>
> (The costs in powerpc won't be enormous, but they are definitely
> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> instructions, where n is the number of elements in the mode being
> vectorized.)


IIUC, on AArch64 a reduc_max_expr maps to a single reduction
instruction, but on AArch32 NEON a reduc_smax gets implemented as a
sequence of vpmax instructions, which sounds similar to the PowerPC
example as well. Thus mapping a reduc_smax expression to the cost of a
vec_to_scalar is probably not right in this particular situation.
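[Editor's note: the vpmax-style sequence Ramana mentions can be modelled in
portable C. vpmax takes the max of adjacent pairs of lanes, so reducing n
lanes needs log2(n) pairwise steps. This is an illustrative sketch, not
actual NEON code.]

```c
/* Portable model of a pairwise (vpmax-style) max reduction: each step
   replaces the active lanes with the max of adjacent pairs, halving the
   lane count, until one lane remains.  */
static int
max_reduce_pairwise (int v[], int n)  /* n must be a power of two */
{
  while (n > 1)
    {
      for (int i = 0; i < n / 2; i++)  /* one vpmax-style step */
        v[i] = v[2 * i] > v[2 * i + 1] ? v[2 * i] : v[2 * i + 1];
      n /= 2;
    }
  return v[0];
}
```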


regards
Ramana
>
> A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
> that has to be broadcast back to a vector, and the best way to implement
> it for us already has the max value in all positions of a vector.  But
> that is something we should be able to fix with simplify-rtx in the back
> end.
>
> Thanks,
> Bill
>
>
> On Fri, 2015-09-11 at 10:15 +0100, Alan Hayward wrote:
>> Hi Bill,
>>
>> I’d be a bit worried about asking the backend for the cost of a
>> COND_REDUCTION, as that will rely on the backend understanding the
>> implementation the vectorizer is using - every time the vectorizer
>> changed, the backends would need to be updated too. I’m hoping soon to get
>> together a patch to reduce the stmts produced on the simpler cases, which
>> would require a different set of costings. I can also imagine further
>> improvements being added for other special cases over time. Having the
>> backends understand every variation would be a little cumbersome.
>>
>> As it stands today, we correctly exit the optimisation if max reduction
>> isn’t supported in hardware, which is what the cost model is expecting.
>>
>>
>> If power wanted to use this implementation, then I think it’d probably
>> need some code in tree-vect-generic.c to implement an emulated max
>> reduction, which would then require updates to the cost modelling of
>> anything that uses max reduction (not just cond reduction). All of that is
>> outside the scope of this patch.
>>
>>
>> Thanks,
>> Alan.
>>
>> On 10/09/2015 23:14, "Bill Schmidt" <wschmidt@linux.vnet.ibm.com> wrote:
>>
>> >Hi Alan,
>> >
>> >The cost modeling of the epilogue code seems pretty target-specific ("An
>> >EQ stmt and an AND stmt, reduction of the max index and a reduction of
>> >the found values, a broadcast of the max value," resulting in two
>> >vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
>> >this will not represent the cost accurately, and the cost will indeed be
>> >quite different depending on the mode (logarithmic in the number of
>> >elements).  I think that you need to create a new entry in
>> >vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
>> >each target to calculate the cost appropriately.
>> >
>> >(Powerpc doesn't have a max-reduction hardware instruction, but because
>> >the reduction will be only in the epilogue code, it may still be
>> >profitable for us to generate the somewhat expensive reduction sequence
>> >in order to vectorize the loop.  But we definitely want to model it as
>> >costly in and of itself.  Also, the sequence will produce the maximum
>> >value in all positions without a separate broadcast.)
>> >
>> >Thanks,
>> >Bill
>> >
>> >On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
>> >> Hi,
>> >> This patch (attached) adds support for vectorizing conditional
>> >>expressions
>> >> (PR 65947), for example:
>> >>
>> >> int condition_reduction (int *a, int min_v)
>> >> {
>> >>   int last = 0;
>> >>   for (int i = 0; i < N; i++)
>> >>     if (a[i] < min_v)
>> >>       last = a[i];
>> >>   return last;
>> >> }
>> >>
>> >> To do this the loop is vectorised to create a vector of data results (ie
>> >> of matching a[i] values). Using an induction variable, an additional
>> >> vector is added containing the indexes where the matches occurred. In the
>> >> function epilogue this is reduced to a single max value and then used to
>> >> index into the vector of data results.
>> >> When no values are matched in the loop, the indexes vector will contain
>> >> all zeroes, eventually matching the first entry in the data results
>> >>vector.
>> >>
>> >> To vectorize successfully, support is required for REDUC_MAX_EXPR. This
>> >>is
>> >> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
>> >> REDUC_MAX_EXPR is not supported for the required modes, failing the
>> >> vectorization. On mips it complains that the required vcond expression
>> >>is
>> >> not supported. It is suggested the relevant backend experts add the
>> >> required backend support.
>> >>
>> >> Using a simple testcase based around a large value of N and run on an
>> >> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
>> >> its original time.
>> >>
>> >> This patch caused binary differences in three spec2006 binaries on
>> >>aarch64
>> >> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
>> >> showed no improvement or degradation in runtime.
>> >>
>> >>
>> >> In the near future I hope to submit a further patch (as PR 66558) which
>> >> optimises the case where the result is simply the index of the loop, for
>> >> example:
>> >> int condition_reduction (int *a, int min_v)
>> >> {
>> >>   int last = 0;
>> >>   for (int i = 0; i < N; i++)
>> >>     if (a[i] < min_v)
>> >>       last = i;
>> >>   return last;
>> >> }
>> >> In this case a lot of the new code can be optimized away.
>> >>
>> >> I have run check for aarch64, arm and x86 and have seen no regressions.
>> >>
>> >>
>> >> Changelog:
>> >>
>> >>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
>> >>
>> >>         PR tree-optimization/65947
>> >>         * tree-vect-loop.c
>> >>         (vect_is_simple_reduction_1): Find condition reductions.
>> >>         (vect_model_reduction_cost): Add condition reduction costs.
>> >>         (get_initial_def_for_reduction): Add condition reduction initial
>> >> var.
>> >>         (vect_create_epilog_for_reduction): Add condition reduction
>> >>epilog.
>> >>         (vectorizable_reduction): Condition reduction support.
>> >>         * tree-vect-stmts.c
>> >>         (vectorizable_condition): Add vect reduction arg
>> >>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>> >>         vect_max_reduc
>> >>
>> >>     testsuite/Changelog:
>> >>
>> >>         PR tree-optimization/65947
>> >>         * lib/target-supports.exp
>> >>         (check_effective_target_vect_max_reduc): Add.
>> >>         * gcc.dg/vect/pr65947-1.c: New test.
>> >>         * gcc.dg/vect/pr65947-2.c: New test.
>> >>         * gcc.dg/vect/pr65947-3.c: New test.
>> >>         * gcc.dg/vect/pr65947-4.c: New test.
>> >>         * gcc.dg/vect/pr65947-5.c: New test.
>> >>         * gcc.dg/vect/pr65947-6.c: New test.
>> >>         * gcc.dg/vect/pr65947-7.c: New test.
>> >>         * gcc.dg/vect/pr65947-8.c: New test.
>> >>         * gcc.dg/vect/pr65947-9.c: New test.
>> >>         * gcc.dg/vect/pr65947-10.c: New test.
>> >>         * gcc.dg/vect/pr65947-11.c: New test.
>> >>
>> >>
>> >>
>> >> Thanks,
>> >> Alan
>> >>
>> >>
>> >
>> >
>>
>>
>
>


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 13:55       ` Ramana Radhakrishnan
@ 2015-09-11 14:41         ` Richard Sandiford
  2015-09-11 15:14           ` Bill Schmidt
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Sandiford @ 2015-09-11 14:41 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Bill Schmidt, Alan Hayward, gcc-patches

Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
>> Hi Alan,
>>
>> I probably wasn't clear enough.  The implementation in the vectorizer is
>> fine and I'm not asking that to change per target.  What I'm objecting
>> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
>> with vec_to_scalar.  This assumes that the back end will implement a
>> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
>> But those back ends should be free to model the cost of the
>> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
>> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
>> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
>> not be; for powerpc, it certainly will not be.
>>
>> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
>> expansion, and therefore it is not correct for us to explode this in
>> tree-vect-generic.  This would expand the code size without providing
>> any significant optimization opportunity, and could reduce the ability
>> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
>> gimple vectorizers.
>>
>> I apologize if my loose use of language confused the issue.  It isn't
>> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
>> that are used by it.
>>
>> (The costs in powerpc won't be enormous, but they are definitely
>> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
>> instructions, where n is the number of elements in the mode being
>> vectorized.)
>
> IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> operation but on AArch32 Neon a reduc_smax gets implemented as a
> sequence of vpmax instructions which sounds similar to the PowerPC
> example as well. Thus mapping a reduc_smax expression to the cost of a
> vec_to_scalar is probably not right in this particular situation.

But AIUI vec_to_scalar exists to represent reduction operations.
(I see it was also used for strided stores.)  So for better or worse,
I think the interface that Alan's patch uses is the defined interface
for measuring the cost of a reduction.

If a backend implemented reduc_umax_scal_optab in current sources,
without Alan's patch, then that optab would be used for a "natural"
unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
reduction statement in that case.
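[Editor's note: the "natural" unsigned max reduction Richard refers to
looks like the loop below. This is an illustrative example; on a target
providing reduc_umax_scal_optab, the vectorizer already costs the epilogue
reduction of such a loop as a single vec_to_scalar.]

```c
/* A plain MAX_EXPR reduction with unsigned inputs: the kind of loop the
   vectorizer handles today via reduc_umax_scal_optab, with the epilogue
   reduction costed as one vec_to_scalar.  */
static unsigned
umax_reduction (const unsigned *a, int n)
{
  unsigned m = 0;
  for (int i = 0; i < n; i++)
    if (a[i] > m)
      m = a[i];
  return m;
}
```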

So if defining a new Power pattern might cause Alan's patch to trigger
in cases where the transformation is actually too expensive, I would
expect the same to be true for a natural umax without Alan's patch.
The two cases ought to underestimate the true cost by the same degree.

In other words, whether the cost interface is flexible enough is
definitely interesting but seems orthogonal to this patch.

Thanks,
Richard


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 14:41         ` Richard Sandiford
@ 2015-09-11 15:14           ` Bill Schmidt
  2015-09-11 15:30             ` Richard Sandiford
  0 siblings, 1 reply; 26+ messages in thread
From: Bill Schmidt @ 2015-09-11 15:14 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Ramana Radhakrishnan, Alan Hayward, gcc-patches

On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
> Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> >> Hi Alan,
> >>
> >> I probably wasn't clear enough.  The implementation in the vectorizer is
> >> fine and I'm not asking that to change per target.  What I'm objecting
> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> >> with vec_to_scalar.  This assumes that the back end will implement a
> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> >> But those back ends should be free to model the cost of the
> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> >> not be; for powerpc, it certainly will not be.
> >>
> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> >> expansion, and therefore it is not correct for us to explode this in
> >> tree-vect-generic.  This would expand the code size without providing
> >> any significant optimization opportunity, and could reduce the ability
> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> >> gimple vectorizers.
> >>
> >> I apologize if my loose use of language confused the issue.  It isn't
> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> >> that are used by it.
> >>
> >> (The costs in powerpc won't be enormous, but they are definitely
> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> >> instructions, where n is the number of elements in the mode being
> >> vectorized.)
> >
> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> > operation but on AArch32 Neon a reduc_smax gets implemented as a
> > sequence of vpmax instructions which sounds similar to the PowerPC
> > example as well. Thus mapping a reduc_smax expression to the cost of a
> > vec_to_scalar is probably not right in this particular situation.
> 
> But AIUI vec_to_scalar exists to represent reduction operations.
> (I see it was also used for strided stores.)  So for better or worse,
> I think the interface that Alan's patch uses is the defined interface
> for measuring the cost of a reduction.
>
> If a backend implemented reduc_umax_scal_optab in current sources,
> without Alan's patch, then that optab would be used for a "natural"
> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
> reduction statement in that case.
> 
> So if defining a new Power pattern might cause Alan's patch to trigger
> in cases where the transformation is actually too expensive, I would
> expect the same to be true for a natural umax without Alan's patch.
> The two cases ought to underestimate the true cost by the same degree.
> 
> In other words, whether the cost interface is flexible enough is
> definitely interesting but seems orthogonal to this patch.

That's a reasonable argument, but is this not a good opportunity to fix
an incorrect assumption in the vectorizer cost model?  I would prefer
for this issue not to get lost on a technicality.

The vectorizer cost model has many small flaws, and we all need to be
mindful of trying to improve it at every opportunity, rather than
allowing it to continue to degrade.  We just had a big discussion about
improving cost models at the last Cauldron, and my request is consistent
with that direction.

Saying that all reductions have equivalent performance is unlikely to be
true for many platforms.  On PowerPC, for example, a PLUS reduction has
very different cost from a MAX reduction.  If the model isn't
fine-grained enough, let's please be aggressive about fixing it.  I'm
fine if it's a separate patch, but in my mind this shouldn't be allowed
to languish.

Thanks,
Bill

> 
> Thanks,
> Richard
> 



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 15:14           ` Bill Schmidt
@ 2015-09-11 15:30             ` Richard Sandiford
  2015-09-11 15:50               ` Bill Schmidt
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Sandiford @ 2015-09-11 15:30 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Ramana Radhakrishnan, Alan Hayward, gcc-patches

Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
> On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
>> Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
>> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
>> > <wschmidt@linux.vnet.ibm.com> wrote:
>> >> Hi Alan,
>> >>
>> >> I probably wasn't clear enough.  The implementation in the vectorizer is
>> >> fine and I'm not asking that to change per target.  What I'm objecting
>> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
>> >> with vec_to_scalar.  This assumes that the back end will implement a
>> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
>> >> But those back ends should be free to model the cost of the
>> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
>> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
>> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
>> >> not be; for powerpc, it certainly will not be.
>> >>
>> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
>> >> expansion, and therefore it is not correct for us to explode this in
>> >> tree-vect-generic.  This would expand the code size without providing
>> >> any significant optimization opportunity, and could reduce the ability
>> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
>> >> gimple vectorizers.
>> >>
>> >> I apologize if my loose use of language confused the issue.  It isn't
>> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
>> >> that are used by it.
>> >>
>> >> (The costs in powerpc won't be enormous, but they are definitely
>> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
>> >> instructions, where n is the number of elements in the mode being
>> >> vectorized.)
>> >
>> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
>> > operation but on AArch32 Neon a reduc_smax gets implemented as a
>> > sequence of vpmax instructions which sounds similar to the PowerPC
>> > example as well. Thus mapping a reduc_smax expression to the cost of a
>> > vec_to_scalar is probably not right in this particular situation.
>> 
>> But AIUI vec_to_scalar exists to represent reduction operations.
>> (I see it was also used for strided stores.)  So for better or worse,
>> I think the interface that Alan's patch uses is the defined interface
>> for measuring the cost of a reduction.
>>
>> If a backend implemented reduc_umax_scal_optab in current sources,
>> without Alan's patch, then that optab would be used for a "natural"
>> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
>> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
>> reduction statement in that case.
>> 
>> So if defining a new Power pattern might cause Alan's patch to trigger
>> in cases where the transformation is actually too expensive, I would
>> expect the same to be true for a natural umax without Alan's patch.
>> The two cases ought to underestimate the true cost by the same degree.
>> 
>> In other words, whether the cost interface is flexible enough is
>> definitely interesting but seems orthogonal to this patch.
>
> That's a reasonable argument, but is this not a good opportunity to fix
> an incorrect assumption in the vectorizer cost model?  I would prefer
> for this issue not to get lost on a technicality.

I think it's more than a technicality though.  I don't think it should be
Alan's responsibility to extend the cost model when (a) his patch uses the
current model in the way that it was intended to be used (at least AIUI) and
(b) in this case, the motivating example for the new model is a pattern
that hasn't been written yet. :-)

So...

> The vectorizer cost model has many small flaws, and we all need to be
> mindful of trying to improve it at every opportunity, rather than
> allowing it to continue to degrade.  We just had a big discussion about
> improving cost models at the last Cauldron, and my request is consistent
> with that direction.
>
> Saying that all reductions have equivalent performance is unlikely to be
> true for many platforms.  On PowerPC, for example, a PLUS reduction has
> very different cost from a MAX reduction.  If the model isn't
> fine-grained enough, let's please be aggressive about fixing it.  I'm
> fine if it's a separate patch, but in my mind this shouldn't be allowed
> to languish.

...I agree that the general vectoriser cost model could probably be
improved, but it seems fairer for that improvement to be done by whoever
adds the patterns that need it.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 15:30             ` Richard Sandiford
@ 2015-09-11 15:50               ` Bill Schmidt
  2015-09-11 16:54                 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 26+ messages in thread
From: Bill Schmidt @ 2015-09-11 15:50 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Ramana Radhakrishnan, Alan Hayward, gcc-patches

On Fri, 2015-09-11 at 16:28 +0100, Richard Sandiford wrote:
> Bill Schmidt <wschmidt@linux.vnet.ibm.com> writes:
> > On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
> >> Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> >> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> >> > <wschmidt@linux.vnet.ibm.com> wrote:
> >> >> Hi Alan,
> >> >>
> >> >> I probably wasn't clear enough.  The implementation in the vectorizer is
> >> >> fine and I'm not asking that to change per target.  What I'm objecting
> >> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> >> >> with vec_to_scalar.  This assumes that the back end will implement a
> >> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> >> >> But those back ends should be free to model the cost of the
> >> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> >> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> >> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> >> >> not be; for powerpc, it certainly will not be.
> >> >>
> >> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> >> >> expansion, and therefore it is not correct for us to explode this in
> >> >> tree-vect-generic.  This would expand the code size without providing
> >> >> any significant optimization opportunity, and could reduce the ability
> >> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> >> >> gimple vectorizers.
> >> >>
> >> >> I apologize if my loose use of language confused the issue.  It isn't
> >> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> >> >> that are used by it.
> >> >>
> >> >> (The costs in powerpc won't be enormous, but they are definitely
> >> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> >> >> instructions, where n is the number of elements in the mode being
> >> >> vectorized.)
> >> >
> >> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> >> > operation but on AArch32 Neon a reduc_smax gets implemented as a
> >> > sequence of vpmax instructions which sounds similar to the PowerPC
> >> > example as well. Thus mapping a reduc_smax expression to the cost of a
> >> > vec_to_scalar is probably not right in this particular situation.
> >> 
> >> But AIUI vec_to_scalar exists to represent reduction operations.
> >> (I see it was also used for strided stores.)  So for better or worse,
> >> I think the interface that Alan's patch uses is the defined interface
> >> for measuring the cost of a reduction.
> >>
> >> If a backend implemented reduc_umax_scal_optab in current sources,
> >> without Alan's patch, then that optab would be used for a "natural"
> >> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
> >> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
> >> reduction statement in that case.
> >> 
> >> So if defining a new Power pattern might cause Alan's patch to trigger
> >> in cases where the transformation is actually too expensive, I would
> >> expect the same to be true for a natural umax without Alan's patch.
> >> The two cases ought to underestimate the true cost by the same degree.
> >> 
> >> In other words, whether the cost interface is flexible enough is
> >> definitely interesting but seems orthogonal to this patch.
> >
> > That's a reasonable argument, but is this not a good opportunity to fix
> > an incorrect assumption in the vectorizer cost model?  I would prefer
> > for this issue not to get lost on a technicality.
> 
> I think it's more than a technicality though.  I don't think it should be
> Alan's responsibility to extend the cost model when (a) his patch uses the
> current model in the way that it was intended to be used (at least AIUI) and
> (b) in this case, the motivating example for the new model is a pattern
> that hasn't been written yet. :-)

Agreed.  However, the original patch description said, in essence, that this
is good for everybody: powerpc and x86 should go and implement their
patterns and use it.  It turns out not to be so simple, unfortunately.

> 
> So...
> 
> > The vectorizer cost model has many small flaws, and we all need to be
> > mindful of trying to improve it at every opportunity, rather than
> > allowing it to continue to degrade.  We just had a big discussion about
> > improving cost models at the last Cauldron, and my request is consistent
> > with that direction.
> >
> > Saying that all reductions have equivalent performance is unlikely to be
> > true for many platforms.  On PowerPC, for example, a PLUS reduction has
> > very different cost from a MAX reduction.  If the model isn't
> > fine-grained enough, let's please be aggressive about fixing it.  I'm
> > fine if it's a separate patch, but in my mind this shouldn't be allowed
> > to languish.
> 
> ...I agree that the general vectoriser cost model could probably be
> improved, but it seems fairer for that improvement to be done by whoever
> adds the patterns that need it.

All right.  But in response to Ramana's comment, are all relevant
reductions of similar cost on each ARM platform?  Obviously they don't
have the same cost on different platforms, but the question is whether a
reduc_plus, reduc_max, etc., has identical cost on each individual
platform.  If not, ARM may have a concern as well.  I don't know the
situation for x86 either.

I assume that the IBM folks who originally created a lot of this stuff
are responsible for how reductions are treated.  It looks like there is
some commentary that a reduction will only be used if it is a single
instruction (I'm not sure this is true).  In any case, not implementing
the smax patterns, for whatever reason, kept us from seeing the problem.

So I'll put this on my list of windmills to tilt at in a few weeks.

Thanks for the discussion,
Bill

> 
> Thanks,
> Richard
> 



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 15:50               ` Bill Schmidt
@ 2015-09-11 16:54                 ` Ramana Radhakrishnan
  2015-09-15 11:47                   ` Richard Biener
  0 siblings, 1 reply; 26+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-11 16:54 UTC (permalink / raw)
  To: Bill Schmidt, Richard Sandiford
  Cc: Ramana Radhakrishnan, Alan Hayward, gcc-patches


>>> Saying that all reductions have equivalent performance is unlikely to be
>>> true for many platforms.  On PowerPC, for example, a PLUS reduction has
>>> very different cost from a MAX reduction.  If the model isn't
>>> fine-grained enough, let's please be aggressive about fixing it.  I'm
>>> fine if it's a separate patch, but in my mind this shouldn't be allowed
>>> to languish.
>>
>> ...I agree that the general vectoriser cost model could probably be
>> improved, but it seems fairer for that improvement to be done by whoever
>> adds the patterns that need it.
> 
> All right.  But in response to Ramana's comment, are all relevant
> reductions of similar cost on each ARM platform?  Obviously they don't
> have the same cost on different platforms, but the question is whether a
> reduc_plus, reduc_max, etc., has identical cost on each individual
> platform.  If not, ARM may have a concern as well.  I don't know the
> situation for x86 either.

From the Cauldron I have a note that we need to look at the vectorizer cost model
for both the ARM and AArch64 backends and move away from
the set of magic constants that it currently returns.

On AArch32, all the reduc_ patterns are emulated with pair-wise operations
while on AArch64 they aren't. Thus they aren't likely to be the same cost as a
standard vector arithmetic instruction. What difference this makes in practice
remains to be seen; however, the first step is moving towards the newer vectorizer
cost model interface.
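The pairwise emulation described above can be modelled in scalar C. This is an illustration only (not GCC or target code) of why an n-lane reduction built from pairwise operations needs on the order of log2(n) steps rather than one reduc_smax instruction:

```c
/* Scalar model of a pairwise (log-step) max reduction: each step halves
   the number of live lanes, so an n-lane vector needs log2(n) pairwise
   max operations instead of a single reduction instruction.  n is
   assumed to be a power of two, like a vector lane count.  */
static int
pairwise_max_reduce (int *lanes, int n)
{
  for (int width = n; width > 1; width /= 2)
    for (int i = 0; i < width / 2; i++)
      if (lanes[i + width / 2] > lanes[i])
        lanes[i] = lanes[i + width / 2];
  return lanes[0];
}
```

For example, on { 3, 41, 7, -2 } this performs two pairwise steps (log2(4) = 2) and yields 41.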

I'll put this on a list of things for us to look at, but I'm not sure who
will get around to it, or when.

regards
Ramana


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 13:23     ` Bill Schmidt
  2015-09-11 13:55       ` Ramana Radhakrishnan
@ 2015-09-14  9:50       ` Alan Lawrence
  2015-09-14 14:20         ` Bill Schmidt
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Lawrence @ 2015-09-14  9:50 UTC (permalink / raw)
  To: gcc-patches, Bill Schmidt, Ramana Radhakrishnan,
	Richard Sandiford, Alan Hayward

On 11/09/15 14:19, Bill Schmidt wrote:
>
> A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
> that has to be broadcast back to a vector, and the best way to implement
> it for us already has the max value in all positions of a vector.  But
> that is something we should be able to fix with simplify-rtx in the back
> end.

Reading this thread again, this bit stands out as unaddressed. Yes, PowerPC can 
"fix" this with simplify-rtx, but the vector cost model will not take this into 
account: it will think that the broadcast-back-to-a-vector requires an extra 
operation after the reduction, whereas in fact it will not.

Does that suggest we should have a new entry in vect_cost_for_stmt for 
vec_to_scalar-and-back-to-vector (that defaults to vec_to_scalar+scalar_to_vec, 
but on some architectures e.g. PowerPC would be the same as vec_to_scalar)?

(I agree that if that's the limit of how "different" conditional reductions may 
be between architectures, then we should not have a vect_cost_for_stmt for a 
whole conditional reduction.)

Cheers, Alan


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-14  9:50       ` Alan Lawrence
@ 2015-09-14 14:20         ` Bill Schmidt
  0 siblings, 0 replies; 26+ messages in thread
From: Bill Schmidt @ 2015-09-14 14:20 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: gcc-patches, Ramana Radhakrishnan, Richard Sandiford, Alan Hayward

On Mon, 2015-09-14 at 10:47 +0100, Alan Lawrence wrote:
> On 11/09/15 14:19, Bill Schmidt wrote:
> >
> > A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
> > that has to be broadcast back to a vector, and the best way to implement
> > it for us already has the max value in all positions of a vector.  But
> > that is something we should be able to fix with simplify-rtx in the back
> > end.
> 
> Reading this thread again, this bit stands out as unaddressed. Yes PowerPC can 
> "fix" this with simplify-rtx, but the vector cost model will not take this into 
> account - it will think that the broadcast-back-to-a-vector requires an extra 
> operation after the reduction, whereas in fact it will not.
> 
> Does that suggest we should have a new entry in vect_cost_for_stmt for 
> vec_to_scalar-and-back-to-vector (that defaults to vec_to_scalar+scalar_to_vec, 
> but on some architectures e.g. PowerPC would be the same as vec_to_scalar)?

Ideally I think we need to do something for that, yeah.  The back ends
could try to patch up the cost when finishing costs for the loop body,
epilogue, etc., but that would be somewhat of a guess; it would be
better to just be up-front that we're doing a reduction to a vector.

As part of this, I dislike the term "vec_to_scalar", which is somewhat
vague about what's going on (it sounds like it could mean a vector
extract operation, which is more of an inverse of "scalar_to_vec" than a
reduction is).  GIMPLE calls it a reduction, and the optabs call it a
reduction, so we ought to call it a reduction in the vectorizer cost
model, too.

To cover our bases for PowerPC and AArch32, we probably need:

  plus_reduc_to_scalar
  plus_reduc_to_vector
  minmax_reduc_to_scalar
  minmax_reduc_to_vector

although I think plus_reduc_to_vector wouldn't be used yet, so could be
omitted.  If we go this route, then at that time we would change your
code to use minmax_reduc_to_vector and let the back ends determine
whether that requires a scalar reduction followed by a broadcast, or
whether it would be performed directly.

Using direct reduction to vector for MIN and MAX on PowerPC would be a
big cost savings over scalar reduction/broadcast.
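A rough sketch of what such a split might look like follows. The enum entries use the names proposed above, but the structure and cost numbers are invented for illustration; this is not GCC code, and a real target would return values from its own tuning tables:

```c
/* Hypothetical split of the single vec_to_scalar cost into
   reduction-specific kinds, as proposed above.  */
enum reduc_cost_kind
{
  plus_reduc_to_scalar,
  minmax_reduc_to_scalar,
  minmax_reduc_to_vector
};

/* A PowerPC-like target: a direct min/max reduction to a vector takes
   log2(nunits) pairwise steps, while reducing all the way to a scalar
   costs about 2*log2(nunits) instructions plus a final extract.  */
static int
example_reduc_cost (enum reduc_cost_kind kind, int nunits)
{
  int logn = 0;
  for (int n = nunits; n > 1; n /= 2)
    logn++;

  switch (kind)
    {
    case minmax_reduc_to_vector:
      return logn;
    case plus_reduc_to_scalar:
    case minmax_reduc_to_scalar:
      return 2 * logn + 1;
    default:
      return 0;
    }
}
```

With nunits = 4 this prices minmax_reduc_to_vector at 2 and minmax_reduc_to_scalar at 5, instead of one flat vec_to_scalar number for both.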

Thanks,
Bill

> 
> (I agree that if that's the limit of how "different" conditional reductions may 
> be between architectures, then we should not have a vec_cost_for_stmt for a 
> whole conditional reduction.)
> 
> Cheers, Alan
> 



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-11 16:54                 ` Ramana Radhakrishnan
@ 2015-09-15 11:47                   ` Richard Biener
  0 siblings, 0 replies; 26+ messages in thread
From: Richard Biener @ 2015-09-15 11:47 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Bill Schmidt, Richard Sandiford, Ramana Radhakrishnan,
	Alan Hayward, gcc-patches

On Fri, Sep 11, 2015 at 6:29 PM, Ramana Radhakrishnan
<ramana.radhakrishnan@foss.arm.com> wrote:
>
>>>> Saying that all reductions have equivalent performance is unlikely to be
>>>> true for many platforms.  On PowerPC, for example, a PLUS reduction has
>>>> very different cost from a MAX reduction.  If the model isn't
>>>> fine-grained enough, let's please be aggressive about fixing it.  I'm
>>>> fine if it's a separate patch, but in my mind this shouldn't be allowed
>>>> to languish.
>>>
>>> ...I agree that the general vectoriser cost model could probably be
>>> improved, but it seems fairer for that improvement to be done by whoever
>>> adds the patterns that need it.
>>
>> All right.  But in response to Ramana's comment, are all relevant
>> reductions of similar cost on each ARM platform?  Obviously they don't
>> have the same cost on different platforms, but the question is whether a
>> reduc_plus, reduc_max, etc., has identical cost on each individual
>> platform.  If not, ARM may have a concern as well.  I don't know the
>> situation for x86 either.
>
> From cauldron I have a note that we need to look at the vectorizer cost model
> for both the ARM and AArch64 backends and move away from
> the set of magic constants that it currently returns.

Indeed.

> On AArch32, all the reduc_ patterns are emulated with pair-wise operations
> while on AArch64 they aren't. Thus they aren't likely to be the same cost as a
> standard vector arithmetic instruction. What difference this makes in practice
> remains to be seen, however the first step is moving towards the newer vectorizer
> cost model interface.
>
> I'll put this on a list of things for us to look at but I'm not sure who/when
> will get around to looking at this.

Note that the target should be able to "see" the cond reduction via the
add_stmt_cost hook calls.  It should see 'where' as the epilogue and
'kind' as a hint - recording stmt_info which should have sufficient info
for a good guess.  At finish_cost () time the target can compute a proper
overall cost.
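Schematically, the accumulate-then-finalize shape being described is something like the following. The structure and names here are invented for illustration; the real add_stmt_cost/finish_cost hooks have different signatures:

```c
/* Invented sketch of accumulate-then-finalize costing: the add hook
   only records per-statement counts (and where they occur), and the
   finish hook computes the overall number, so a target can re-price
   an epilogue reduction once it has seen the whole picture.  Not the
   real GCC hook interface.  */
struct example_costs
{
  int body;
  int epilogue;
  int epilogue_reductions;
};

static void
example_add_stmt_cost (struct example_costs *c, int count,
                       int in_epilogue, int is_reduction)
{
  if (in_epilogue)
    c->epilogue += count;
  else
    c->body += count;
  if (in_epilogue && is_reduction)
    c->epilogue_reductions += count;
}

static int
example_finish_cost (struct example_costs *c, int log2_nunits)
{
  /* Re-price each epilogue reduction as a pairwise sequence of
     2*log2(nunits) steps rather than the single unit counted above.  */
  int extra = c->epilogue_reductions * (2 * log2_nunits - 1);
  return c->body + c->epilogue + extra;
}
```

E.g. four body statements plus one epilogue reduction, finalized with log2_nunits = 2, yields 4 + 1 + 3 = 8 rather than a flat 5.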

Yes, the vectorizer "IL" (the stmt-infos) isn't very powerful and esp. for
code not corresponding to existing gimple stmts it doesn't even exist.
We need to improve in that area to better represent the desired transform.

Richard.

> regards
> Ramana


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-10 15:11 [PATCH] vectorizing conditional expressions (PR tree-optimization/65947) Alan Hayward
  2015-09-10 22:34 ` Bill Schmidt
@ 2015-09-15 12:10 ` Richard Biener
  2015-09-15 15:41   ` Alan Hayward
  2015-09-15 12:12 ` Richard Biener
  2 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2015-09-15 12:10 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gcc-patches

On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward <alan.hayward@arm.com> wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
>
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = a[i];
>   return last;
> }
>
> To do this the loop is vectorised to create a vector of data results (ie
> of matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occurred. In the
> function epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
>
> To vectorize successfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
>
> Using a simple testcase based around a large value of N and run on an
> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> its original time.
>
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degradation in runtime.
>
>
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = i;
>   return last;
> }
> In this case a lot of the new code can be optimized away.
>
> I have run check for aarch64, arm and x86 and have seen no regressions.
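The data-vector/index-vector scheme described above can be modelled in scalar C with a fixed vector factor of 4. This is only an illustration of the idea, not the code the vectorizer actually emits:

```c
#define VF 4 /* illustrative vector factor */

/* Scalar model of the index-vector scheme: each "lane" keeps the last
   matching a[i] value and the (1-based) induction value at which it
   matched; the epilogue reduces the index lanes with a max and picks
   the data lane it came from.  If nothing matched, all index lanes
   stay 0 and lane 0's initial value (0) is returned.  Assumes n is a
   multiple of VF.  Illustration only.  */
static int
cond_reduc_model (const int *a, int n, int min_v)
{
  int data[VF] = { 0, 0, 0, 0 };
  int idx[VF] = { 0, 0, 0, 0 };

  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      if (a[i + lane] < min_v)
        {
          data[lane] = a[i + lane];
          idx[lane] = i + lane + 1;
        }

  /* Epilogue: REDUC_MAX over the index lanes, then select the data
     lane where that maximum occurred.  */
  int max_idx = idx[0], last = data[0];
  for (int lane = 1; lane < VF; lane++)
    if (idx[lane] > max_idx)
      {
        max_idx = idx[lane];
        last = data[lane];
      }
  return last;
}
```

On a[] = { 9, 1, 8, 2, 7, 3, 6, 4 } with min_v = 5 the model returns 4, matching the scalar loop's last match (a[7]); with min_v = 0 nothing matches and the initial 0 is returned.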

Now comments on the patch itself.

+      if (code == COND_EXPR)
+       *v_reduc_type = COND_REDUCTION;

so why not simply use COND_EXPR instead of the new v_reduc_type?

+  if (check_reduction && code != COND_EXPR &&
+      vect_is_slp_reduction (loop_info, phi, def_stmt))

&&s go to the next line

+             /* Reduction of the max index and a reduction of the found
+                values.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
+                                             vec_to_scalar, stmt_info, 0,
+                                             vect_epilogue);

A single vec_to_scalar isn't what the comment suggests.  Instead, the
comment suggests twice what a regular reduction would do,
but I guess we can "hide" the vec_to_scalar cost and "merge" it
with the broadcast.  Thus make the above two vector_stmt costs?

+             /* A broadcast of the max value.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
+                                             scalar_to_vec, stmt_info, 0,
+                                             vect_epilogue);

comment suggests a single broadcast.

@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
          the final vector of induction results:  */
       exit_phi = NULL;
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
-        {
+       {
          gimple use_stmt = USE_STMT (use_p);
          if (is_gimple_debug (use_stmt))
            continue;

please avoid unrelated whitespace changes.

+      case COND_EXPR:
+       if (v_reduc_type == COND_REDUCTION)
+         {
...
+       /* Fall through.  */
+
       case MIN_EXPR:
       case MAX_EXPR:
-      case COND_EXPR:

aww, so we could already handle COND_EXPR reductions?  How do they
differ from what you add?  Did you see if that path is even exercised today?

+           /* Create a vector of {init_value, 0, 0, 0...}.  */
+           vec<constructor_elt, va_gc> *v;
+           vec_alloc (v, nunits);
+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_real (scalar_type, dconst0));
+           else
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_int_cst (scalar_type, 0));
+           init_def = build_constructor (vectype, v);

you can unify the float/int case by using build_zero_cst (scalar_type).
Note that you should build a vector constant instead of a constructor
if init_val is a constant.  The convenient way is to build the vector
elements into a tree[] array and use build_vector_stat in that case.

+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");

just use make_ssa_name (index_scalar_type);

+      /* Convert the vector of data to the same type as the EQ.  */
+      tree vec_data_cast;
+      if ( TYPE_UNSIGNED (index_vec_type))
+       {

How come it never happens that the element
sizes do not match?  (An int index type and a double data type?)

+      /* Where the max index occured, use the value from the data vector.  */
+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+                                                vec_compare, vec_data_cast);

that is, don't you need to do some widening/shortening on the comparison result?
(also what happens in the case "or all of the values"?)

Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)

+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+        last = a[i];
+   This is handled by vectorising the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.

I'm missing a comment that shows the kind of code we transform this into.
"an additional vector containing the loop indexes" can't work - the vector
will not be large enough ;)  Naively I would have made 'last' a vector,
performing the reduction element-wise and in the epilogue reduce
'last' itself.  And it looks like we are already doing that for

int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i];
  return val;
}

I must be missing something.  Yes, I think we can't do index reduction yet,
but pr65947-10.c is already handled?

Stopping here.

Thanks,
Richard.

>
> Changelog:
>
>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
>
>         PR tree-optimization/65947
>         * tree-vect-loop.c
>         (vect_is_simple_reduction_1): Find condition reductions.
>         (vect_model_reduction_cost): Add condition reduction costs.
>         (get_initial_def_for_reduction): Add condition reduction initial
> var.
>         (vect_create_epilog_for_reduction): Add condition reduction epilog.
>         (vectorizable_reduction): Condition reduction support.
>         * tree-vect-stmts.c
>         (vectorizable_condition): Add vect reduction arg
>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>         vect_max_reduc
>
>     testsuite/Changelog:
>
>         PR tree-optimization/65947
>         * lib/target-supports.exp
>         (check_effective_target_vect_max_reduc): Add.
>         * gcc.dg/vect/pr65947-1.c: New test.
>         * gcc.dg/vect/pr65947-2.c: New test.
>         * gcc.dg/vect/pr65947-3.c: New test.
>         * gcc.dg/vect/pr65947-4.c: New test.
>         * gcc.dg/vect/pr65947-5.c: New test.
>         * gcc.dg/vect/pr65947-6.c: New test.
>         * gcc.dg/vect/pr65947-7.c: New test.
>         * gcc.dg/vect/pr65947-8.c: New test.
>         * gcc.dg/vect/pr65947-9.c: New test.
>         * gcc.dg/vect/pr65947-10.c: New test.
>         * gcc.dg/vect/pr65947-11.c: New test.
>
>
>
> Thanks,
> Alan
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-10 15:11 [PATCH] vectorizing conditional expressions (PR tree-optimization/65947) Alan Hayward
  2015-09-10 22:34 ` Bill Schmidt
  2015-09-15 12:10 ` Richard Biener
@ 2015-09-15 12:12 ` Richard Biener
  2 siblings, 0 replies; 26+ messages in thread
From: Richard Biener @ 2015-09-15 12:12 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gcc-patches

On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward <alan.hayward@arm.com> wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
>
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = a[i];
>   return last;
> }
>
> To do this the loop is vectorised to create a vector of data results (ie
> of matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occured. In the
> function epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
>
> To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
>
> Using a simple testcase based around a large number of N and run on an
> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> it's original time.
>
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degregation in runtime.
>
>
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = i;
>   return last;
> }
> In this case a lot of the new code can be optimized away.
>
> I have run check for aarch64, arm and x86 and have seen no regressions.

Now comments on the patch itself.

+      if (code == COND_EXPR)
+       *v_reduc_type = COND_REDUCTION;

so why not simply use COND_EXPR instead of the new v_reduc_type?

+  if (check_reduction && code != COND_EXPR &&
+      vect_is_slp_reduction (loop_info, phi, def_stmt))

&&s go to the next line

+             /* Reduction of the max index and a reduction of the found
+                values.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
+                                             vec_to_scalar, stmt_info, 0,
+                                             vect_epilogue);

vec_to_scalar once isn't what the comment suggests.  Instead the
comment suggests twice what a regular reduction would do
but I guess we can "hide" the vec_to_scalar cost and "merge" it
with the broadcast.  Thus make the above two vector_stmt costs?

+             /* A broadcast of the max value.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
+                                             scalar_to_vec, stmt_info, 0,
+                                             vect_epilogue);

comment suggests a single broadcast.

@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
          the final vector of induction results:  */
       exit_phi = NULL;
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
-        {
+       {
          gimple use_stmt = USE_STMT (use_p);
          if (is_gimple_debug (use_stmt))
            continue;

please avoid unrelated whitespace changes.

+      case COND_EXPR:
+       if (v_reduc_type == COND_REDUCTION)
+         {
...
+       /* Fall through.  */
+
       case MIN_EXPR:
       case MAX_EXPR:
-      case COND_EXPR:

aww, so we could already handle COND_EXPR reductions?  How do they
differ from what you add?  Did you see if that path is even exercised today?

+           /* Create a vector of {init_value, 0, 0, 0...}.  */
+           vec<constructor_elt, va_gc> *v;
+           vec_alloc (v, nunits);
+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_real (scalar_type, dconst0));
+           else
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_int_cst (scalar_type, 0));
+           init_def = build_constructor (vectype, v);

you can unify the float/int case by using build_zero_cst (scalar_type).
Note that you should build a vector constant instead of a constructor
if init_val is a constant.  The convenient way is to build the vector
elements into a tree[] array and use build_vector_stat in that case.

+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");

just use make_ssa_name (index_scalar_type);

+      /* Convert the vector of data to the same type as the EQ.  */
+      tree vec_data_cast;
+      if ( TYPE_UNSIGNED (index_vec_type))
+       {

How come it never happens the element
sizes do not match?  (int index type and double data type?)

+      /* Where the max index occurred, use the value from the data vector.  */
+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+                                                vec_compare, vec_data_cast);

that is, don't you need to do some widening/shortening on the comparison result?
(also what happens in the case "or all of the values"?)

Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)

+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+        last = a[i];
+   This is handled by vectorising the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.

I miss a comment that shows the kind of code we transform this to.
"an additional vector containing the loop indexes" can't work - the vector
will not be large enough ;)  Naiively I would have made 'last' a vector,
performing the reduction element-wise and in the epilogue reduce
'last' itself.  And it looks like we are already doing that for

int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i];
  return val;
}

I must be missing something.  Yes, I think we can't do index reduction yet,
but pr65947-10.c is already handled?

Stopping here.

Thanks,
Richard.

>
> Changelog:
>
>     2015-08-28  Alan Hayward <alan.hayward@arm.com>
>
>         PR tree-optimization/65947
>         * tree-vect-loop.c
>         (vect_is_simple_reduction_1): Find condition reductions.
>         (vect_model_reduction_cost): Add condition reduction costs.
>         (get_initial_def_for_reduction): Add condition reduction initial
> var.
>         (vect_create_epilog_for_reduction): Add condition reduction epilog.
>         (vectorizable_reduction): Condition reduction support.
>         * tree-vect-stmts.c
>         (vectorizable_condition): Add vect reduction arg
>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>         vect_max_reduc
>
>     testsuite/Changelog:
>
>         PR tree-optimization/65947
>         * lib/target-supports.exp
>         (check_effective_target_vect_max_reduc): Add.
>         * gcc.dg/vect/pr65947-1.c: New test.
>         * gcc.dg/vect/pr65947-2.c: New test.
>         * gcc.dg/vect/pr65947-3.c: New test.
>         * gcc.dg/vect/pr65947-4.c: New test.
>         * gcc.dg/vect/pr65947-5.c: New test.
>         * gcc.dg/vect/pr65947-6.c: New test.
>         * gcc.dg/vect/pr65947-7.c: New test.
>         * gcc.dg/vect/pr65947-8.c: New test.
>         * gcc.dg/vect/pr65947-9.c: New test.
>         * gcc.dg/vect/pr65947-10.c: New test.
>         * gcc.dg/vect/pr65947-11.c: New test.
>
>
>
> Thanks,
> Alan
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-15 12:10 ` Richard Biener
@ 2015-09-15 15:41   ` Alan Hayward
  2015-09-18 12:22     ` Richard Biener
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Hayward @ 2015-09-15 15:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches



On 15/09/2015 13:09, "Richard Biener" <richard.guenther@gmail.com> wrote:

>
>Now comments on the patch itself.
>
>+      if (code == COND_EXPR)
>+       *v_reduc_type = COND_REDUCTION;
>
>so why not simply use COND_EXPR instead of the new v_reduc_type?

v_reduc_type is also dependent on check_reduction (which comes from
!nested_cycle in vectorizable_reduction).
It seemed messy to keep checking for both of those things throughout.

In my patch to catch simpler condition reductions, I’ll be adding another
value to this enum too. v_reduc_type will be set to this new value based
on the same properties as COND_REDUCTION, plus some additional constraints.

>
>+  if (check_reduction && code != COND_EXPR &&
>+      vect_is_slp_reduction (loop_info, phi, def_stmt))
>
>&&s go to the next line

ok

>
>+             /* Reduction of the max index and a reduction of the found
>+                values.  */
>+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
>+                                             vec_to_scalar, stmt_info, 0,
>+                                             vect_epilogue);
>
>vec_to_scalar once isn't what the comment suggests.  Instead the
>comment suggests twice what a regular reduction would do
>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>with the broadcast.  Thus make the above two vector_stmt costs?
>
>+             /* A broadcast of the max value.  */
>+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
>+                                             scalar_to_vec, stmt_info, 0,
>+                                             vect_epilogue);
>
>comment suggests a single broadcast.

I’ve made a copy/paste error here. Just need to swap the 1 and the 2.


>
>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>          the final vector of induction results:  */
>       exit_phi = NULL;
>       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>-        {
>+       {
>          gimple use_stmt = USE_STMT (use_p);
>          if (is_gimple_debug (use_stmt))
>            continue;
>
>please avoid unrelated whitespace changes.

Ok. I was changing “8 spaces” to a tab, but happy to revert.

>
>+      case COND_EXPR:
>+       if (v_reduc_type == COND_REDUCTION)
>+         {
>...
>+       /* Fall through.  */
>+
>       case MIN_EXPR:
>       case MAX_EXPR:
>-      case COND_EXPR:
>
>aww, so we could already handle COND_EXPR reductions?  How do they
>differ from what you add?  Did you see if that path is even exercised
>today?

Today, COND_EXPRs are only supported when they are nested inside a loop.
See the vect-cond-*.c tests.
For example:

for (j = 0; j < M; j++)
  {
    x = x_in[j];
    curr_a = a[0];

    for (i = 0; i < N; i++)
      {
        next_a = a[i+1];
        curr_a = x > c[i] ? curr_a : next_a;
      }
    x_out[j] = curr_a;
  }


In that case, only the outer loop is vectorised.

>
>+           /* Create a vector of {init_value, 0, 0, 0...}.  */
>+           vec<constructor_elt, va_gc> *v;
>+           vec_alloc (v, nunits);
>+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
>+             for (i = 1; i < nunits; ++i)
>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>+                                       build_real (scalar_type,
>dconst0));
>+           else
>+             for (i = 1; i < nunits; ++i)
>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>+                                       build_int_cst (scalar_type, 0));
>+           init_def = build_constructor (vectype, v);
>
>you can unify the float/int case by using build_zero_cst (scalar_type).
>Note that you should build a vector constant instead of a constructor
>if init_val is a constant.  The convenient way is to build the vector
>elements into a tree[] array and use build_vector_stat in that case.

Ok, will switch to build_zero_cst.
Also, I will switch my vector to {init_value, init_value, init_value…}.
I had {init_value, 0, 0, 0…} because I was going to have the option of
using ADD_REDUC_EXPR, but that got removed along the way.

>
>+      /* Find maximum value from the vector of found indexes.  */
>+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>
>just use make_ssa_name (index_scalar_type);

Ok

>
>+      /* Convert the vector of data to the same type as the EQ.  */
>+      tree vec_data_cast;
>+      if ( TYPE_UNSIGNED (index_vec_type))
>+       {
>
>How come it never happens the element
>sizes do not match?  (int index type and double data type?)

This was a little unclear.
The induction index is originally created as an unsigned version of the
type as the data vector.
(see the definition of cr_index_vector_type in vectorizable_reduction(),
which is then used to create cond_name)

I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED
(index_vec_type))


>
>+      /* Where the max index occurred, use the value from the data
>vector.  */
>+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL,
>"");
>+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
>+                                                vec_compare,
>vec_data_cast);
>
>that is, don't you need to do some widening/shortening on the comparison
>result?
>(also what happens in the case "or all of the values"?)
>
>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)

Given the induction index is the same size as the data, this will be ok.

I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other
way of making it work.


The case of "all of the values” is when there were no matches in the loop.
When this happens, the data vector will contain only the default values
{init_value, 0, 0, 0…}
(… once I change my code due to the previous comment, we’ll have
{init_value, init_value, init_value…} ).
The REDUC_MAX_EXPR will turn either of those vectors into “init_value”.

>
>+   This function also handles reduction of condition expressions, for
>example:
>+     for (int i = 0; i < N; i++)
>+       if (a[i] < value)
>+        last = a[i];
>+   This is handled by vectorising the loop and creating an additional
>vector
>+   containing the loop indexes for which "a[i] < value" was true.  In the
>+   function epilogue this is reduced to a single max value and then used
>to
>+   index into the vector of results.
>
>I miss a comment that shows the kind of code we transform this to.
>"an additional vector containing the loop indexes" can't work - the vector
>will not be large enough ;)  Naiively I would have made 'last' a vector,
>performing the reduction element-wise and in the epilogue reduce
>'last' itself.  And it looks like we are already doing that for
>
>int foo (int *a)
>{
>  int val = 0;
>  for (int i = 0; i < 1024; ++i)
>    if (a[i] > val)
>      val = a[i];
>  return val;
>}
>
>I must be missing something.  Yes, I think we can't do index reduction
>yet,
>but pr65947-10.c is already handled?


I think an easier way of thinking about this is to see what
happens on each iteration of the loop:

Assume the example above and a vector length of 4.
The index vector starts as (0,0,0,0).
On the first iteration, the condition passes for, let's say, i=2 and i=3,
so the index vector becomes (0,0,2,3).
On the next iteration, the condition passes for i=4, so the index vector
is now (4,0,2,3).
On the next iteration, the condition passes for i=8 and i=9, so the index
vector is now (8,9,2,3).
And so on.
In the end we might end up with something like index vector=(26,2,54,14).
The “2” has not been set since the first iteration.
The only value we care about is the last one set - the 54. The rest are
meaningless to us.
Meanwhile, the data vector has been storing the matching values in the
same lanes, and so now contains (a[26], a[2], a[54], a[14]).

In the epilogue, we reduce the index vector down to “54”, then create a
vector (54,54,54,54).
This is matched against the index vector to give us
(false,false,true,false).
Then we use that vector to select from the data vector, giving us
(0, 0, a[54], 0), which we then reduce down to a single value.





>
>Stopping here.

Thanks for the comments so far.
I’ll put together a new patch with the changes above.


Alan.



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-15 15:41   ` Alan Hayward
@ 2015-09-18 12:22     ` Richard Biener
  2015-09-18 13:36       ` Alan Lawrence
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2015-09-18 12:22 UTC (permalink / raw)
  To: Alan Hayward; +Cc: gcc-patches

On Tue, Sep 15, 2015 at 5:32 PM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
> On 15/09/2015 13:09, "Richard Biener" <richard.guenther@gmail.com> wrote:
>
>>
>>Now comments on the patch itself.
>>
>>+      if (code == COND_EXPR)
>>+       *v_reduc_type = COND_REDUCTION;
>>
>>so why not simply use COND_EXPR instead of the new v_reduc_type?
>
> v_reduc_type is also dependent on check_reduction (which comes from
> !nested_cycle in vectorizable_reduction).
> It seemed messy to keep checking for both of those things throughout.
>
> In my patch to catch simpler condition reductions, I’ll be adding another
> value to this enum too. v_reduc_type will be set to this new value based
> on the same properties as COND_REDUCTION, plus some additional constraints.
>
>>
>>+  if (check_reduction && code != COND_EXPR &&
>>+      vect_is_slp_reduction (loop_info, phi, def_stmt))
>>
>>&&s go to the next line
>
> ok
>
>>
>>+             /* Reduction of the max index and a reduction of the found
>>+                values.  */
>>+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
>>+                                             vec_to_scalar, stmt_info, 0,
>>+                                             vect_epilogue);
>>
>>vec_to_scalar once isn't what the comment suggests.  Instead the
>>comment suggests twice what a regular reduction would do
>>but I guess we can "hide" the vec_to_scalar cost and "merge" it
>>with the broadcast.  Thus make the above two vector_stmt costs?
>>
>>+             /* A broadcast of the max value.  */
>>+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
>>+                                             scalar_to_vec, stmt_info, 0,
>>+                                             vect_epilogue);
>>
>>comment suggests a single broadcast.
>
> I’ve made a copy/paste error here. Just need to swap the 1 and the 2.
>
>
>>
>>@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
>>          the final vector of induction results:  */
>>       exit_phi = NULL;
>>       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
>>-        {
>>+       {
>>          gimple use_stmt = USE_STMT (use_p);
>>          if (is_gimple_debug (use_stmt))
>>            continue;
>>
>>please avoid unrelated whitespace changes.
>
> Ok. I was changing “8 spaces” to a tab, but happy to revert.
>
>>
>>+      case COND_EXPR:
>>+       if (v_reduc_type == COND_REDUCTION)
>>+         {
>>...
>>+       /* Fall through.  */
>>+
>>       case MIN_EXPR:
>>       case MAX_EXPR:
>>-      case COND_EXPR:
>>
>>aww, so we could already handle COND_EXPR reductions?  How do they
>>differ from what you add?  Did you see if that path is even exercised
>>today?
>
> Today, COND_EXPRs are only supported when they are nested inside a loop.
> See the vect-cond-*.c tests.
> For example:
>
> for (j = 0; j < M; j++)
>   {
>     x = x_in[j];
>     curr_a = a[0];
>
>     for (i = 0; i < N; i++)
>       {
>         next_a = a[i+1];
>         curr_a = x > c[i] ? curr_a : next_a;
>       }
>     x_out[j] = curr_a;
>   }
>
>
> In that case, only the outer loop is vectorised.
>
>>
>>+           /* Create a vector of {init_value, 0, 0, 0...}.  */
>>+           vec<constructor_elt, va_gc> *v;
>>+           vec_alloc (v, nunits);
>>+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
>>+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
>>+             for (i = 1; i < nunits; ++i)
>>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+                                       build_real (scalar_type,
>>dconst0));
>>+           else
>>+             for (i = 1; i < nunits; ++i)
>>+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
>>+                                       build_int_cst (scalar_type, 0));
>>+           init_def = build_constructor (vectype, v);
>>
>>you can unify the float/int case by using build_zero_cst (scalar_type).
>>Note that you should build a vector constant instead of a constructor
>>if init_val is a constant.  The convenient way is to build the vector
>>elements into a tree[] array and use build_vector_stat in that case.
>
> Ok, will switch to build_zero_cst.
> Also, I will switch my vector to {init_value, init_value, init_value…}.
> I had {init_value, 0, 0, 0…} because I was going to have the option of
> using ADD_REDUC_EXPR, but that got removed along the way.

You can then simply use build_vector_from_val ().

>>
>>+      /* Find maximum value from the vector of found indexes.  */
>>+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");
>>
>>just use make_ssa_name (index_scalar_type);
>
> Ok
>
>>
>>+      /* Convert the vector of data to the same type as the EQ.  */
>>+      tree vec_data_cast;
>>+      if ( TYPE_UNSIGNED (index_vec_type))
>>+       {
>>
>>How come it never happens the element
>>sizes do not match?  (int index type and double data type?)
>
> This was a little unclear.
> The induction index is originally created as an unsigned version of the
> type as the data vector.
> (see the definition of cr_index_vector_type in vectorizable_reduction(),
> which is then used to create cond_name)
>
> I will remove the if and replace with a gcc_checking_assert(TYPE_UNSIGNED
> (index_vec_type))
>
>
>>
>>+      /* Where the max index occurred, use the value from the data
>>vector.  */
>>+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL,
>>"");
>>+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
>>+                                                vec_compare,
>>vec_data_cast);
>>
>>that is, don't you need to do some widening/shortening on the comparison
>>result?
>>(also what happens in the case "or all of the values"?)
>>
>>Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)
>
> Given the induction index is the same size as the data, this will be ok.
>
> I don’t like the VIEW_CONVERT_MAGIC either! But I couldn’t see any other
> way of making it work.
>
>
> The case of "all of the values” is when there were no matches in the loop.
> When this happens, the data vector will contain only the default values
> {init_value, 0, 0, 0…}
> (… once I change my code due to the previous comment, we’ll have
> {init_value, init_value, init_value…} ).
> The REDUC_MAX_EXPR will turn either of those vectors into “init_value”.
>
>>
>>+   This function also handles reduction of condition expressions, for
>>example:
>>+     for (int i = 0; i < N; i++)
>>+       if (a[i] < value)
>>+        last = a[i];
>>+   This is handled by vectorising the loop and creating an additional
>>vector
>>+   containing the loop indexes for which "a[i] < value" was true.  In the
>>+   function epilogue this is reduced to a single max value and then used
>>to
>>+   index into the vector of results.
>>
>>I miss a comment that shows the kind of code we transform this to.
>>"an additional vector containing the loop indexes" can't work - the vector
>>will not be large enough ;)  Naiively I would have made 'last' a vector,
>>performing the reduction element-wise and in the epilogue reduce
>>'last' itself.  And it looks like we are already doing that for
>>
>>int foo (int *a)
>>{
>>  int val = 0;
>>  for (int i = 0; i < 1024; ++i)
>>    if (a[i] > val)
>>      val = a[i];
>>  return val;
>>}
>>
>>I must be missing something.  Yes, I think we can't do index reduction
>>yet,
>>but pr65947-10.c is already handled?
>
>
> I think an easier way of thinking about this is to see what
> happens on each iteration of the loop:
>
> Assume the example above and a vector length of 4.
> The index vector starts as (0,0,0,0).
> On the first iteration, the condition passes for, let's say, i=2 and i=3,
> so the index vector becomes (0,0,2,3).
> On the next iteration, the condition passes for i=4, so the index vector
> is now (4,0,2,3).
> On the next iteration, the condition passes for i=8 and i=9, so the index
> vector is now (8,9,2,3).
> And so on.
> In the end we might end up with something like index vector=(26,2,54,14).
> The “2” has not been set since the first iteration.
> The only value we care about is the last one set - the 54. The rest are
> meaningless to us.
> Meanwhile, the data vector has been storing the matching values in the
> same lanes, and so now contains (a[26], a[2], a[54], a[14]).
>
> In the epilogue, we reduce the index vector down to “54”, then create a
> vector (54,54,54,54).
> This is matched against the index vector to give us
> (false,false,true,false).
> Then we use that vector to select from the data vector, giving us
> (0, 0, a[54], 0), which we then reduce down to a single value.

Ok, I see.

That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to

int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i] + 1;
  return val;
}

makes it no longer handled by current code.

>
>
>
>
>>
>>Stopping here.
>
> Thanks for the comments so far.
> I’ll put together a new patch with the changes above.
>
>
> Alan.
>
>


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-18 12:22     ` Richard Biener
@ 2015-09-18 13:36       ` Alan Lawrence
  2015-09-18 14:14         ` Alan Hayward
       [not found]         ` <D221D55E.8386%alan.hayward@arm.com>
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Lawrence @ 2015-09-18 13:36 UTC (permalink / raw)
  To: Richard Biener, Alan Hayward; +Cc: gcc-patches

On 18/09/15 13:17, Richard Biener wrote:
>
> Ok, I see.
>
> That this case is already vectorized is because it implements MAX_EXPR,
> modifying it slightly to
>
> int foo (int *a)
> {
>    int val = 0;
>    for (int i = 0; i < 1024; ++i)
>      if (a[i] > val)
>        val = a[i] + 1;
>    return val;
> }
>
> makes it no longer handled by current code.
>

Yes. I believe the idea for the patch is to handle arbitrary expressions like

int foo (int *a)
{
    int val = 0;
    for (int i = 0; i < 1024; ++i)
      if (some_expression (i))
        val = another_expression (i);
    return val;
}

Cheers,
Alan


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-18 13:36       ` Alan Lawrence
@ 2015-09-18 14:14         ` Alan Hayward
       [not found]         ` <D221D55E.8386%alan.hayward@arm.com>
  1 sibling, 0 replies; 26+ messages in thread
From: Alan Hayward @ 2015-09-18 14:14 UTC (permalink / raw)
  To: Alan Lawrence, Richard Biener; +Cc: gcc-patches


On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:

>On 18/09/15 13:17, Richard Biener wrote:
>>
>> Ok, I see.
>>
>> That this case is already vectorized is because it implements MAX_EXPR,
>> modifying it slightly to
>>
>> int foo (int *a)
>> {
>>    int val = 0;
>>    for (int i = 0; i < 1024; ++i)
>>      if (a[i] > val)
>>        val = a[i] + 1;
>>    return val;
>> }
>>
>> makes it no longer handled by current code.
>>
>
>Yes. I believe the idea for the patch is to handle arbitrary expressions
>like
>
>int foo (int *a)
>{
>    int val = 0;
>    for (int i = 0; i < 1024; ++i)
>      if (some_expression (i))
>        val = another_expression (i);
>    return val;
>}


Yes, that’s correct. Hopefully my new test cases should cover everything.


Alan.



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
       [not found]         ` <D221D55E.8386%alan.hayward@arm.com>
@ 2015-09-23 16:07           ` Alan Hayward
  2015-09-30 12:49             ` Richard Biener
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Hayward @ 2015-09-23 16:07 UTC (permalink / raw)
  To: Alan Lawrence, Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]



On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:

>
>
>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>
>>On 18/09/15 13:17, Richard Biener wrote:
>>>
>>> Ok, I see.
>>>
>>> That this case is already vectorized is because it implements MAX_EXPR,
>>> modifying it slightly to
>>>
>>> int foo (int *a)
>>> {
>>>    int val = 0;
>>>    for (int i = 0; i < 1024; ++i)
>>>      if (a[i] > val)
>>>        val = a[i] + 1;
>>>    return val;
>>> }
>>>
>>> makes it no longer handled by current code.
>>>
>>
>>Yes. I believe the idea for the patch is to handle arbitrary expressions
>>like
>>
>>int foo (int *a)
>>{
>>    int val = 0;
>>    for (int i = 0; i < 1024; ++i)
>>      if (some_expression (i))
>>        val = another_expression (i);
>>    return val;
>>}
>
>Yes, that’s correct. Hopefully my new test cases should cover everything.
>

Attached is a new version of the patch containing all the changes
requested by Richard.


Thanks,
Alan.



[-- Attachment #2: 0001-Support-for-vectorizing-conditional-expressions.patch --]
[-- Type: application/octet-stream, Size: 49637 bytes --]

From f94705da0c35e1720a56dab8af18871e214272fb Mon Sep 17 00:00:00 2001
From: Alan Hayward <alan.hayward@arm.com>
Date: Fri, 28 Aug 2015 10:01:15 +0100
Subject: [PATCH] Support for vectorizing conditional expressions

2015-08-28  Alan Hayward <alan.hayward@arm.com>

	PR tree-optimization/65947
	* tree-vect-loop.c
	(vect_is_simple_reduction_1): Find condition reductions.
	(vect_model_reduction_cost): Add condition reduction costs.
	(get_initial_def_for_reduction): Add condition reduction initial var.
	(vect_create_epilog_for_reduction): Add condition reduction epilog.
	(vectorizable_reduction): Condition reduction support.
	* tree-vect-stmts.c
	(vectorizable_condition): Add vect reduction arg
	* doc/sourcebuild.texi (Vector-specific attributes): Document
	vect_max_reduc

    testsuite/Changelog:

	PR tree-optimization/65947
	* lib/target-supports.exp
	(check_effective_target_vect_max_reduc): Add.
	* gcc.dg/vect/pr65947-1.c: New test.
	* gcc.dg/vect/pr65947-2.c: New test.
	* gcc.dg/vect/pr65947-3.c: New test.
	* gcc.dg/vect/pr65947-4.c: New test.
	* gcc.dg/vect/pr65947-5.c: New test.
	* gcc.dg/vect/pr65947-6.c: New test.
	* gcc.dg/vect/pr65947-7.c: New test.
	* gcc.dg/vect/pr65947-8.c: New test.
	* gcc.dg/vect/pr65947-9.c: New test.
	* gcc.dg/vect/pr65947-10.c: New test.
	* gcc.dg/vect/pr65947-11.c: New test.
---
 gcc/doc/sourcebuild.texi               |   3 +
 gcc/testsuite/gcc.dg/vect/pr65947-1.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-10.c |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-11.c |  48 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-2.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-3.c  |  50 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-4.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-5.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-6.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-7.c  |  51 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-8.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-9.c  |  42 +++
 gcc/testsuite/lib/target-supports.exp  |  10 +
 gcc/tree-vect-loop.c                   | 473 ++++++++++++++++++++++++++-------
 gcc/tree-vect-stmts.c                  |  45 ++--
 gcc/tree-vectorizer.h                  |  12 +-
 16 files changed, 899 insertions(+), 115 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-11.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-9.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 5dc7c81..61de4a5 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1466,6 +1466,9 @@ Target supports conversion from @code{float} to @code{signed int}.
 
 @item vect_floatuint_cvt
 Target supports conversion from @code{float} to @code{unsigned int}.
+
+@item vect_max_reduc
+Target supports max reduction for vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-1.c b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
new file mode 100644
index 0000000..7933f5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Simple condition reduction.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = -1;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, -12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, -3, 4, 5, 6, 7, -8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 19)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-10.c b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
new file mode 100644
index 0000000..9a43a60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Non-integer data types.  */
+
+float
+condition_reduction (float *a, float min_v)
+{
+  float last = 0;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  float a[N] = {
+  11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20,
+  1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6,
+  21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30,
+  31.111, 32.322
+  };
+
+  float ret = condition_reduction (a, 16.7);
+
+  if (ret != (float)10.6)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-11.c b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
new file mode 100644
index 0000000..6deff00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Re-use the result of the condition inside the loop.  Will fail to
+   vectorize.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      if (b[i] < min_v)
+	last = i;
+      a[i] = last;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 29)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-2.c b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
new file mode 100644
index 0000000..9c627d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 254
+
+/* Non-simple condition reduction.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v)
+{
+  unsigned char last = 65;
+
+  for (unsigned char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  unsigned char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-3.c b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
new file mode 100644
index 0000000..e115de2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Non-simple condition reduction with additional variable and unsigned
+   types.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+  unsigned int aval;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 13)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-4.c b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
new file mode 100644
index 0000000..76a0567
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with no valid matches at runtime.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 96;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] > min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27
+  };
+
+  int ret = condition_reduction (a, 46);
+
+  /* Loop should never have found a value.  */
+  if (ret != N + 96)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-5.c b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
new file mode 100644
index 0000000..360e3b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Condition reduction where the loop size is not known at compile time.  Will
+   fail to vectorize.  The copy inlined into main will vectorize.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v, int count)
+{
+  unsigned char last = 65;
+
+  for (int i = 0; i < count; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  unsigned char ret = condition_reduction (a, 16, N);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-6.c b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
new file mode 100644
index 0000000..4997ef7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 30
+
+/* Condition reduction where the loop type is different from the data type.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 65;
+
+  for (char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  int a[N] = {
+  67, 32, 45, 43, 21, -11, 12, 3, 4, 5,
+  6, 76, -32, 56, -32, -1, 4, 5, 6, 99,
+  43, 22, -3, 22, 16, 34, 55, 31, 87, 324
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != -3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
new file mode 100644
index 0000000..1044119
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 43
+
+/* Condition reduction where the comparison is a different type from the data.
+   Will fail to vectorize.  */
+
+int
+condition_reduction (short *a, int min_v, int *b)
+{
+  int last = N + 65;
+  short aval;
+
+  for (int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  short a[N] = {
+  31, -32, 133, 324, 335, 36, 37, 45, 11, 65,
+  1, -28, 3, 48, 5, -68, 7, 88, 89, 180,
+  121, -122, 123, 124, -125, 126, 127, 128, 129, 130,
+  11, 12, 13, 14, -15, -16, 17, 18, 19, 20,
+  33, 27, 99
+  };
+  int b[N] = {
+  11, -12, -13, 14, 15, 16, 17, 18, 19, 20,
+  21, -22, 23, 24, -25, 26, 27, 28, 29, 30,
+  1, 62, 3, 14, -15, 6, 37, 48, 99, 10,
+  31, -32, 33, 34, -35, 36, 37, 56, 54, 22,
+  73, 2, 87
+  };
+
+  int ret = condition_reduction (a, 16, b);
+
+  if (ret != 27)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
new file mode 100644
index 0000000..5cdbbe0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with multiple types in the comparison.  Will fail to
+   vectorize.  */
+
+int
+condition_reduction (char *a, int min_v)
+{
+  int last = N + 65;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  char a[N] = {
+  1, 28, 3, 48, 5, 68, 7, -88, 89, 180,
+  121, 122, -123, 124, 12, -12, 12, 67, 84, 122,
+  67, 55, 112, 22, 45, 23, 111
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 12)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
new file mode 100644
index 0000000..d0da13f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 255
+
+/* Condition reduction with the maximum possible loop size.  Will fail to
+   vectorize because the vectorization requires a slot for default values.  */
+
+char
+condition_reduction (char *a, char min_v)
+{
+  char last = -72;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a465eb1..cf07a56 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6449,3 +6449,13 @@ proc check_effective_target_comdat_group {} {
 	int (*fn) () = foo;
     }]
 }
+
+
+# Return 1 if the target supports max reduction for vectors.
+
+proc check_effective_target_vect_max_reduc { } {
+    if { [istarget aarch64*-*-*] || [istarget arm*-*-*] } {
+	return 1
+    }
+    return 0
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..08df100 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2331,6 +2331,11 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
      inner loop (def of a3)
      a2 = phi < a3 >
 
+   (4) Detect condition expressions, i.e.:
+     for (int i = 0; i < N; i++)
+       if (a[i] < val)
+	ret_val = a[i];
+
    If MODIFY is true it tries also to rework the code in-place to enable
    detection of more reduction patterns.  For the time being we rewrite
    "res -= RHS" into "rhs += -RHS" when it seems worthwhile.
@@ -2339,7 +2344,8 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
 static gimple
 vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 			    bool check_reduction, bool *double_reduc,
-			    bool modify, bool need_wrapping_integral_overflow)
+			    bool modify, bool need_wrapping_integral_overflow,
+			    enum vect_reduction_type *v_reduc_type)
 {
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
@@ -2356,6 +2362,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
   bool phi_def;
 
   *double_reduc = false;
+  *v_reduc_type = TREE_CODE_REDUCTION;
 
   /* If CHECK_REDUCTION is true, we assume inner-most loop vectorization,
      otherwise, we assume outer loop vectorization.  */
@@ -2501,13 +2508,19 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
       && SSA_NAME_DEF_STMT (op1) == phi)
     code = PLUS_EXPR;
 
-  if (check_reduction
-      && (!commutative_tree_code (code) || !associative_tree_code (code)))
+  if (check_reduction)
     {
-      if (dump_enabled_p ())
-        report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: not commutative/associative: ");
-      return NULL;
+      if (code != COND_EXPR
+	  && (!commutative_tree_code (code) || !associative_tree_code (code)))
+	{
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: not commutative/associative: ");
+	  return NULL;
+	}
+
+      if (code == COND_EXPR)
+	*v_reduc_type = COND_REDUCTION;
     }
 
   if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
@@ -2603,47 +2616,50 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
      and therefore vectorizing reductions in the inner-loop during
      outer-loop vectorization is safe.  */
 
-  /* CHECKME: check for !flag_finite_math_only too?  */
-  if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
-      && check_reduction)
-    {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fp math optimization: ");
-      return NULL;
-    }
-  else if (INTEGRAL_TYPE_P (type) && check_reduction)
+  if (*v_reduc_type != COND_REDUCTION)
     {
-      if (!operation_no_trapping_overflow (type, code))
+      /* CHECKME: check for !flag_finite_math_only too?  */
+      if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
+	  && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
 	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow traps): ");
+			"reduction: unsafe fp math optimization: ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow
-	  && !TYPE_OVERFLOW_WRAPS (type)
-	  && operation_can_overflow (code))
+      else if (INTEGRAL_TYPE_P (type) && check_reduction)
+	{
+	  if (!operation_no_trapping_overflow (type, code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow traps): ");
+	      return NULL;
+	    }
+	  if (need_wrapping_integral_overflow
+	      && !TYPE_OVERFLOW_WRAPS (type)
+	      && operation_can_overflow (code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow doesn't wrap): ");
+	      return NULL;
+	    }
+	}
+      else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
-	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow doesn't wrap): ");
+	  report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			  "reduction: unsafe fixed-point math optimization: ");
 	  return NULL;
 	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
-    {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fixed-point math optimization: ");
-      return NULL;
-    }
 
   /* If we detected "res -= x[i]" earlier, rewrite it into
      "res += -x[i]" now.  If this turns out to be useless reassoc
@@ -2719,6 +2735,16 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     {
       if (check_reduction)
         {
+	  if (code == COND_EXPR)
+	    {
+	      /* No currently known use where this case would be useful.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_NOTE, def_stmt,
+				"detected reduction: cannot currently swap "
+				"operands for cond_expr");
+	      return NULL;
+	    }
+
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
 	     argument).  */
@@ -2742,7 +2768,8 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
     }
 
   /* Try to find SLP reduction chain.  */
-  if (check_reduction && vect_is_slp_reduction (loop_info, phi, def_stmt))
+  if (check_reduction && code != COND_EXPR
+      && vect_is_slp_reduction (loop_info, phi, def_stmt))
     {
       if (dump_enabled_p ())
         report_vect_op (MSG_NOTE, def_stmt,
@@ -2764,11 +2791,13 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
 static gimple
 vect_is_simple_reduction (loop_vec_info loop_info, gimple phi,
 			  bool check_reduction, bool *double_reduc,
-			  bool need_wrapping_integral_overflow)
+			  bool need_wrapping_integral_overflow,
+			  enum vect_reduction_type *v_reduc_type)
 {
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, false,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     v_reduc_type);
 }
 
 /* Wrapper around vect_is_simple_reduction_1, which will modify code
@@ -2780,9 +2809,11 @@ vect_force_simple_reduction (loop_vec_info loop_info, gimple phi,
 			     bool check_reduction, bool *double_reduc,
 			     bool need_wrapping_integral_overflow)
 {
+  enum vect_reduction_type v_reduc_type;
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, true,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     &v_reduc_type);
 }
 
 /* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
@@ -3266,7 +3297,8 @@ get_reduction_op (gimple stmt, int reduc_index)
 
 static bool
 vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
-			   int ncopies, int reduc_index)
+			   int ncopies, int reduc_index,
+			   enum vect_reduction_type v_reduc_type)
 {
   int prologue_cost = 0, epilogue_cost = 0;
   enum tree_code code;
@@ -3287,6 +3319,10 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
   else
     target_cost_data = BB_VINFO_TARGET_COST_DATA (STMT_VINFO_BB_VINFO (stmt_info));
 
+  /* Condition reductions generate two reductions in the loop.  */
+  if (v_reduc_type == COND_REDUCTION)
+    ncopies *= 2;
+
   /* Cost of reduction op inside loop.  */
   unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
 					stmt_info, 0, vect_body);
@@ -3316,9 +3352,13 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
 
   code = gimple_assign_rhs_code (orig_stmt);
 
-  /* Add in cost for initial definition.  */
-  prologue_cost += add_stmt_cost (target_cost_data, 1, scalar_to_vec,
-				  stmt_info, 0, vect_prologue);
+  /* Add in cost for initial definition.
+     For cond reduction we have four vectors: initial index, step, initial
+     result of the data reduction, initial value of the index reduction.  */
+  int prologue_stmts = v_reduc_type == COND_REDUCTION ? 4 : 1;
+  prologue_cost += add_stmt_cost (target_cost_data, prologue_stmts,
+				  scalar_to_vec, stmt_info, 0,
+				  vect_prologue);
 
   /* Determine cost of epilogue code.
 
@@ -3329,10 +3369,30 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
     {
       if (reduc_code != ERROR_MARK)
 	{
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
-					  stmt_info, 0, vect_epilogue);
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
-					  stmt_info, 0, vect_epilogue);
+	  if (v_reduc_type == COND_REDUCTION)
+	    {
+	      /* An EQ stmt and an AND stmt.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      vector_stmt, stmt_info, 0,
+					      vect_epilogue);
+	      /* Reduction of the max index and a reduction of the found
+		 values.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	      /* A broadcast of the max value.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      scalar_to_vec, stmt_info, 0,
+					      vect_epilogue);
+	    }
+	  else
+	    {
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
+					      stmt_info, 0, vect_epilogue);
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	    }
 	}
       else
 	{
@@ -3774,7 +3834,8 @@ get_initial_def_for_induction (gimple iv_phi)
 
    Input:
    STMT - a stmt that performs a reduction operation in the loop.
-   INIT_VAL - the initial value of the reduction variable
+   INIT_VAL - the initial value of the reduction variable.
+   V_REDUC_TYPE - the type of reduction.
 
    Output:
    ADJUSTMENT_DEF - a tree that holds a value to be added to the final result
@@ -3815,7 +3876,8 @@ get_initial_def_for_induction (gimple iv_phi)
 
 tree
 get_initial_def_for_reduction (gimple stmt, tree init_val,
-                               tree *adjustment_def)
+			       tree *adjustment_def,
+			       enum vect_reduction_type v_reduc_type)
 {
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
@@ -3939,15 +4001,18 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
       case MIN_EXPR:
       case MAX_EXPR:
       case COND_EXPR:
-        if (adjustment_def)
+	if (adjustment_def)
           {
-            *adjustment_def = NULL_TREE;
-            init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
-            break;
-          }
+	    *adjustment_def = NULL_TREE;
 
+	    if (v_reduc_type != COND_REDUCTION)
+	      {
+		init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
+		break;
+	      }
+	  }
 	init_def = build_vector_from_val (vectype, init_value);
-        break;
+	break;
 
       default:
         gcc_unreachable ();
@@ -3977,6 +4042,9 @@ get_initial_def_for_reduction (gimple stmt, tree init_val,
    DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
    SLP_NODE is an SLP node containing a group of reduction statements. The 
      first one in this group is STMT.
+   V_REDUC_TYPE is the type of reduction.
+   INDUCTION_INDEX is the index of the loop for condition reductions.
+     Otherwise it is undefined.
 
    This function:
    1. Creates the reduction def-use cycles: sets the arguments for 
@@ -4022,7 +4090,9 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
 				  int ncopies, enum tree_code reduc_code,
 				  vec<gimple> reduction_phis,
                                   int reduc_index, bool double_reduc, 
-                                  slp_tree slp_node)
+				  slp_tree slp_node,
+				  enum vect_reduction_type v_reduc_type,
+				  tree induction_index)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   stmt_vec_info prev_phi_info;
@@ -4321,11 +4391,90 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
     }
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
- 
+
+  if (v_reduc_type == COND_REDUCTION)
+    {
+      tree index_vec_type = TREE_TYPE (induction_index);
+      gcc_checking_assert (TYPE_UNSIGNED (index_vec_type));
+      tree index_vec_type_signed = signed_type_for (index_vec_type);
+      tree index_scalar_type = TREE_TYPE (index_vec_type);
+      machine_mode index_vector_mode = TYPE_MODE (index_vec_type);
+
+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_ssa_name (index_scalar_type);
+      gimple max_index_stmt = gimple_build_assign (max_index, REDUC_MAX_EXPR,
+						   induction_index);
+      gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
+
+      /* Vector of {max_index, max_index, max_index,...}.  */
+      tree max_index_vec = make_ssa_name (index_vec_type);
+      tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+						      max_index);
+      gimple max_index_vec_stmt = gimple_build_assign (max_index_vec,
+						       max_index_vec_rhs);
+      gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+
+      /* Compare the max index vector to the vector of found indexes to find
+	 the position of the max value.  This will result in either a single
+	 match or all of the values.  */
+      tree vec_compare = make_ssa_name (index_vec_type_signed);
+      gimple vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+						     induction_index,
+						     max_index_vec);
+      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+
+      /* Convert the vector of data to the same type as the EQ.  */
+      tree vec_data_cast = make_ssa_name (index_vec_type_signed);
+      tree vec_data_cast_rhs = build1 (VIEW_CONVERT_EXPR,
+				       index_vec_type_signed,
+				       new_phi_result);
+      gimple vec_data_cast_stmt = gimple_build_assign (vec_data_cast,
+						       VIEW_CONVERT_EXPR,
+						       vec_data_cast_rhs);
+      gsi_insert_before (&exit_gsi, vec_data_cast_stmt, GSI_SAME_STMT);
+
+      /* Where the max index occurred, use the value from the data vector.  */
+      tree vec_and = make_ssa_name (index_vec_type_signed);
+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+						 vec_compare, vec_data_cast);
+      gsi_insert_before (&exit_gsi, vec_and_stmt, GSI_SAME_STMT);
+
+      /* Make the matched data values unsigned.  */
+      tree vec_and_cast = make_ssa_name (index_vec_type);
+      tree vec_and_cast_rhs = build1 (VIEW_CONVERT_EXPR, index_vec_type,
+				      vec_and);
+      gimple vec_and_cast_stmt = gimple_build_assign (vec_and_cast,
+						      VIEW_CONVERT_EXPR,
+						      vec_and_cast_rhs);
+      gsi_insert_before (&exit_gsi, vec_and_cast_stmt, GSI_SAME_STMT);
+
+      /* Reduce down to a scalar value.  */
+      tree matched_data_reduc = make_ssa_name (index_scalar_type);
+      gimple matched_data_reduc_stmt;
+      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, index_vec_type,
+				      optab_default);
+      gcc_assert (optab_handler (ot, index_vector_mode) != CODE_FOR_nothing);
+      matched_data_reduc_stmt = gimple_build_assign (matched_data_reduc,
+						     REDUC_MAX_EXPR,
+						     vec_and_cast);
+      gsi_insert_before (&exit_gsi, matched_data_reduc_stmt, GSI_SAME_STMT);
+
+      /* Convert the reduced value to the result type and set as the
+	 result.  */
+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
+					     matched_data_reduc);
+      epilog_stmt = gimple_build_assign (new_scalar_dest,
+					 matched_data_reduc_cast);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+      gimple_assign_set_lhs (epilog_stmt, new_temp);
+      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+      scalar_results.safe_push (new_temp);
+    }
+
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
-  if (reduc_code != ERROR_MARK && !slp_reduc)
+  else if (reduc_code != ERROR_MARK && !slp_reduc)
     {
       tree tmp;
       tree vec_elem_type;
@@ -4847,6 +4996,15 @@ vect_finalize_reduction:
    and it's STMT_VINFO_RELATED_STMT points to the last stmt in the original
    sequence that had been detected and replaced by the pattern-stmt (STMT).
 
+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+	 last = a[i];
+   This is handled by vectorizing the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.
+
    In some cases of reduction patterns, the type of the reduction variable X is
    different than the type of the other arguments of STMT.
    In such cases, the vectype that is used when transforming STMT into a vector
@@ -4922,6 +5080,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
   int vec_num;
   tree def0, def1, tem, op0, op1 = NULL_TREE;
   bool first_p = true;
+  enum vect_reduction_type v_reduc_type = TREE_CODE_REDUCTION;
+  tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
 
   /* In case of reduction chain we switch to the first stmt in the chain, but
      we don't update STMT_INFO, since only the last stmt is marked as reduction
@@ -5092,7 +5252,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
     }
 
   gimple tmp = vect_is_simple_reduction (loop_vinfo, reduc_def_stmt,
-					 !nested_cycle, &dummy, false);
+					 !nested_cycle, &dummy, false,
+					 &v_reduc_type);
   if (orig_stmt)
     gcc_assert (tmp == orig_stmt
 		|| GROUP_FIRST_ELEMENT (vinfo_for_stmt (tmp)) == orig_stmt);
@@ -5117,7 +5278,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
 
   if (code == COND_EXPR)
     {
-      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, NULL))
+      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, NULL,
+				   v_reduc_type))
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5246,49 +5408,72 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
     }
 
   epilog_reduc_code = ERROR_MARK;
-  if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+
+  if (v_reduc_type == TREE_CODE_REDUCTION)
     {
-      reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
+      if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+	{
+	  reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
                                          optab_default);
-      if (!reduc_optab)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no optab for reduction.\n");
-
-          epilog_reduc_code = ERROR_MARK;
-        }
-      else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
-        {
-          optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
-          if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
-            {
-              if (dump_enabled_p ())
-	        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "reduc op not supported by target.\n");
+	  if (!reduc_optab)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no optab for reduction.\n");
 
 	      epilog_reduc_code = ERROR_MARK;
 	    }
-        }
+	  else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+	    {
+	      optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+	      if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "reduc op not supported by target.\n");
+
+		  epilog_reduc_code = ERROR_MARK;
+		}
+	    }
+	}
+      else
+	{
+	  if (!nested_cycle || double_reduc)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no reduc code for scalar code.\n");
+
+	      return false;
+	    }
+	}
     }
   else
     {
-      if (!nested_cycle || double_reduc)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no reduc code for scalar code.\n");
+      int scalar_precision = GET_MODE_PRECISION (TYPE_MODE (scalar_type));
+      cr_index_scalar_type = make_unsigned_type (scalar_precision);
+      cr_index_vector_type = build_vector_type
+	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
 
-          return false;
-        }
+      epilog_reduc_code = REDUC_MAX_EXPR;
+      optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
+				   optab_default);
+      if (optab_handler (optab, TYPE_MODE (cr_index_vector_type))
+	  == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "reduc max op not supported by target.\n");
+	  return false;
+	}
     }
 
-  if (double_reduc && ncopies > 1)
+  if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "multiple types in double reduction\n");
-
+			 "multiple types in double reduction or condition "
+			 "reduction.\n");
       return false;
     }
 
@@ -5312,11 +5497,39 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
         }
     }
 
+  if (v_reduc_type == COND_REDUCTION)
+    {
+      widest_int ni;
+
+      if (! max_loop_iterations (loop, &ni))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop count not known, cannot create cond "
+			     "reduction.\n");
+	  return false;
+	}
+      /* Convert backedges to iterations.  */
+      ni += 1;
+
+      /* The additional index will be the same type as the condition.  Check
+	 that the loop iteration count fits into this type less one (the zero
+	 slot is reserved for the case where there are no matches).  */
+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
+      if (wi::geu_p (ni, wi::to_widest (max_index)))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop size is greater than data size.\n");
+	  return false;
+	}
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       if (first_p
 	  && !vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies,
-					 reduc_index))
+					 reduc_index, v_reduc_type))
         return false;
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
       return true;
@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
 
+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+
   /* FORNOW: Multiple types are not supported for condition.  */
   if (code == COND_EXPR)
     gcc_assert (ncopies == 1);
@@ -5406,9 +5621,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
       if (code == COND_EXPR)
         {
           gcc_assert (!slp_node);
-          vectorizable_condition (stmt, gsi, vec_stmt, 
-                                  PHI_RESULT (phis[0]), 
-                                  reduc_index, NULL);
+	  vectorizable_condition (stmt, gsi, vec_stmt, PHI_RESULT (phis[0]),
+				  reduc_index, NULL, v_reduc_type);
           /* Multiple types are not supported for condition.  */
           break;
         }
@@ -5528,17 +5742,88 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
       prev_phi_info = vinfo_for_stmt (new_phi);
     }
 
+  tree indx_before_incr, indx_after_incr, cond_name = NULL;
+
   /* Finalize the reduction-phi (set its arguments) and create the
      epilog reduction code.  */
   if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node)
     {
       new_temp = gimple_assign_lhs (*vec_stmt);
       vect_defs[0] = new_temp;
+
+      /* For cond reductions we need to add an additional conditional based on
+	 the loop index.  */
+      if (v_reduc_type == COND_REDUCTION)
+	{
+	  int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+	  int k;
+
+	  gcc_assert (gimple_assign_rhs_code (*vec_stmt) == VEC_COND_EXPR);
+
+	  /* Create a {1,2,3,...} vector.  */
+	  tree *vtemp = XALLOCAVEC (tree, nunits_out);
+	  for (k = 0; k < nunits_out; ++k)
+	    vtemp[k] = build_int_cst (cr_index_scalar_type, k + 1);
+	  tree series_vect = build_vector (cr_index_vector_type, vtemp);
+
+	  /* Create a vector of the step value.  */
+	  tree step = build_int_cst (cr_index_scalar_type, nunits_out);
+	  tree vec_step = build_vector_from_val (cr_index_vector_type, step);
+
+	  /* Create a vector of 0s.  */
+	  tree zero = build_zero_cst (cr_index_scalar_type);
+	  tree vec_zero = build_vector_from_val (cr_index_vector_type, zero);
+
+	  /* Create an induction variable, starting at series_vect, and
+	     incrementing by vec_step.  */
+	  gimple_stmt_iterator incr_gsi;
+	  bool insert_after;
+	  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+	  create_iv (series_vect, vec_step, NULL_TREE, loop, &incr_gsi,
+		     insert_after, &indx_before_incr, &indx_after_incr);
+
+	  /* Create a vector phi node from the VEC_COND_EXPR (see below) and
+	     0s.  */
+	  tree new_phi_tree = make_ssa_name (cr_index_vector_type);
+	  new_phi = create_phi_node (new_phi_tree, loop->header);
+	  set_vinfo_for_stmt (new_phi, new_stmt_vec_info (new_phi, loop_vinfo,
+							  NULL));
+	  add_phi_arg (new_phi, vec_zero, loop_preheader_edge (loop),
+		       UNKNOWN_LOCATION);
+
+	  /* Turn the condition from vec_stmt into an ssa name.  */
+	  gimple index_condition;
+	  gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt);
+	  tree ccompare = gimple_assign_rhs1 (*vec_stmt);
+	  tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
+	  gimple ccompare_stmt = gimple_build_assign (ccompare_name, ccompare);
+	  gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
+	  gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
+	  update_stmt (*vec_stmt);
+
+	  /* Create a conditional, where the condition is the same as from
+	     vec_stmt, then is the induction index, else is the phi.  */
+	  tree cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
+				   ccompare_name, indx_before_incr,
+				   new_phi_tree);
+	  cond_name = make_ssa_name (cr_index_vector_type);
+	  index_condition = gimple_build_assign (cond_name, cond_expr);
+	  gsi_insert_before (&incr_gsi, index_condition, GSI_SAME_STMT);
+	  stmt_vec_info index_vec_info = new_stmt_vec_info (index_condition,
+							    loop_vinfo, NULL);
+	  STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
+	  set_vinfo_for_stmt (index_condition, index_vec_info);
+
+	  /* Update the phi with the vec cond.  */
+	  add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
+		       UNKNOWN_LOCATION);
+	}
     }
 
   vect_create_epilog_for_reduction (vect_defs, stmt, epilog_copies,
                                     epilog_reduc_code, phis, reduc_index,
-                                    double_reduc, slp_node);
+				    double_reduc, slp_node, v_reduc_type,
+				    cond_name);
 
   return true;
 }
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 359e010..0540085 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7293,7 +7293,8 @@ vect_is_simple_cond (tree cond, gimple stmt, loop_vec_info loop_vinfo,
 bool
 vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
 			gimple *vec_stmt, tree reduc_def, int reduc_index,
-			slp_tree slp_node)
+			slp_tree slp_node,
+			enum vect_reduction_type v_reduc_type)
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
@@ -7321,21 +7322,24 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   if (reduc_index && STMT_SLP_TYPE (stmt_info))
     return false;
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-    return false;
+  if (v_reduc_type == TREE_CODE_REDUCTION)
+    {
+      if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+	return false;
 
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
-           && reduc_def))
-    return false;
+      if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+	  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	       && reduc_def))
+	return false;
 
-  /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "value used after loop.\n");
-      return false;
+      /* FORNOW: not yet supported.  */
+      if (STMT_VINFO_LIVE_P (stmt_info))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "value used after loop.\n");
+	  return false;
+	}
     }
 
   /* Is vectorizable conditional operation?  */
@@ -7739,7 +7743,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node,
+				     TREE_CODE_REDUCTION));
   else
     {
       if (bb_vinfo)
@@ -7751,7 +7756,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node,
+					 TREE_CODE_REDUCTION));
     }
 
   if (!ok)
@@ -7863,7 +7869,8 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       break;
 
     case condition_vec_info_type:
-      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, slp_node);
+      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, slp_node,
+				     TREE_CODE_REDUCTION);
       gcc_assert (done);
       break;
 
@@ -8262,8 +8269,8 @@ vect_is_simple_use (tree operand, gimple stmt, loop_vec_info loop_vinfo,
   if (TREE_CODE (operand) != SSA_NAME)
     {
       if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "not ssa-name.\n");
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not ssa-name.\n");
       return false;
     }
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 95276fa..7dfc77f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -60,6 +60,12 @@ enum vect_def_type {
   vect_unknown_def_type
 };
 
+/* Define type of reduction.  */
+enum vect_reduction_type {
+  TREE_CODE_REDUCTION,
+  COND_REDUCTION
+};
+
 #define VECTORIZABLE_CYCLE_DEF(D) (((D) == vect_reduction_def)           \
                                    || ((D) == vect_double_reduction_def) \
                                    || ((D) == vect_nested_cycle))
@@ -1037,7 +1043,8 @@ extern bool vect_transform_stmt (gimple, gimple_stmt_iterator *,
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
-                                    tree, int, slp_tree);
+				    tree, int, slp_tree,
+				    enum vect_reduction_type);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
 				unsigned int *, unsigned int *,
 				stmt_vector_for_cost *,
@@ -1105,7 +1112,8 @@ extern bool vectorizable_live_operation (gimple, gimple_stmt_iterator *,
 extern bool vectorizable_reduction (gimple, gimple_stmt_iterator *, gimple *,
                                     slp_tree);
 extern bool vectorizable_induction (gimple, gimple_stmt_iterator *, gimple *);
-extern tree get_initial_def_for_reduction (gimple, tree, tree *);
+extern tree get_initial_def_for_reduction
+	(gimple, tree, tree *, enum vect_reduction_type = TREE_CODE_REDUCTION);
 extern int vect_min_worthwhile_factor (enum tree_code);
 extern int vect_get_known_peeling_cost (loop_vec_info, int, int *,
 					stmt_vector_for_cost *,
-- 
1.9.3 (Apple Git-50)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-23 16:07           ` Alan Hayward
@ 2015-09-30 12:49             ` Richard Biener
  2015-10-01 15:22               ` Alan Hayward
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Biener @ 2015-09-30 12:49 UTC (permalink / raw)
  To: Alan Hayward; +Cc: Alan Lawrence, gcc-patches

On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>
>>
>>
>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>
>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>
>>>> Ok, I see.
>>>>
>>>> That this case is already vectorized is because it implements MAX_EXPR,
>>>> modifying it slightly to
>>>>
>>>> int foo (int *a)
>>>> {
>>>>    int val = 0;
>>>>    for (int i = 0; i < 1024; ++i)
>>>>      if (a[i] > val)
>>>>        val = a[i] + 1;
>>>>    return val;
>>>> }
>>>>
>>>> makes it no longer handled by current code.
>>>>
>>>
>>>Yes. I believe the idea for the patch is to handle arbitrary expressions
>>>like
>>>
>>>int foo (int *a)
>>>{
>>>    int val = 0;
>>>    for (int i = 0; i < 1024; ++i)
>>>      if (some_expression (i))
>>>        val = another_expression (i);
>>>    return val;
>>>}
>>
>>Yes, that’s correct. Hopefully my new test cases should cover everything.
>>
>
> Attached is a new version of the patch containing all the changes
> requested by Richard.

+      /* Compare the max index vector to the vector of found indexes to find
+        the postion of the max value.  This will result in either a single
+        match or all of the values.  */
+      tree vec_compare = make_ssa_name (index_vec_type_signed);
+      gimple vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+                                                    induction_index,
+                                                    max_index_vec);

I'm not sure all targets can handle this.  If I decipher the code
correctly then we do

  mask = induction_index == max_index_vec;
  vec_and = mask & vec_data;

plus some casts.  So this is basically

  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };

without the need to relate the induction index vector type to the data
vector type.
I believe this is also the form all targets support.
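
In scalar terms, the suggested form is roughly the following sketch (names
are illustrative only, not GCC internals): reduce the index vector with max,
then blend the data vector against that maximum instead of going through an
explicit mask-and-AND:

```c
#include <assert.h>

#define NLANES 4

/* Scalar model of the epilogue: reduce the induction indexes with max
   (the REDUC_MAX_EXPR step), then select the data lane whose index
   equals that maximum -- the "idx == max ? data : 0" form suggested
   above.  Index 0 in a lane means "no match in this lane".  */
int select_last_match (const unsigned idx[NLANES], const int data[NLANES])
{
  unsigned max_idx = 0;
  for (int i = 0; i < NLANES; i++)       /* REDUC_MAX_EXPR analogue */
    if (idx[i] > max_idx)
      max_idx = idx[i];

  int result = 0;
  for (int i = 0; i < NLANES; i++)       /* VEC_COND_EXPR analogue */
    result |= (idx[i] == max_idx) ? data[i] : 0;
  return result;
}
```

When no lane matched, all indexes (and the corresponding data slots) are
zero, so the OR-reduction yields the initial value.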

I am missing a comment before all this code-generation that shows the transform
result with the variable names used in the code-gen.  I have a hard
time connecting
things here.

+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
+                                            matched_data_reduc);
+      epilog_stmt = gimple_build_assign (new_scalar_dest,
+                                        matched_data_reduc_cast);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+      gimple_assign_set_lhs (epilog_stmt, new_temp);

this will leave the stmt unsimplified.  scalar sign-changes should use NOP_EXPR,
not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
Also just do like before - first make_ssa_name and then directly use it in the
gimple_build_assign.

The patch is somewhat hard to parse with all the indentation changes.  A context
diff would be much easier to read in those contexts.

+  if (v_reduc_type == COND_REDUCTION)
+    {
+      widest_int ni;
+
+      if (! max_loop_iterations (loop, &ni))
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "loop count not known, cannot create cond "
+                            "reduction.\n");

ugh.  That's bad.

+      /* The additional index will be the same type as the condition.  Check
+        that the loop can fit into this less one (because we'll use up the
+        zero slot for when there are no matches).  */
+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
+      if (wi::geu_p (ni, wi::to_widest (max_index)))
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "loop size is greater than data size.\n");
+         return false;

Likewise.

@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
gimple_stmt_iterator *gsi,
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");

+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+
   /* FORNOW: Multiple types are not supported for condition.  */
   if (code == COND_EXPR)

this change looks odd (or wrong).  The type should be _only_ set/changed during
analysis.

+
+      /* For cond reductions we need to add an additional conditional based on
+        the loop index.  */
+      if (v_reduc_type == COND_REDUCTION)
+       {
+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+         int k;
...
+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
+         set_vinfo_for_stmt (index_condition, index_vec_info);
+
+         /* Update the phi with the vec cond.  */
+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
+                      UNKNOWN_LOCATION);

same as before - I am missing a comment that shows the generated code
and connects
the local vars used.


+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
+         gimple ccompare_stmt = gimple_build_assign (ccompare_name, ccompare);
+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);

hum - are you sure this works with ncopies > 1?  Will it use the
correct vec_stmt?

I still dislike the v_reduc_type plastered and passed everywhere.  Can
you explore
adding the reduction kind to stmt_info?

Thanks,
Richard.

>
> Thanks,
> Alan.
>
>


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-09-30 12:49             ` Richard Biener
@ 2015-10-01 15:22               ` Alan Hayward
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Hayward @ 2015-10-01 15:22 UTC (permalink / raw)
  To: Richard Biener, gcc-patches



On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:

>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>wrote:
>>
>>
>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>
>>>
>>>
>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>
>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>
>>>>> Ok, I see.
>>>>>
>>>>> That this case is already vectorized is because it implements
>>>>>MAX_EXPR,
>>>>> modifying it slightly to
>>>>>
>>>>> int foo (int *a)
>>>>> {
>>>>>    int val = 0;
>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>      if (a[i] > val)
>>>>>        val = a[i] + 1;
>>>>>    return val;
>>>>> }
>>>>>
>>>>> makes it no longer handled by current code.
>>>>>
>>>>
>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>expressions
>>>>like
>>>>
>>>>int foo (int *a)
>>>>{
>>>>    int val = 0;
>>>>    for (int i = 0; i < 1024; ++i)
>>>>      if (some_expression (i))
>>>>        val = another_expression (i);
>>>>    return val;
>>>>}
>>>
>>>Yes, that’s correct. Hopefully my new test cases should cover
>>>everything.
>>>
>>
>> Attached is a new version of the patch containing all the changes
>> requested by Richard.
>
>+      /* Compare the max index vector to the vector of found indexes to
>find
>+        the postion of the max value.  This will result in either a
>single
>+        match or all of the values.  */
>+      tree vec_compare = make_ssa_name (index_vec_type_signed);
>+      gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>EQ_EXPR,
>+                                                    induction_index,
>+                                                    max_index_vec);
>
>I'm not sure all targets can handle this.  If I decipher the code
>correctly then we do
>
>  mask = induction_index == max_index_vec;
>  vec_and = mask & vec_data;
>
>plus some casts.  So this is basically
>
>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>
>without the need to relate the induction index vector type to the data
>vector type.
>I believe this is also the form all targets support.


Ok, I’ll replace this.

>
>I am missing a comment before all this code-generation that shows the
>transform
>result with the variable names used in the code-gen.  I have a hard
>time connecting
>things here.

Ok, I’ll add some comments.

>
>+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>scalar_type,
>+                                            matched_data_reduc);
>+      epilog_stmt = gimple_build_assign (new_scalar_dest,
>+                                        matched_data_reduc_cast);
>+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>
>this will leave the stmt unsimplified.  scalar sign-changes should use
>NOP_EXPR,
>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>Also just do like before - first make_ssa_name and then directly use it
>in the
>gimple_build_assign.

We need the VIEW_CONVERT_EXPR for the cases where we have float data
values. The index is always integer.
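
The distinction in scalar terms (a hedged sketch, hypothetical helper names):
a NOP_EXPR would be a value conversion, while VIEW_CONVERT_EXPR reinterprets
the lane's bit pattern, which is what's needed once a float datum has
travelled through the integer mask-and-select machinery:

```c
#include <assert.h>
#include <string.h>

/* Reinterpret 32 bits as float, bit for bit -- the scalar analogue of
   VIEW_CONVERT_EXPR.  A NOP_EXPR-style conversion would instead turn
   the integer *value* into a float (3 -> 3.0f), destroying the bit
   pattern of the selected float datum.  */
float bits_to_float (unsigned u)
{
  float f;
  memcpy (&f, &u, sizeof f);   /* reinterpret, don't convert */
  return f;
}

unsigned float_to_bits (float f)
{
  unsigned u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```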


>
>The patch is somewhat hard to parse with all the indentation changes.  A
>context
>diff would be much easier to read in those contexts.

Ok, I’ll make the next patch like that

>
>+  if (v_reduc_type == COND_REDUCTION)
>+    {
>+      widest_int ni;
>+
>+      if (! max_loop_iterations (loop, &ni))
>+       {
>+         if (dump_enabled_p ())
>+           dump_printf_loc (MSG_NOTE, vect_location,
>+                            "loop count not known, cannot create cond "
>+                            "reduction.\n");
>
>ugh.  That's bad.
>
>+      /* The additional index will be the same type as the condition.
>Check
>+        that the loop can fit into this less one (because we'll use up
>the
>+        zero slot for when there are no matches).  */
>+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>+       {
>+         if (dump_enabled_p ())
>+           dump_printf_loc (MSG_NOTE, vect_location,
>+                            "loop size is greater than data size.\n");
>+         return false;
>
>Likewise.

We could do better if we made the index type larger.
But as a first implementation of this optimisation, I didn’t want to
overcomplicate things more.
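
For reference, the arithmetic of that limit can be sketched standalone
(illustrative names, not the wide-int code from the patch): with an unsigned
index type of N bits and index 0 reserved for "no match", the known
iteration count must stay strictly below the type's maximum:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the range check: usable indexes are 1 .. 2^bits - 1, and
   the loop is rejected once the iteration count reaches the index
   type's maximum value (mirroring the wi::geu_p test).  */
bool cond_reduction_index_fits (unsigned long long iterations,
                                unsigned index_type_bits)
{
  unsigned long long max_index =
    index_type_bits >= 64 ? ~0ULL : (1ULL << index_type_bits) - 1;
  return iterations < max_index;
}
```

So a char-sized condition caps the loop at 254 iterations; widening the
index type, as suggested, would lift that limit.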

>
>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>gimple_stmt_iterator *gsi,
>   if (dump_enabled_p ())
>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>
>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>+
>   /* FORNOW: Multiple types are not supported for condition.  */
>   if (code == COND_EXPR)
>
>this change looks odd (or wrong).  The type should be _only_ set/changed
>during
>analysis.


The problem is, for COND_EXPRs, this function calls
vectorizable_condition(), which sets STMT_VINFO_TYPE to
condition_vec_info_type.

Therefore we need something to restore it back to reduc_vec_info_type on
the non-analysis call.

I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
the call to vectorizable_condition(), but that looked worse.
I could back up the value of STMT_VINFO_TYPE before calling
vectorizable_condition() and then restore it after? I think that’ll look a
lot better.


>
>+
>+      /* For cond reductions we need to add an additional conditional
>based on
>+        the loop index.  */
>+      if (v_reduc_type == COND_REDUCTION)
>+       {
>+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>+         int k;
>...
>+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>+
>+         /* Update the phi with the vec cond.  */
>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>+                      UNKNOWN_LOCATION);
>
>same as before - I am missing a comment that shows the generated code
>and connects
>the local vars used.

Ok, I’ll add something

>
>
>+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>+         gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>ccompare);
>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>
>hum - are you sure this works with ncopies > 1?  Will it use the
>correct vec_stmt?

We don’t support this when ncopies > 1.

In vectorizable_reduction():

if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			 "multiple types in double reduction or condition "
			 "reduction.\n");
      return false;
    }



>
>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>you explore
>adding the reduction kind to stmt_info?

Ok, I can do that.


Thanks for the comments.
I’ll put together a patch with the above changes.

Thanks,
Alan.




* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-10-22 14:18 ` Alan Lawrence
@ 2015-10-22 14:23   ` Alan Hayward
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Hayward @ 2015-10-22 14:23 UTC (permalink / raw)
  To: Alan Lawrence, Richard Biener; +Cc: gcc-patches



On 22/10/2015 15:15, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:

>Just one very small point...
>
>On 19/10/15 09:17, Alan Hayward wrote:
>
> > -  if (check_reduction
> > -      && (!commutative_tree_code (code) || !associative_tree_code
>(code)))
> > +  if (check_reduction)
> >      {
> > -      if (dump_enabled_p ())
> > -        report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> > -			"reduction: not commutative/associative: ");
> > -      return NULL;
> > +      if (code != COND_EXPR
> > +	  && (!commutative_tree_code (code) || !associative_tree_code
>(code)))
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
> > +			    "reduction: not commutative/associative: ");
> > +	  return NULL;
> > +	}
> > +
> > +      if (code == COND_EXPR)
> > +	*v_reduc_type = COND_REDUCTION;
>
>Wouldn't this be easier written as
>
>if (code == COND_EXPR)
>   *v_reduc_type = COND_REDUCTION;
>else if (!commutative_tree_code (code) || !associative_tree_code (code))
>   {...}
>
>? Your call!
>
>Cheers, Alan


Good spot! I suspect that slipped through while I was rewriting bits.
I’ll add this in with Richard’s suggestion of the change around the call
to vectorizable_condition.


Alan.



* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-10-19  8:34 Alan Hayward
  2015-10-21 10:46 ` Richard Biener
@ 2015-10-22 14:18 ` Alan Lawrence
  2015-10-22 14:23   ` Alan Hayward
  1 sibling, 1 reply; 26+ messages in thread
From: Alan Lawrence @ 2015-10-22 14:18 UTC (permalink / raw)
  To: Alan Hayward, Richard Biener; +Cc: gcc-patches

Just one very small point...

On 19/10/15 09:17, Alan Hayward wrote:

 > -  if (check_reduction
 > -      && (!commutative_tree_code (code) || !associative_tree_code (code)))
 > +  if (check_reduction)
 >      {
 > -      if (dump_enabled_p ())
 > -        report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 > -			"reduction: not commutative/associative: ");
 > -      return NULL;
 > +      if (code != COND_EXPR
 > +	  && (!commutative_tree_code (code) || !associative_tree_code (code)))
 > +	{
 > +	  if (dump_enabled_p ())
 > +	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 > +			    "reduction: not commutative/associative: ");
 > +	  return NULL;
 > +	}
 > +
 > +      if (code == COND_EXPR)
 > +	*v_reduc_type = COND_REDUCTION;

Wouldn't this be easier written as

if (code == COND_EXPR)
   *v_reduc_type = COND_REDUCTION;
else if (!commutative_tree_code (code) || !associative_tree_code (code))
   {...}

? Your call!

Cheers, Alan


* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
  2015-10-19  8:34 Alan Hayward
@ 2015-10-21 10:46 ` Richard Biener
  2015-10-22 14:18 ` Alan Lawrence
  1 sibling, 0 replies; 26+ messages in thread
From: Richard Biener @ 2015-10-21 10:46 UTC (permalink / raw)
  To: Alan Hayward; +Cc: Alan Lawrence, gcc-patches

On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward <alan.hayward@arm.com> wrote:
>
>
>>On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:
>>
>>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>>>wrote:
>>>>
>>>>
>>>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>>>
>>>>>
>>>>>
>>>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>>>
>>>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>>>
>>>>>>> Ok, I see.
>>>>>>>
>>>>>>> That this case is already vectorized is because it implements
>>>>>>>MAX_EXPR,
>>>>>>> modifying it slightly to
>>>>>>>
>>>>>>> int foo (int *a)
>>>>>>> {
>>>>>>>    int val = 0;
>>>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>>>      if (a[i] > val)
>>>>>>>        val = a[i] + 1;
>>>>>>>    return val;
>>>>>>> }
>>>>>>>
>>>>>>> makes it no longer handled by current code.
>>>>>>>
>>>>>>
>>>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>>>expressions
>>>>>>like
>>>>>>
>>>>>>int foo (int *a)
>>>>>>{
>>>>>>    int val = 0;
>>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>>      if (some_expression (i))
>>>>>>        val = another_expression (i);
>>>>>>    return val;
>>>>>>}
>>>>>
>>>>>Yes, that's correct. Hopefully my new test cases should cover
>>>>>everything.
>>>>>
>>>>
>>>> Attached is a new version of the patch containing all the changes
>>>> requested by Richard.
>>>
>>>+      /* Compare the max index vector to the vector of found indexes to
>>>find
>>>+        the postion of the max value.  This will result in either a
>>>single
>>>+        match or all of the values.  */
>>>+      tree vec_compare = make_ssa_name (index_vec_type_signed);
>>>+      gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>>>EQ_EXPR,
>>>+                                                    induction_index,
>>>+                                                    max_index_vec);
>>>
>>>I'm not sure all targets can handle this.  If I decipher the code
>>>correctly then we do
>>>
>>>  mask = induction_index == max_index_vec;
>>>  vec_and = mask & vec_data;
>>>
>>>plus some casts.  So this is basically
>>>
>>>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>>>
>>>without the need to relate the induction index vector type to the data
>>>vector type.
>>>I believe this is also the form all targets support.
>>
>>
>>Ok, I'll replace this.
>>
>>>
>>>I am missing a comment before all this code-generation that shows the
>>>transform
>>>result with the variable names used in the code-gen.  I have a hard
>>>time connecting
>>>things here.
>>
>>Ok, I'll add some comments.
>>
>>>
>>>+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>>>scalar_type,
>>>+                                            matched_data_reduc);
>>>+      epilog_stmt = gimple_build_assign (new_scalar_dest,
>>>+                                        matched_data_reduc_cast);
>>>+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>>>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>>>
>>>this will leave the stmt unsimplified.  scalar sign-changes should use
>>>NOP_EXPR,
>>>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>>>Also just do like before - first make_ssa_name and then directly use it
>>>in the
>>>gimple_build_assign.
>>
>>We need the VIEW_CONVERT_EXPR for the cases where we have float data
>>values. The index is always integer.
>>
>>
>>>
>>>The patch is somewhat hard to parse with all the indentation changes.  A
>>>context
>>>diff would be much easier to read in those contexts.
>>
>>Ok, I'll make the next patch like that
>>
>>>
>>>+  if (v_reduc_type == COND_REDUCTION)
>>>+    {
>>>+      widest_int ni;
>>>+
>>>+      if (! max_loop_iterations (loop, &ni))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop count not known, cannot create cond "
>>>+                            "reduction.\n");
>>>
>>>ugh.  That's bad.
>>>
>>>+      /* The additional index will be the same type as the condition.
>>>Check
>>>+        that the loop can fit into this less one (because we'll use up
>>>the
>>>+        zero slot for when there are no matches).  */
>>>+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>>>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>>>+       {
>>>+         if (dump_enabled_p ())
>>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>>+                            "loop size is greater than data size.\n");
>>>+         return false;
>>>
>>>Likewise.
>>
>>We could do better if we made the index type larger.
>>But as a first implementation of this optimisation, I didn't want to
>>overcomplicate things more.
>>
>>>
>>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>>>gimple_stmt_iterator *gsi,
>>>   if (dump_enabled_p ())
>>>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>>>
>>>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>>>+
>>>   /* FORNOW: Multiple types are not supported for condition.  */
>>>   if (code == COND_EXPR)
>>>
>>>this change looks odd (or wrong).  The type should be _only_ set/changed
>>>during
>>>analysis.
>>
>>
>>The problem is, for COND_EXPRs, this function calls
>>vectorizable_condition(), which sets STMT_VINFO_TYPE to
>>condition_vec_info_type.

Ah, the pre-existing issue of the transform phase re-doing the analysis...
a fix would be to condition that call on vec_stmt == NULL, i.e. the analysis
phase.

>>Therefore we need something to restore it back to reduc_vec_info_type on
>>the non-analysis call.
>>
>>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
>>the call to vectorizable_condition(), but that looked worse.
>>I could back up the value of STMT_VINFO_TYPE before calling
>>vectorizable_condition() and then restore it after? I think that'll look a
>>lot better.
>>
>>
>>>
>>>+
>>>+      /* For cond reductions we need to add an additional conditional
>>>based on
>>>+        the loop index.  */
>>>+      if (v_reduc_type == COND_REDUCTION)
>>>+       {
>>>+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>>+         int k;
>>>...
>>>+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>>>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>>>+
>>>+         /* Update the phi with the vec cond.  */
>>>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>>>+                      UNKNOWN_LOCATION);
>>>
>>>same as before - I am missing a comment that shows the generated code
>>>and connects
>>>the local vars used.
>>
>>Ok, I'll add something
>>
>>>
>>>
>>>+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>>>+         gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>>>ccompare);
>>>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt,
>>>GSI_SAME_STMT);
>>>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>>>
>>>hum - are you sure this works with ncopies > 1?  Will it use the
>>>correct vec_stmt?
>>
>>We don't support this when ncopies > 1.
>>
>>In vectorizable_reduction():
>>
>>if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
>>    {
>>      if (dump_enabled_p ())
>>       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                        "multiple types in double reduction or condition "
>>                        "reduction.\n");
>>      return false;
>>    }
>>
>>
>>
>>>
>>>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>>>you explore
>>>adding the reduction kind to stmt_info?
>>
>>Ok, I can do that.
>>
>>
>>Thanks for the comments.
>>I'll put together a patch with the above changes.
>>
>>Thanks,
>>Alan.
>>
>>
>
> Richard, as requested I've updated with the following changes:
>
> * AND and EQ replaced with a COND_EXPR
> * Better comments for the code gen, including references to variable names
> * Kept the VIEW_CONVERT_EXPR - we need this for when the data is of float type
> * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition()
> and restored it after the call. I considered extracting the relevant parts of
> vectorizable_condition() into a sub function, but it had too many
> dependencies on the rest of vectorizable_condition().
> * v_reduc_type is now part of stmt_info
> * Created a diff using 50 lines of context

Ok with the vectorizable_condition call guarded as suggested instead.

Thanks,
Richard.

>
> Thanks,
> Alan.
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
@ 2015-10-19  8:34 Alan Hayward
  2015-10-21 10:46 ` Richard Biener
  2015-10-22 14:18 ` Alan Lawrence
  0 siblings, 2 replies; 26+ messages in thread
From: Alan Hayward @ 2015-10-19  8:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alan Lawrence, gcc-patches, Alan Hayward

[-- Attachment #1: Type: text/plain, Size: 8139 bytes --]



>On 30/09/2015 13:45, "Richard Biener" <richard.guenther@gmail.com> wrote:
>
>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayward@arm.com>
>>wrote:
>>>
>>>
>>> On 18/09/2015 14:53, "Alan Hayward" <Alan.Hayward@arm.com> wrote:
>>>
>>>>
>>>>
>>>>On 18/09/2015 14:26, "Alan Lawrence" <Alan.Lawrence@arm.com> wrote:
>>>>
>>>>>On 18/09/15 13:17, Richard Biener wrote:
>>>>>>
>>>>>> Ok, I see.
>>>>>>
>>>>>> That this case is already vectorized is because it implements
>>>>>>MAX_EXPR,
>>>>>> modifying it slightly to
>>>>>>
>>>>>> int foo (int *a)
>>>>>> {
>>>>>>    int val = 0;
>>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>>      if (a[i] > val)
>>>>>>        val = a[i] + 1;
>>>>>>    return val;
>>>>>> }
>>>>>>
>>>>>> makes it no longer handled by current code.
>>>>>>
>>>>>
>>>>>Yes. I believe the idea for the patch is to handle arbitrary
>>>>>expressions
>>>>>like
>>>>>
>>>>>int foo (int *a)
>>>>>{
>>>>>    int val = 0;
>>>>>    for (int i = 0; i < 1024; ++i)
>>>>>      if (some_expression (i))
>>>>>        val = another_expression (i);
>>>>>    return val;
>>>>>}
>>>>
>>>>Yes, that's correct. Hopefully my new test cases should cover
>>>>everything.
>>>>
>>>
>>> Attached is a new version of the patch containing all the changes
>>> requested by Richard.
>>
>>+      /* Compare the max index vector to the vector of found indexes to
>>find
>>+        the position of the max value.  This will result in either a
>>single
>>+        match or all of the values.  */
>>+      tree vec_compare = make_ssa_name (index_vec_type_signed);
>>+      gimple vec_compare_stmt = gimple_build_assign (vec_compare,
>>EQ_EXPR,
>>+                                                    induction_index,
>>+                                                    max_index_vec);
>>
>>I'm not sure all targets can handle this.  If I deciper the code
>>correctly then we do
>>
>>  mask = induction_index == max_index_vec;
>>  vec_and = mask & vec_data;
>>
>>plus some casts.  So this is basically
>>
>>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>>
>>without the need to relate the induction index vector type to the data
>>vector type.
>>I believe this is also the form all targets support.
>
>
>Ok, I'll replace this.
>
>>
>>I am missing a comment before all this code-generation that shows the
>>transform
>>result with the variable names used in the code-gen.  I have a hard
>>time connecting
>>things here.
>
>Ok, I'll add some comments.
>
>>
>>+      tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR,
>>scalar_type,
>>+                                            matched_data_reduc);
>>+      epilog_stmt = gimple_build_assign (new_scalar_dest,
>>+                                        matched_data_reduc_cast);
>>+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>>+      gimple_assign_set_lhs (epilog_stmt, new_temp);
>>
>>this will leave the stmt unsimplified.  scalar sign-changes should use
>>NOP_EXPR,
>>not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
>>Also just do like before - first make_ssa_name and then directly use it
>>in the
>>gimple_build_assign.
>
>We need the VIEW_CONVERT_EXPR for the cases where we have float data
>values. The index is always integer.
>
>
>>
>>The patch is somewhat hard to parse with all the indentation changes.  A
>>context
>>diff would be much easier to read in those contexts.
>
>Ok, I'll make the next patch like that
>
>>
>>+  if (v_reduc_type == COND_REDUCTION)
>>+    {
>>+      widest_int ni;
>>+
>>+      if (! max_loop_iterations (loop, &ni))
>>+       {
>>+         if (dump_enabled_p ())
>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>+                            "loop count not known, cannot create cond "
>>+                            "reduction.\n");
>>
>>ugh.  That's bad.
>>
>>+      /* The additional index will be the same type as the condition.
>>Check
>>+        that the loop can fit into this less one (because we'll use up
>>the
>>+        zero slot for when there are no matches).  */
>>+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>>+      if (wi::geu_p (ni, wi::to_widest (max_index)))
>>+       {
>>+         if (dump_enabled_p ())
>>+           dump_printf_loc (MSG_NOTE, vect_location,
>>+                            "loop size is greater than data size.\n");
>>+         return false;
>>
>>Likewise.
>
>We could do better if we made the index type larger.
>But as a first implementation of this optimisation, I didn't want to
>overcomplicate things more.
>
>>
>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
>>gimple_stmt_iterator *gsi,
>>   if (dump_enabled_p ())
>>     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>>
>>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>>+
>>   /* FORNOW: Multiple types are not supported for condition.  */
>>   if (code == COND_EXPR)
>>
>>this change looks odd (or wrong).  The type should be _only_ set/changed
>>during
>>analysis.
>
>
>The problem is, for COND_EXPRs, this function calls
>vectorizable_condition(), which sets STMT_VINFO_TYPE to
>condition_vec_info_type.
>
>Therefore we need something to restore it back to reduc_vec_info_type on
>the non-analysis call.
>
>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after
>the call to vectorizable_condition(), but that looked worse.
>I could back up the value of STMT_VINFO_TYPE before calling
>vectorizable_condition() and then restore it after? I think that'll look a
>lot better.
>
>
>>
>>+
>>+      /* For cond reductions we need to add an additional conditional
>>based on
>>+        the loop index.  */
>>+      if (v_reduc_type == COND_REDUCTION)
>>+       {
>>+         int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>+         int k;
>>...
>>+         STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
>>+         set_vinfo_for_stmt (index_condition, index_vec_info);
>>+
>>+         /* Update the phi with the vec cond.  */
>>+         add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
>>+                      UNKNOWN_LOCATION);
>>
>>same as before - I am missing a comment that shows the generated code
>>and connects
>>the local vars used.
>
>Ok, I'll add something
>
>>
>>
>>+         tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
>>+         gimple ccompare_stmt = gimple_build_assign (ccompare_name,
>>ccompare);
>>+         gsi_insert_before (&vec_stmt_gsi, ccompare_stmt,
>>GSI_SAME_STMT);
>>+         gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
>>
>>hum - are you sure this works with ncopies > 1?  Will it use the
>>correct vec_stmt?
>
>We don't support this when ncopies > 1.
>
>In vectorizable_reduction():
>
>if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1)
>    {
>      if (dump_enabled_p ())
>	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>			 "multiple types in double reduction or condition "
>			 "reduction.\n");
>      return false;
>    }
>
>
>
>>
>>I still dislike the v_reduc_type plastered and passed everywhere.  Can
>>you explore
>>adding the reduction kind to stmt_info?
>
>Ok, I can do that.
>
>
>Thanks for the comments.
>I'll put together a patch with the above changes.
>
>Thanks,
>Alan.
>
>

Richard, as requested I've updated with the following changes:

* AND and EQ replaced with a COND_EXPR
* Better comments for the code gen, including references to variable names
* Kept the VIEW_CONVERT_EXPR - we need this for when the data is of float type
* Backed up STMT_VINFO_TYPE before the call to vectorizable_condition()
and restored it after the call. I considered extracting the relevant parts of
vectorizable_condition() into a sub function, but it had too many
dependencies on the rest of vectorizable_condition().
* v_reduc_type is now part of stmt_info
* Created a diff using 50 lines of context


Thanks,
Alan.



[-- Attachment #2: 0001-Support-for-vectorizing-conditional-expressions.patch --]
[-- Type: application/octet-stream, Size: 111526 bytes --]

From c8c27b2d630a99cc3b5e508608ebcdd7c1b9f9e1 Mon Sep 17 00:00:00 2001
From: Alan Hayward <alan.hayward@arm.com>
Date: Fri, 28 Aug 2015 10:01:15 +0100
Subject: [PATCH] Support for vectorizing conditional expressions

2015-08-28  Alan Hayward <alan.hayward@arm.com>

	PR tree-optimization/65947
	* tree-vect-loop.c
	(vect_is_simple_reduction_1): Find condition reductions.
	(vect_model_reduction_cost): Add condition reduction costs.
	(get_initial_def_for_reduction): Add condition reduction initial var.
	(vect_create_epilog_for_reduction): Add condition reduction epilog.
	(vectorizable_reduction): Condition reduction support.
	* tree-vect-stmts.c
	(vectorizable_condition): Add vect reduction arg
	* doc/sourcebuild.texi (Vector-specific attributes): Document
	vect_max_reduc

    testsuite/Changelog:

	PR tree-optimization/65947
	* lib/target-supports.exp
	(check_effective_target_vect_max_reduc): Add.
	* gcc.dg/vect/pr65947-1.c: New test.
	* gcc.dg/vect/pr65947-2.c: New test.
	* gcc.dg/vect/pr65947-3.c: New test.
	* gcc.dg/vect/pr65947-4.c: New test.
	* gcc.dg/vect/pr65947-5.c: New test.
	* gcc.dg/vect/pr65947-6.c: New test.
	* gcc.dg/vect/pr65947-7.c: New test.
	* gcc.dg/vect/pr65947-8.c: New test.
	* gcc.dg/vect/pr65947-9.c: New test.
	* gcc.dg/vect/pr65947-10.c: New test.
	* gcc.dg/vect/pr65947-11.c: New test.
---
 gcc/doc/sourcebuild.texi               |   3 +
 gcc/testsuite/gcc.dg/vect/pr65947-1.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-10.c |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-11.c |  48 +++
 gcc/testsuite/gcc.dg/vect/pr65947-2.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-3.c  |  50 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-4.c  |  40 +++
 gcc/testsuite/gcc.dg/vect/pr65947-5.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-6.c  |  39 +++
 gcc/testsuite/gcc.dg/vect/pr65947-7.c  |  51 ++++
 gcc/testsuite/gcc.dg/vect/pr65947-8.c  |  41 +++
 gcc/testsuite/gcc.dg/vect/pr65947-9.c  |  42 +++
 gcc/testsuite/lib/target-supports.exp  |   9 +
 gcc/tree-vect-loop.c                   | 513 +++++++++++++++++++++++++++------
 gcc/tree-vect-stmts.c                  |  34 ++-
 gcc/tree-vectorizer.h                  |  11 +
 16 files changed, 938 insertions(+), 103 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-10.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-11.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-5.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-6.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-7.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-8.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr65947-9.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 5dc7c81..61de4a5 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1419,100 +1419,103 @@ to @code{short}.
 
 @item vect_widen_sum_qi_to_si
 Target supports a vector widening summation of @code{char} operands
 into @code{int} results.
 
 @item vect_widen_mult_qi_to_hi
 Target supports a vector widening multiplication of @code{char} operands
 into @code{short} results, or can promote (unpack) from @code{char} to
 @code{short} and perform non-widening multiplication of @code{short}.
 
 @item vect_widen_mult_hi_to_si
 Target supports a vector widening multiplication of @code{short} operands
 into @code{int} results, or can promote (unpack) from @code{short} to
 @code{int} and perform non-widening multiplication of @code{int}.
 
 @item vect_widen_mult_si_to_di_pattern
 Target supports a vector widening multiplication of @code{int} operands
 into @code{long} results.
 
 @item vect_sdot_qi
 Target supports a vector dot-product of @code{signed char}.
 
 @item vect_udot_qi
 Target supports a vector dot-product of @code{unsigned char}.
 
 @item vect_sdot_hi
 Target supports a vector dot-product of @code{signed short}.
 
 @item vect_udot_hi
 Target supports a vector dot-product of @code{unsigned short}.
 
 @item vect_pack_trunc
 Target supports a vector demotion (packing) of @code{short} to @code{char}
 and from @code{int} to @code{short} using modulo arithmetic.
 
 @item vect_unpack
 Target supports a vector promotion (unpacking) of @code{char} to @code{short}
 and from @code{char} to @code{int}.
 
 @item vect_intfloat_cvt
 Target supports conversion from @code{signed int} to @code{float}.
 
 @item vect_uintfloat_cvt
 Target supports conversion from @code{unsigned int} to @code{float}.
 
 @item vect_floatint_cvt
 Target supports conversion from @code{float} to @code{signed int}.
 
 @item vect_floatuint_cvt
 Target supports conversion from @code{float} to @code{unsigned int}.
+
+@item vect_max_reduc
+Target supports max reduction for vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
 
 @table @code
 @item tls
 Target supports thread-local storage.
 
 @item tls_native
 Target supports native (rather than emulated) thread-local storage.
 
 @item tls_runtime
 Test system supports executing TLS executables.
 @end table
 
 @subsubsection Decimal floating point attributes
 
 @table @code
 @item dfp
 Targets supports compiling decimal floating point extension to C.
 
 @item dfp_nocache
 Including the options used to compile this particular test, the
 target supports compiling decimal floating point extension to C.
 
 @item dfprt
 Test system can execute decimal floating point tests.
 
 @item dfprt_nocache
 Including the options used to compile this particular test, the
 test system can execute decimal floating point tests.
 
 @item hard_dfp
 Target generates decimal floating point instructions with current options.
 @end table
 
 @subsubsection ARM-specific attributes
 
 @table @code
 @item arm32
 ARM target generates 32-bit code.
 
 @item arm_eabi
 ARM target adheres to the ABI for the ARM Architecture.
 
 @item arm_hf_eabi
 ARM target adheres to the VFP and Advanced SIMD Register Arguments
 variant of the ABI for the ARM Architecture (as selected with
 @code{-mfloat-abi=hard}).
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-1.c b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
new file mode 100644
index 0000000..7933f5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Simple condition reduction.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = -1;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, -12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, -3, 4, 5, 6, 7, -8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 19)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-10.c b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
new file mode 100644
index 0000000..9a43a60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-10.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Non-integer data types.  */
+
+float
+condition_reduction (float *a, float min_v)
+{
+  float last = 0;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  float a[N] = {
+  11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20,
+  1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6,
+  21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30,
+  31.111, 32.322
+  };
+
+  float ret = condition_reduction (a, 16.7);
+
+  if (ret != (float)10.6)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-11.c b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
new file mode 100644
index 0000000..6deff00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-11.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Re-use the result of the condition inside the loop.  Will fail to
+   vectorize.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      if (b[i] < min_v)
+	last = i;
+      a[i] = last;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 29)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-2.c b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
new file mode 100644
index 0000000..9c627d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 254
+
+/* Non-simple condition reduction.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v)
+{
+  unsigned char last = 65;
+
+  for (unsigned char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  unsigned char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-3.c b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
new file mode 100644
index 0000000..e115de2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 37
+
+/* Non-simple condition reduction with additional variable and unsigned
+   types.  */
+
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v, unsigned int *b)
+{
+  unsigned int last = N + 65;
+  unsigned int aval;
+
+  for (unsigned int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+
+int
+main (void)
+{
+  unsigned int a[N] = {
+  31, 32, 33, 34, 35, 36, 37,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20
+  };
+  unsigned int b[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  31, 32, 33, 34, 35, 36, 37
+  };
+
+  unsigned int ret = condition_reduction (a, 16, b);
+
+  if (ret != 13)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-4.c b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
new file mode 100644
index 0000000..76a0567
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-4.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with no valid matches at runtime.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 96;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] > min_v)
+      last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  int a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27
+  };
+
+  int ret = condition_reduction (a, 46);
+
+  /* loop should never have found a value.  */
+  if (ret != N + 96)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-5.c b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
new file mode 100644
index 0000000..360e3b5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-5.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 32
+
+/* Condition reduction where loop size is not known at compile time.  Will fail
+   to vectorize.  Version inlined into main loop will vectorize.  */
+
+unsigned char
+condition_reduction (unsigned char *a, unsigned char min_v, int count)
+{
+  unsigned char last = 65;
+
+  for (int i = 0; i < count; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+int
+main (void)
+{
+  unsigned char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+
+  unsigned char ret = condition_reduction (a, 16, N);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-6.c b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
new file mode 100644
index 0000000..4997ef7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-6.c
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 30
+
+/* Condition reduction where loop type is different than the data type.  */
+
+int
+condition_reduction (int *a, int min_v)
+{
+  int last = N + 65;
+
+  for (char i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  int a[N] = {
+  67, 32, 45, 43, 21, -11, 12, 3, 4, 5,
+  6, 76, -32, 56, -32, -1, 4, 5, 6, 99,
+  43, 22, -3, 22, 16, 34, 55, 31, 87, 324
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != -3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-7.c b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
new file mode 100644
index 0000000..1044119
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-7.c
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 43
+
+/* Condition reduction with comparison is a different type to the data.  Will
+   fail to vectorize.  */
+
+int
+condition_reduction (short *a, int min_v, int *b)
+{
+  int last = N + 65;
+  short aval;
+
+  for (int i = 0; i < N; i++)
+    {
+      aval = a[i];
+      if (b[i] < min_v)
+	last = aval;
+    }
+  return last;
+}
+
+int
+main (void)
+{
+  short a[N] = {
+  31, -32, 133, 324, 335, 36, 37, 45, 11, 65,
+  1, -28, 3, 48, 5, -68, 7, 88, 89, 180,
+  121, -122, 123, 124, -125, 126, 127, 128, 129, 130,
+  11, 12, 13, 14, -15, -16, 17, 18, 19, 20,
+  33, 27, 99
+  };
+  int b[N] = {
+  11, -12, -13, 14, 15, 16, 17, 18, 19, 20,
+  21, -22, 23, 24, -25, 26, 27, 28, 29, 30,
+  1, 62, 3, 14, -15, 6, 37, 48, 99, 10,
+  31, -32, 33, 34, -35, 36, 37, 56, 54, 22,
+  73, 2, 87
+  };
+
+  int ret = condition_reduction (a, 16, b);
+
+  if (ret != 27)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-8.c b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
new file mode 100644
index 0000000..5cdbbe0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-8.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with multiple types in the comparison.  Will fail to
+   vectorize.  */
+
+int
+condition_reduction (char *a, int min_v)
+{
+  int last = N + 65;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+
+int
+main (void)
+{
+  char a[N] = {
+  1, 28, 3, 48, 5, 68, 7, -88, 89, 180,
+  121, 122, -123, 124, 12, -12, 12, 67, 84, 122,
+  67, 55, 112, 22, 45, 23, 111
+  };
+
+  int ret = condition_reduction (a, 16);
+
+  if (ret != 12)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "multiple types in double reduction or condition reduction" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-9.c b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
new file mode 100644
index 0000000..d0da13f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-9.c
@@ -0,0 +1,42 @@
+/* { dg-require-effective-target vect_condition } */
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 255
+
+/* Condition reduction with maximum possible loop size.  Will fail to
+   vectorize because the vectorisation requires a slot for default values.  */
+
+char
+condition_reduction (char *a, char min_v)
+{
+  char last = -72;
+
+  for (int i = 0; i < N; i++)
+    if (a[i] < min_v)
+      last = a[i];
+
+  return last;
+}
+
+char
+main (void)
+{
+  char a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+  31, 32
+  };
+  __builtin_memset (a+32, 43, N-32);
+
+  char ret = condition_reduction (a, 16);
+
+  if (ret != 10)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 9057a27..9ac4abc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6428,50 +6428,59 @@ proc check_effective_target_pie_copyreloc { } {
 	# reloc.  Include the current process ID in the file names to
 	# prevent conflicts with invocations for multiple testsuites.
 
 	set src pie[pid].c
 	set obj pie[pid].o
 
 	set f [open $src "w"]
 	puts $f "#include \"../../auto-host.h\""
 	puts $f "#if HAVE_LD_PIE_COPYRELOC == 0"
 	puts $f "# error Linker does not support PIE with copy reloc."
 	puts $f "#endif"
 	close $f
 
 	verbose "check_effective_target_pie_copyreloc compiling testfile $src" 2
 	set lines [${tool}_target_compile $src $obj object ""]
 
 	file delete $src
 	file delete $obj
 
 	if [string match "" $lines] then {
 	    verbose "check_effective_target_pie_copyreloc testfile compilation passed" 2
 	    set pie_copyreloc_available_saved 1
 	} else {
 	    verbose "check_effective_target_pie_copyreloc testfile compilation failed" 2
 	    set pie_copyreloc_available_saved 0
 	}
     }
 
     return $pie_copyreloc_available_saved
 }
 
 # Return 1 if the target uses comdat groups.
 
 proc check_effective_target_comdat_group {} {
     return [check_no_messages_and_pattern comdat_group "\.section\[^\n\r]*,comdat" assembly {
 	// C++
 	inline int foo () { return 1; }
 	int (*fn) () = foo;
     }]
 }
 
 # Return 1 if target supports __builtin_eh_return
 proc check_effective_target_builtin_eh_return { } {
     return [check_no_compiler_messages builtin_eh_return object {
 	void test (long l, void *p)
 	{
 	    __builtin_eh_return (l, p);
 	}
     } "" ]
 }
+
+# Return 1 if the target supports max reduction for vectors.
+
+proc check_effective_target_vect_max_reduc { } {
+    if { [istarget aarch64*-*-*] || [istarget arm*-*-*] } {
+	return 1
+    }
+    return 0
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 63e29aa..ec145f9 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2275,125 +2275,132 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple *phi,
       lhs = gimple_assign_lhs (next_stmt);
       next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt));
     }
 
   /* Save the chain for further analysis in SLP detection.  */
   first = GROUP_FIRST_ELEMENT (vinfo_for_stmt (current_stmt));
   LOOP_VINFO_REDUCTION_CHAINS (loop_info).safe_push (first);
   GROUP_SIZE (vinfo_for_stmt (first)) = size;
 
   return true;
 }
 
 
 /* Function vect_is_simple_reduction_1
 
    (1) Detect a cross-iteration def-use cycle that represents a simple
    reduction computation.  We look for the following pattern:
 
    loop_header:
      a1 = phi < a0, a2 >
      a3 = ...
      a2 = operation (a3, a1)
 
    or
 
    a3 = ...
    loop_header:
      a1 = phi < a0, a2 >
      a2 = operation (a3, a1)
 
    such that:
    1. operation is commutative and associative and it is safe to
       change the order of the computation (if CHECK_REDUCTION is true)
    2. no uses for a2 in the loop (a2 is used out of the loop)
    3. no uses of a1 in the loop besides the reduction operation
    4. no uses of a1 outside the loop.
 
    Conditions 1,4 are tested here.
    Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized.
 
    (2) Detect a cross-iteration def-use cycle in nested loops, i.e.,
    nested cycles, if CHECK_REDUCTION is false.
 
    (3) Detect cycles of phi nodes in outer-loop vectorization, i.e., double
    reductions:
 
      a1 = phi < a0, a2 >
      inner loop (def of a3)
      a2 = phi < a3 >
 
+   (4) Detect condition expressions, i.e.:
+     for (int i = 0; i < N; i++)
+       if (a[i] < val)
+	ret_val = a[i];
+
    If MODIFY is true it tries also to rework the code in-place to enable
    detection of more reduction patterns.  For the time being we rewrite
    "res -= RHS" into "rhs += -RHS" when it seems worthwhile.
 */
 
 static gimple *
 vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple *phi,
 			    bool check_reduction, bool *double_reduc,
-			    bool modify, bool need_wrapping_integral_overflow)
+			    bool modify, bool need_wrapping_integral_overflow,
+			    enum vect_reduction_type *v_reduc_type)
 {
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
   edge latch_e = loop_latch_edge (loop);
   tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
   gimple *def_stmt, *def1 = NULL, *def2 = NULL;
   enum tree_code orig_code, code;
   tree op1, op2, op3 = NULL_TREE, op4 = NULL_TREE;
   tree type;
   int nloop_uses;
   tree name;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
   bool phi_def;
 
   *double_reduc = false;
+  *v_reduc_type = TREE_CODE_REDUCTION;
 
   /* If CHECK_REDUCTION is true, we assume inner-most loop vectorization,
      otherwise, we assume outer loop vectorization.  */
   gcc_assert ((check_reduction && loop == vect_loop)
               || (!check_reduction && flow_loop_nested_p (vect_loop, loop)));
 
   name = PHI_RESULT (phi);
   /* ???  If there are no uses of the PHI result the inner loop reduction
      won't be detected as possibly double-reduction by vectorizable_reduction
      because that tries to walk the PHI arg from the preheader edge which
      can be constant.  See PR60382.  */
   if (has_zero_uses (name))
     return NULL;
   nloop_uses = 0;
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
     {
       gimple *use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
 
       if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "intermediate value used outside loop.\n");
 
           return NULL;
         }
 
       nloop_uses++;
       if (nloop_uses > 1)
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "reduction used in loop.\n");
           return NULL;
         }
     }
 
   if (TREE_CODE (loop_arg) != SSA_NAME)
     {
       if (dump_enabled_p ())
 	{
 	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			   "reduction: not ssa_name: ");
 	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, loop_arg);
           dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
 	}
       return NULL;
     }
@@ -2445,382 +2452,406 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple *phi,
 	}
     }
 
   /* If DEF_STMT is a phi node itself, we expect it to have a single argument
      defined in the inner loop.  */
   if (phi_def)
     {
       op1 = PHI_ARG_DEF (def_stmt, 0);
 
       if (gimple_phi_num_args (def_stmt) != 1
           || TREE_CODE (op1) != SSA_NAME)
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "unsupported phi node definition.\n");
 
           return NULL;
         }
 
       def1 = SSA_NAME_DEF_STMT (op1);
       if (gimple_bb (def1)
 	  && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
           && loop->inner
           && flow_bb_inside_loop_p (loop->inner, gimple_bb (def1))
           && is_gimple_assign (def1))
         {
           if (dump_enabled_p ())
             report_vect_op (MSG_NOTE, def_stmt,
 			    "detected double reduction: ");
 
           *double_reduc = true;
           return def_stmt;
         }
 
       return NULL;
     }
 
   code = orig_code = gimple_assign_rhs_code (def_stmt);
 
   /* We can handle "res -= x[i]", which is non-associative by
      simply rewriting this into "res += -x[i]".  Avoid changing
      gimple instruction for the first simple tests and only do this
      if we're allowed to change code at all.  */
   if (code == MINUS_EXPR
       && modify
       && (op1 = gimple_assign_rhs1 (def_stmt))
       && TREE_CODE (op1) == SSA_NAME
       && SSA_NAME_DEF_STMT (op1) == phi)
     code = PLUS_EXPR;
 
-  if (check_reduction
-      && (!commutative_tree_code (code) || !associative_tree_code (code)))
+  if (check_reduction)
     {
-      if (dump_enabled_p ())
-        report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: not commutative/associative: ");
-      return NULL;
+      if (code != COND_EXPR
+	  && (!commutative_tree_code (code) || !associative_tree_code (code)))
+	{
+	  if (dump_enabled_p ())
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: not commutative/associative: ");
+	  return NULL;
+	}
+
+      if (code == COND_EXPR)
+	*v_reduc_type = COND_REDUCTION;
     }
 
   if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
     {
       if (code != COND_EXPR)
         {
 	  if (dump_enabled_p ())
 	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 			    "reduction: not binary operation: ");
 
           return NULL;
         }
 
       op3 = gimple_assign_rhs1 (def_stmt);
       if (COMPARISON_CLASS_P (op3))
         {
           op4 = TREE_OPERAND (op3, 1);
           op3 = TREE_OPERAND (op3, 0);
         }
 
       op1 = gimple_assign_rhs2 (def_stmt);
       op2 = gimple_assign_rhs3 (def_stmt);
 
       if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
         {
           if (dump_enabled_p ())
             report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 			    "reduction: uses not ssa_names: ");
 
           return NULL;
         }
     }
   else
     {
       op1 = gimple_assign_rhs1 (def_stmt);
       op2 = gimple_assign_rhs2 (def_stmt);
 
       if (TREE_CODE (op1) != SSA_NAME && TREE_CODE (op2) != SSA_NAME)
         {
           if (dump_enabled_p ())
 	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 			    "reduction: uses not ssa_names: ");
 
           return NULL;
         }
    }
 
   type = TREE_TYPE (gimple_assign_lhs (def_stmt));
   if ((TREE_CODE (op1) == SSA_NAME
        && !types_compatible_p (type,TREE_TYPE (op1)))
       || (TREE_CODE (op2) == SSA_NAME
           && !types_compatible_p (type, TREE_TYPE (op2)))
       || (op3 && TREE_CODE (op3) == SSA_NAME
           && !types_compatible_p (type, TREE_TYPE (op3)))
       || (op4 && TREE_CODE (op4) == SSA_NAME
           && !types_compatible_p (type, TREE_TYPE (op4))))
     {
       if (dump_enabled_p ())
         {
           dump_printf_loc (MSG_NOTE, vect_location,
 			   "reduction: multiple types: operation type: ");
           dump_generic_expr (MSG_NOTE, TDF_SLIM, type);
           dump_printf (MSG_NOTE, ", operands types: ");
           dump_generic_expr (MSG_NOTE, TDF_SLIM,
 			     TREE_TYPE (op1));
           dump_printf (MSG_NOTE, ",");
           dump_generic_expr (MSG_NOTE, TDF_SLIM,
 			     TREE_TYPE (op2));
           if (op3)
             {
               dump_printf (MSG_NOTE, ",");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
 				 TREE_TYPE (op3));
             }
 
           if (op4)
             {
               dump_printf (MSG_NOTE, ",");
               dump_generic_expr (MSG_NOTE, TDF_SLIM,
 				 TREE_TYPE (op4));
             }
           dump_printf (MSG_NOTE, "\n");
         }
 
       return NULL;
     }
 
   /* Check that it's ok to change the order of the computation.
      Generally, when vectorizing a reduction we change the order of the
      computation.  This may change the behavior of the program in some
      cases, so we need to check that this is ok.  One exception is when
      vectorizing an outer-loop: the inner-loop is executed sequentially,
      and therefore vectorizing reductions in the inner-loop during
      outer-loop vectorization is safe.  */
 
-  /* CHECKME: check for !flag_finite_math_only too?  */
-  if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
-      && check_reduction)
-    {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fp math optimization: ");
-      return NULL;
-    }
-  else if (INTEGRAL_TYPE_P (type) && check_reduction)
+  if (*v_reduc_type != COND_REDUCTION)
     {
-      if (!operation_no_trapping_overflow (type, code))
+      /* CHECKME: check for !flag_finite_math_only too?  */
+      if (SCALAR_FLOAT_TYPE_P (type) && !flag_associative_math
+	  && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
 	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow traps): ");
+			"reduction: unsafe fp math optimization: ");
 	  return NULL;
 	}
-      if (need_wrapping_integral_overflow
-	  && !TYPE_OVERFLOW_WRAPS (type)
-	  && operation_can_overflow (code))
+      else if (INTEGRAL_TYPE_P (type) && check_reduction)
+	{
+	  if (!operation_no_trapping_overflow (type, code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow traps): ");
+	      return NULL;
+	    }
+	  if (need_wrapping_integral_overflow
+	      && !TYPE_OVERFLOW_WRAPS (type)
+	      && operation_can_overflow (code))
+	    {
+	      /* Changing the order of operations changes the semantics.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+				"reduction: unsafe int math optimization"
+				" (overflow doesn't wrap): ");
+	      return NULL;
+	    }
+	}
+      else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
 	{
 	  /* Changing the order of operations changes the semantics.  */
 	  if (dump_enabled_p ())
-	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			    "reduction: unsafe int math optimization"
-			    " (overflow doesn't wrap): ");
+	    report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+			    "reduction: unsafe fixed-point math optimization: ");
 	  return NULL;
 	}
     }
-  else if (SAT_FIXED_POINT_TYPE_P (type) && check_reduction)
-    {
-      /* Changing the order of operations changes the semantics.  */
-      if (dump_enabled_p ())
-	report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
-			"reduction: unsafe fixed-point math optimization: ");
-      return NULL;
-    }
 
   /* If we detected "res -= x[i]" earlier, rewrite it into
      "res += -x[i]" now.  If this turns out to be useless reassoc
      will clean it up again.  */
   if (orig_code == MINUS_EXPR)
     {
       tree rhs = gimple_assign_rhs2 (def_stmt);
       tree negrhs = make_ssa_name (TREE_TYPE (rhs));
       gimple *negate_stmt = gimple_build_assign (negrhs, NEGATE_EXPR, rhs);
       gimple_stmt_iterator gsi = gsi_for_stmt (def_stmt);
       set_vinfo_for_stmt (negate_stmt, new_stmt_vec_info (negate_stmt, 
 							  loop_info, NULL));
       gsi_insert_before (&gsi, negate_stmt, GSI_NEW_STMT);
       gimple_assign_set_rhs2 (def_stmt, negrhs);
       gimple_assign_set_rhs_code (def_stmt, PLUS_EXPR);
       update_stmt (def_stmt);
     }
 
   /* Reduction is safe. We're dealing with one of the following:
      1) integer arithmetic and no trapv
      2) floating point arithmetic, and special flags permit this optimization
      3) nested cycle (i.e., outer loop vectorization).  */
   if (TREE_CODE (op1) == SSA_NAME)
     def1 = SSA_NAME_DEF_STMT (op1);
 
   if (TREE_CODE (op2) == SSA_NAME)
     def2 = SSA_NAME_DEF_STMT (op2);
 
   if (code != COND_EXPR
       && ((!def1 || gimple_nop_p (def1)) && (!def2 || gimple_nop_p (def2))))
     {
       if (dump_enabled_p ())
 	report_vect_op (MSG_NOTE, def_stmt, "reduction: no defs for operands: ");
       return NULL;
     }
 
   /* Check that one def is the reduction def, defined by PHI,
      the other def is either defined in the loop ("vect_internal_def"),
      or it's an induction (defined by a loop-header phi-node).  */
 
   if (def2 && def2 == phi
       && (code == COND_EXPR
 	  || !def1 || gimple_nop_p (def1)
 	  || !flow_bb_inside_loop_p (loop, gimple_bb (def1))
           || (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
               && (is_gimple_assign (def1)
 		  || is_gimple_call (def1)
   	          || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1))
                       == vect_induction_def
    	          || (gimple_code (def1) == GIMPLE_PHI
 	              && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1))
                           == vect_internal_def
  	              && !is_loop_header_bb_p (gimple_bb (def1)))))))
     {
       if (dump_enabled_p ())
 	report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
       return def_stmt;
     }
 
   if (def1 && def1 == phi
       && (code == COND_EXPR
 	  || !def2 || gimple_nop_p (def2)
 	  || !flow_bb_inside_loop_p (loop, gimple_bb (def2))
           || (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
  	      && (is_gimple_assign (def2)
 		  || is_gimple_call (def2)
 	          || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2))
                       == vect_induction_def
  	          || (gimple_code (def2) == GIMPLE_PHI
 		      && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2))
                           == vect_internal_def
 		      && !is_loop_header_bb_p (gimple_bb (def2)))))))
     {
       if (check_reduction)
         {
+	  if (code == COND_EXPR)
+	    {
+	      /* No current known use where this case would be useful.  */
+	      if (dump_enabled_p ())
+		report_vect_op (MSG_NOTE, def_stmt,
+				"detected reduction: cannot currently swap "
+				"operands for cond_expr");
+	      return NULL;
+	    }
+
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
 	     argument).  */
           if (dump_enabled_p ())
 	    report_vect_op (MSG_NOTE, def_stmt,
 	  	            "detected reduction: need to swap operands: ");
 
           swap_ssa_operands (def_stmt, gimple_assign_rhs1_ptr (def_stmt),
  			     gimple_assign_rhs2_ptr (def_stmt));
 
 	  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
 	    LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
         }
       else
         {
           if (dump_enabled_p ())
             report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
         }
 
       return def_stmt;
     }
 
   /* Try to find SLP reduction chain.  */
-  if (check_reduction && vect_is_slp_reduction (loop_info, phi, def_stmt))
+  if (check_reduction && code != COND_EXPR
+      && vect_is_slp_reduction (loop_info, phi, def_stmt))
     {
       if (dump_enabled_p ())
         report_vect_op (MSG_NOTE, def_stmt,
 			"reduction: detected reduction chain: ");
 
       return def_stmt;
     }
 
   if (dump_enabled_p ())
     report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
 		    "reduction: unknown pattern: ");
        
   return NULL;
 }
 
 /* Wrapper around vect_is_simple_reduction_1, that won't modify code
    in-place.  Arguments as there.  */
 
 static gimple *
 vect_is_simple_reduction (loop_vec_info loop_info, gimple *phi,
 			  bool check_reduction, bool *double_reduc,
-			  bool need_wrapping_integral_overflow)
+			  bool need_wrapping_integral_overflow,
+			  enum vect_reduction_type *v_reduc_type)
 {
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, false,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     v_reduc_type);
 }
 
 /* Wrapper around vect_is_simple_reduction_1, which will modify code
    in-place if it enables detection of more reductions.  Arguments
    as there.  */
 
 gimple *
 vect_force_simple_reduction (loop_vec_info loop_info, gimple *phi,
 			     bool check_reduction, bool *double_reduc,
 			     bool need_wrapping_integral_overflow)
 {
+  enum vect_reduction_type v_reduc_type;
   return vect_is_simple_reduction_1 (loop_info, phi, check_reduction,
 				     double_reduc, true,
-				     need_wrapping_integral_overflow);
+				     need_wrapping_integral_overflow,
+				     &v_reduc_type);
 }
 
 /* Calculate cost of peeling the loop PEEL_ITERS_PROLOGUE times.  */
 int
 vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
                              int *peel_iters_epilogue,
                              stmt_vector_for_cost *scalar_cost_vec,
 			     stmt_vector_for_cost *prologue_cost_vec,
 			     stmt_vector_for_cost *epilogue_cost_vec)
 {
   int retval = 0;
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       *peel_iters_epilogue = vf/2;
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "cost model: epilogue peel iters set to vf/2 "
 			 "because loop iterations are unknown .\n");
 
       /* If peeled iterations are known but number of scalar loop
          iterations are unknown, count a taken branch per peeled loop.  */
       retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
 				 NULL, 0, vect_prologue);
       retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
 				 NULL, 0, vect_epilogue);
     }
   else
     {
       int niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
       peel_iters_prologue = niters < peel_iters_prologue ?
                             niters : peel_iters_prologue;
       *peel_iters_epilogue = (niters - peel_iters_prologue) % vf;
       /* If we need to peel for gaps, but no peeling is required, we have to
 	 peel VF iterations.  */
       if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !*peel_iters_epilogue)
         *peel_iters_epilogue = vf;
     }
 
   stmt_info_for_cost *si;
   int j;
   if (peel_iters_prologue)
     FOR_EACH_VEC_ELT (*scalar_cost_vec, j, si)
       retval += record_stmt_cost (prologue_cost_vec,
 				  si->count * peel_iters_prologue,
 				  si->kind, NULL, si->misalign,
 				  vect_prologue);
   if (*peel_iters_epilogue)
     FOR_EACH_VEC_ELT (*scalar_cost_vec, j, si)
@@ -3231,146 +3262,175 @@ get_reduction_op (gimple *stmt, int reduc_index)
   switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
     {
     case GIMPLE_SINGLE_RHS:
       gcc_assert (TREE_OPERAND_LENGTH (gimple_assign_rhs1 (stmt))
 		  == ternary_op);
       return TREE_OPERAND (gimple_assign_rhs1 (stmt), reduc_index);
     case GIMPLE_UNARY_RHS:
       return gimple_assign_rhs1 (stmt);
     case GIMPLE_BINARY_RHS:
       return (reduc_index
 	      ? gimple_assign_rhs2 (stmt) : gimple_assign_rhs1 (stmt));
     case GIMPLE_TERNARY_RHS:
       return gimple_op (stmt, reduc_index + 1);
     default:
       gcc_unreachable ();
     }
 }
 
 /* TODO: Close dependency between vect_model_*_cost and vectorizable_*
    functions. Design better to avoid maintenance issues.  */
 
 /* Function vect_model_reduction_cost.
 
    Models cost for a reduction operation, including the vector ops
    generated within the strip-mine loop, the initial definition before
    the loop, and the epilogue code that must be generated.  */
 
 static bool
 vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
 			   int ncopies, int reduc_index)
 {
   int prologue_cost = 0, epilogue_cost = 0;
   enum tree_code code;
   optab optab;
   tree vectype;
   gimple *stmt, *orig_stmt;
   tree reduction_op;
   machine_mode mode;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
   void *target_cost_data;
 
   if (loop_vinfo)
     {
       loop = LOOP_VINFO_LOOP (loop_vinfo);
       target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
     }
   else
     target_cost_data = BB_VINFO_TARGET_COST_DATA (STMT_VINFO_BB_VINFO (stmt_info));
 
+  /* Condition reductions generate two reductions in the loop.  */
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+    ncopies *= 2;
+
   /* Cost of reduction op inside loop.  */
   unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt,
 					stmt_info, 0, vect_body);
   stmt = STMT_VINFO_STMT (stmt_info);
 
   reduction_op = get_reduction_op (stmt, reduc_index);
 
   vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
   if (!vectype)
     {
       if (dump_enabled_p ())
         {
 	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			   "unsupported data-type ");
           dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
 			     TREE_TYPE (reduction_op));
           dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
         }
       return false;
    }
 
   mode = TYPE_MODE (vectype);
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
 
   if (!orig_stmt)
     orig_stmt = STMT_VINFO_STMT (stmt_info);
 
   code = gimple_assign_rhs_code (orig_stmt);
 
-  /* Add in cost for initial definition.  */
-  prologue_cost += add_stmt_cost (target_cost_data, 1, scalar_to_vec,
-				  stmt_info, 0, vect_prologue);
+  /* Add in cost for initial definition.
+     For cond reduction we have four vectors: initial index, step, initial
+     result of the data reduction, initial value of the index reduction.  */
+  int prologue_stmts = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+		       == COND_REDUCTION ? 4 : 1;
+  prologue_cost += add_stmt_cost (target_cost_data, prologue_stmts,
+				  scalar_to_vec, stmt_info, 0,
+				  vect_prologue);
 
   /* Determine cost of epilogue code.
 
      We have a reduction operator that will reduce the vector in one statement.
      Also requires scalar extract.  */
 
   if (!loop || !nested_in_vect_loop_p (loop, orig_stmt))
     {
       if (reduc_code != ERROR_MARK)
 	{
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
-					  stmt_info, 0, vect_epilogue);
-	  epilogue_cost += add_stmt_cost (target_cost_data, 1, vec_to_scalar,
-					  stmt_info, 0, vect_epilogue);
+	  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	    {
+	      /* An EQ stmt and a COND_EXPR stmt.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      vector_stmt, stmt_info, 0,
+					      vect_epilogue);
+	      /* Reduction of the max index and a reduction of the found
+		 values.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 2,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	      /* A broadcast of the max value.  */
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      scalar_to_vec, stmt_info, 0,
+					      vect_epilogue);
+	    }
+	  else
+	    {
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1, vector_stmt,
+					      stmt_info, 0, vect_epilogue);
+	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
+					      vec_to_scalar, stmt_info, 0,
+					      vect_epilogue);
+	    }
 	}
       else
 	{
 	  int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
 	  tree bitsize =
 	    TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
 	  int element_bitsize = tree_to_uhwi (bitsize);
 	  int nelements = vec_size_in_bits / element_bitsize;
 
 	  optab = optab_for_tree_code (code, vectype, optab_default);
 
 	  /* We have a whole vector shift available.  */
 	  if (VECTOR_MODE_P (mode)
 	      && optab_handler (optab, mode) != CODE_FOR_nothing
 	      && have_whole_vector_shift (mode))
 	    {
 	      /* Final reduction via vector shifts and the reduction operator.
 		 Also requires scalar extract.  */
 	      epilogue_cost += add_stmt_cost (target_cost_data,
 					      exact_log2 (nelements) * 2,
 					      vector_stmt, stmt_info, 0,
 					      vect_epilogue);
 	      epilogue_cost += add_stmt_cost (target_cost_data, 1,
 					      vec_to_scalar, stmt_info, 0,
 					      vect_epilogue);
 	    }	  
 	  else
 	    /* Use extracts and reduction op for final reduction.  For N
 	       elements, we have N extracts and N-1 reduction ops.  */
 	    epilogue_cost += add_stmt_cost (target_cost_data, 
 					    nelements + nelements - 1,
 					    vector_stmt, stmt_info, 0,
 					    vect_epilogue);
 	}
     }
 
   if (dump_enabled_p ())
     dump_printf (MSG_NOTE, 
                  "vect_model_reduction_cost: inside_cost = %d, "
                  "prologue_cost = %d, epilogue_cost = %d .\n", inside_cost,
                  prologue_cost, epilogue_cost);
 
   return true;
 }
 
 
 /* Function vect_model_induction_cost.
 
    Models cost for induction operations.  */
 
@@ -3883,184 +3943,189 @@ get_initial_def_for_reduction (gimple *stmt, tree init_val,
             else
               *adjustment_def = init_val;
           }
 
         if (code == MULT_EXPR)
           {
             real_init_val = dconst1;
             int_init_val = 1;
           }
 
         if (code == BIT_AND_EXPR)
           int_init_val = -1;
 
         if (SCALAR_FLOAT_TYPE_P (scalar_type))
           def_for_init = build_real (scalar_type, real_init_val);
         else
           def_for_init = build_int_cst (scalar_type, int_init_val);
 
         /* Create a vector of '0' or '1' except the first element.  */
 	elts = XALLOCAVEC (tree, nunits);
         for (i = nunits - 2; i >= 0; --i)
 	  elts[i + 1] = def_for_init;
 
         /* Option1: the first element is '0' or '1' as well.  */
         if (adjustment_def)
           {
 	    elts[0] = def_for_init;
             init_def = build_vector (vectype, elts);
             break;
           }
 
         /* Option2: the first element is INIT_VAL.  */
 	elts[0] = init_val;
         if (TREE_CONSTANT (init_val))
           init_def = build_vector (vectype, elts);
         else
 	  {
 	    vec<constructor_elt, va_gc> *v;
 	    vec_alloc (v, nunits);
 	    CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
 	    for (i = 1; i < nunits; ++i)
 	      CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, elts[i]);
 	    init_def = build_constructor (vectype, v);
 	  }
 
         break;
 
       case MIN_EXPR:
       case MAX_EXPR:
       case COND_EXPR:
-        if (adjustment_def)
+	if (adjustment_def)
           {
-            *adjustment_def = NULL_TREE;
-            init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
-            break;
-          }
+	    *adjustment_def = NULL_TREE;
 
+	    if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_vinfo) != COND_REDUCTION)
+	      {
+		init_def = vect_get_vec_def_for_operand (init_val, stmt, NULL);
+		break;
+	      }
+	  }
 	init_def = build_vector_from_val (vectype, init_value);
-        break;
+	break;
 
       default:
         gcc_unreachable ();
     }
 
   return init_def;
 }
 
 /* Function vect_create_epilog_for_reduction
 
    Create code at the loop-epilog to finalize the result of a reduction
    computation. 
   
    VECT_DEFS is list of vector of partial results, i.e., the lhs's of vector 
      reduction statements. 
    STMT is the scalar reduction stmt that is being vectorized.
    NCOPIES is > 1 in case the vectorization factor (VF) is bigger than the
      number of elements that we can fit in a vectype (nunits).  In this case
      we have to generate more than one vector stmt - i.e - we need to "unroll"
      the vector stmt by a factor VF/nunits.  For more details see documentation
      in vectorizable_operation.
    REDUC_CODE is the tree-code for the epilog reduction.
    REDUCTION_PHIS is a list of the phi-nodes that carry the reduction 
      computation.
    REDUC_INDEX is the index of the operand in the right hand side of the 
      statement that is defined by REDUCTION_PHI.
    DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
    SLP_NODE is an SLP node containing a group of reduction statements. The 
      first one in this group is STMT.
+   INDUCTION_INDEX is the index of the loop for condition reductions.
+     Otherwise it is undefined.
 
    This function:
    1. Creates the reduction def-use cycles: sets the arguments for 
       REDUCTION_PHIS:
       The loop-entry argument is the vectorized initial-value of the reduction.
       The loop-latch argument is taken from VECT_DEFS - the vector of partial 
       sums.
    2. "Reduces" each vector of partial results VECT_DEFS into a single result,
       by applying the operation specified by REDUC_CODE if available, or by 
       other means (whole-vector shifts or a scalar loop).
       The function also creates a new phi node at the loop exit to preserve
       loop-closed form, as illustrated below.
 
      The flow at the entry to this function:
 
         loop:
           vec_def = phi <null, null>            # REDUCTION_PHI
           VECT_DEF = vector_stmt                # vectorized form of STMT
           s_loop = scalar_stmt                  # (scalar) STMT
         loop_exit:
           s_out0 = phi <s_loop>                 # (scalar) EXIT_PHI
           use <s_out0>
           use <s_out0>
 
      The above is transformed by this function into:
 
         loop:
           vec_def = phi <vec_init, VECT_DEF>    # REDUCTION_PHI
           VECT_DEF = vector_stmt                # vectorized form of STMT
           s_loop = scalar_stmt                  # (scalar) STMT
         loop_exit:
           s_out0 = phi <s_loop>                 # (scalar) EXIT_PHI
           v_out1 = phi <VECT_DEF>               # NEW_EXIT_PHI
           v_out2 = reduce <v_out1>
           s_out3 = extract_field <v_out2, 0>
           s_out4 = adjust_result <s_out3>
           use <s_out4>
           use <s_out4>
 */
 
 static void
 vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
 				  int ncopies, enum tree_code reduc_code,
 				  vec<gimple *> reduction_phis,
                                   int reduc_index, bool double_reduc, 
-                                  slp_tree slp_node)
+				  slp_tree slp_node, tree induction_index)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   stmt_vec_info prev_phi_info;
   tree vectype;
   machine_mode mode;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *outer_loop = NULL;
   basic_block exit_bb;
   tree scalar_dest;
   tree scalar_type;
   gimple *new_phi = NULL, *phi;
   gimple_stmt_iterator exit_gsi;
   tree vec_dest;
   tree new_temp = NULL_TREE, new_dest, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   gimple *exit_phi;
   tree bitsize;
   tree adjustment_def = NULL;
   tree vec_initial_def = NULL;
   tree reduction_op, expr, def;
   tree orig_name, scalar_result;
   imm_use_iterator imm_iter, phi_imm_iter;
   use_operand_p use_p, phi_use_p;
   gimple *use_stmt, *orig_stmt, *reduction_phi = NULL;
   bool nested_in_vect_loop = false;
   auto_vec<gimple *> new_phis;
   auto_vec<gimple *> inner_phis;
   enum vect_def_type dt = vect_unknown_def_type;
   int j, i;
   auto_vec<tree> scalar_results;
   unsigned int group_size = 1, k, ratio;
   auto_vec<tree> vec_initial_defs;
   auto_vec<gimple *> phis;
   bool slp_reduc = false;
   tree new_phi_result;
   gimple *inner_phi = NULL;
 
   if (slp_node)
     group_size = SLP_TREE_SCALAR_STMTS (slp_node).length (); 
 
   if (nested_in_vect_loop_p (loop, stmt))
     {
       outer_loop = loop;
       loop = loop->inner;
       nested_in_vect_loop = true;
       gcc_assert (!slp_node);
     }
 
   reduction_op = get_reduction_op (stmt, reduc_index);
@@ -4265,105 +4330,217 @@ vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple *stmt,
      loop - we don't need to extract a single scalar result at the end of the
      inner-loop (unless it is double reduction, i.e., the use of reduction is
      outside the outer-loop).  The final vector of partial results will be used
      in the vectorized outer-loop, or reduced to a scalar result at the end of
      the outer-loop.  */
   if (nested_in_vect_loop && !double_reduc)
     goto vect_finalize_reduction;
 
   /* SLP reduction without reduction chain, e.g.,
      # a1 = phi <a2, a0>
      # b1 = phi <b2, b0>
      a2 = operation (a1)
      b2 = operation (b1)  */
   slp_reduc = (slp_node && !GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)));
 
   /* In case of reduction chain, e.g.,
      # a1 = phi <a3, a0>
      a2 = operation (a1)
      a3 = operation (a2),
 
      we may end up with more than one vector result.  Here we reduce them to
      one vector.  */
   if (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)))
     {
       tree first_vect = PHI_RESULT (new_phis[0]);
       tree tmp;
       gassign *new_vec_stmt = NULL;
 
       vec_dest = vect_create_destination_var (scalar_dest, vectype);
       for (k = 1; k < new_phis.length (); k++)
         {
 	  gimple *next_phi = new_phis[k];
           tree second_vect = PHI_RESULT (next_phi);
 
           tmp = build2 (code, vectype,  first_vect, second_vect);
           new_vec_stmt = gimple_build_assign (vec_dest, tmp);
           first_vect = make_ssa_name (vec_dest, new_vec_stmt);
           gimple_assign_set_lhs (new_vec_stmt, first_vect);
           gsi_insert_before (&exit_gsi, new_vec_stmt, GSI_SAME_STMT);
         }
 
       new_phi_result = first_vect;
       if (new_vec_stmt)
         {
           new_phis.truncate (0);
           new_phis.safe_push (new_vec_stmt);
         }
     }
   else
     new_phi_result = PHI_RESULT (new_phis[0]);
- 
+
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+    {
+      /* For condition reductions, we have a vector (NEW_PHI_RESULT) containing
+	 various data values where the condition matched and another vector
+	 (INDUCTION_INDEX) containing all the indexes of those matches.  We
+	 need to extract the last matching index (which will be the index with
+	 highest value) and use this to index into the data vector.
+	 For the case where there were no matches, the data vector will contain
+	 all default values and the index vector will be all zeros.  */
+
+      /* Get various versions of the type of the vector of indexes.  */
+      tree index_vec_type = TREE_TYPE (induction_index);
+      gcc_checking_assert (TYPE_UNSIGNED (index_vec_type));
+      tree index_vec_type_signed = signed_type_for (index_vec_type);
+      tree index_scalar_type = TREE_TYPE (index_vec_type);
+
+      /* Get an unsigned integer version of the type of the data vector.  */
+      int scalar_precision = GET_MODE_PRECISION (TYPE_MODE (scalar_type));
+      tree scalar_type_unsigned = make_unsigned_type (scalar_precision);
+      tree vectype_unsigned = build_vector_type
+	(scalar_type_unsigned, TYPE_VECTOR_SUBPARTS (vectype));
+
+      /* First we need to create a vector (ZERO_VEC) of zeros and another
+	 vector (MAX_INDEX_VEC) filled with the last matching index, which we
+	 can create using a MAX reduction and then expanding.
+	 In the case where the loop never made any matches, the max index will
+	 be zero.  */
+
+      /* Vector of {0, 0, 0,...}.  */
+      tree zero_vec = make_ssa_name (vectype);
+      tree zero_vec_rhs = build_zero_cst (vectype);
+      gimple *zero_vec_stmt = gimple_build_assign (zero_vec, zero_vec_rhs);
+      gsi_insert_before (&exit_gsi, zero_vec_stmt, GSI_SAME_STMT);
+
+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_ssa_name (index_scalar_type);
+      gimple *max_index_stmt = gimple_build_assign (max_index, REDUC_MAX_EXPR,
+						    induction_index);
+      gsi_insert_before (&exit_gsi, max_index_stmt, GSI_SAME_STMT);
+
+      /* Vector of {max_index, max_index, max_index,...}.  */
+      tree max_index_vec = make_ssa_name (index_vec_type);
+      tree max_index_vec_rhs = build_vector_from_val (index_vec_type,
+						      max_index);
+      gimple *max_index_vec_stmt = gimple_build_assign (max_index_vec,
+							max_index_vec_rhs);
+      gsi_insert_before (&exit_gsi, max_index_vec_stmt, GSI_SAME_STMT);
+
+      /* Next we compare the new vector (MAX_INDEX_VEC) full of max indexes
+	 with the vector (INDUCTION_INDEX) of found indexes, choosing values
+	 from the data vector (NEW_PHI_RESULT) for matches, 0 (ZERO_VEC)
+	 otherwise.  Only one value should match, resulting in a vector
+	 (VEC_COND) with one data value and the rest zeros.
+	 In the case where the loop never made any matches, every index will
+	 match, resulting in a vector with all data values (which will all be
+	 the default value).  */
+
+      /* Compare the max index vector to the vector of found indexes to find
+	 the position of the max value.  */
+      tree vec_compare = make_ssa_name (index_vec_type_signed);
+      gimple *vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+						      induction_index,
+						      max_index_vec);
+      gsi_insert_before (&exit_gsi, vec_compare_stmt, GSI_SAME_STMT);
+
+      /* Use the compare to choose either values from the data vector or
+	 zero.  */
+      tree vec_cond = make_ssa_name (vectype);
+      gimple *vec_cond_stmt = gimple_build_assign (vec_cond, VEC_COND_EXPR,
+						   vec_compare, new_phi_result,
+						   zero_vec);
+      gsi_insert_before (&exit_gsi, vec_cond_stmt, GSI_SAME_STMT);
+
+      /* Finally we need to extract the data value from the vector (VEC_COND)
+	 into a scalar (MATCHED_DATA_REDUC).  Logically we want to do an OR
+	 reduction, but because this doesn't exist, we can use a MAX reduction
+	 instead.  The data value might be signed or a float so we need to cast
+	 it first.
+	 In the case where the loop never made any matches, the data values are
+	 all identical, and so will reduce down correctly.  */
+
+      /* Make the matched data values unsigned.  */
+      tree vec_cond_cast = make_ssa_name (vectype_unsigned);
+      tree vec_cond_cast_rhs = build1 (VIEW_CONVERT_EXPR, vectype_unsigned,
+				       vec_cond);
+      gimple *vec_cond_cast_stmt = gimple_build_assign (vec_cond_cast,
+							VIEW_CONVERT_EXPR,
+							vec_cond_cast_rhs);
+      gsi_insert_before (&exit_gsi, vec_cond_cast_stmt, GSI_SAME_STMT);
+
+      /* Reduce down to a scalar value.  */
+      tree data_reduc = make_ssa_name (scalar_type_unsigned);
+      optab ot = optab_for_tree_code (REDUC_MAX_EXPR, vectype_unsigned,
+				      optab_default);
+      gcc_assert (optab_handler (ot, TYPE_MODE (vectype_unsigned))
+		  != CODE_FOR_nothing);
+      gimple *data_reduc_stmt = gimple_build_assign (data_reduc,
+						     REDUC_MAX_EXPR,
+						     vec_cond_cast);
+      gsi_insert_before (&exit_gsi, data_reduc_stmt, GSI_SAME_STMT);
+
+      /* Convert the reduced value back to the result type and set as the
+	 result.  */
+      tree data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
+				     data_reduc);
+      epilog_stmt = gimple_build_assign (new_scalar_dest, data_reduc_cast);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+      gimple_assign_set_lhs (epilog_stmt, new_temp);
+      gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+      scalar_results.safe_push (new_temp);
+    }
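Outside gimple, the epilogue sequence built above amounts to the following plain-array emulation (a minimal sketch assuming a 4-lane vector of 32-bit elements; `cond_reduce_epilogue` and the scalar loops standing in for REDUC_MAX_EXPR are purely illustrative, not generated code):

```cpp
#include <cstdint>
#include <cstring>

// Emulates the condition-reduction epilogue: DATA holds the matched
// values per lane, IDX holds the (non-zero) index of each lane's last
// match, or 0 if the lane never matched.
static int32_t cond_reduce_epilogue (const int32_t data[4],
                                     const uint32_t idx[4])
{
  // REDUC_MAX_EXPR on the index vector: the last match wins.
  uint32_t max_index = 0;
  for (int l = 0; l < 4; l++)
    if (idx[l] > max_index)
      max_index = idx[l];

  // VEC_COND_EXPR: keep the data bits where idx == max_index, else 0.
  uint32_t vec_cond[4];
  for (int l = 0; l < 4; l++)
    if (idx[l] == max_index)
      memcpy (&vec_cond[l], &data[l], sizeof (uint32_t)); // VIEW_CONVERT
    else
      vec_cond[l] = 0;

  // Second REDUC_MAX_EXPR on the unsigned bit patterns stands in for
  // the missing OR reduction: at most one lane is non-zero (or, in the
  // no-match case, all lanes hold the same default bit pattern).
  uint32_t reduced = 0;
  for (int l = 0; l < 4; l++)
    if (vec_cond[l] > reduced)
      reduced = vec_cond[l];

  int32_t result;
  memcpy (&result, &reduced, sizeof (int32_t)); // VIEW_CONVERT back
  return result;
}
```

The unsigned max works even for negative or floating-point data because only bit patterns are compared, and every lane other than the selected one is zero.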
+
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
-  if (reduc_code != ERROR_MARK && !slp_reduc)
+  else if (reduc_code != ERROR_MARK && !slp_reduc)
     {
       tree tmp;
       tree vec_elem_type;
 
       /*** Case 1:  Create:
            v_out2 = reduc_expr <v_out1>  */
 
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "Reduce using direct vector reduction.\n");
 
       vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result));
       if (!useless_type_conversion_p (scalar_type, vec_elem_type))
 	{
           tree tmp_dest =
 	      vect_create_destination_var (scalar_dest, vec_elem_type);
 	  tmp = build1 (reduc_code, vec_elem_type, new_phi_result);
 	  epilog_stmt = gimple_build_assign (tmp_dest, tmp);
 	  new_temp = make_ssa_name (tmp_dest, epilog_stmt);
 	  gimple_assign_set_lhs (epilog_stmt, new_temp);
 	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
 
 	  tmp = build1 (NOP_EXPR, scalar_type, new_temp);
 	}
       else
 	tmp = build1 (reduc_code, scalar_type, new_phi_result);
       epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
       new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
       scalar_results.safe_push (new_temp);
     }
   else
     {
       bool reduce_with_shift = have_whole_vector_shift (mode);
       int element_bitsize = tree_to_uhwi (bitsize);
       int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
       tree vec_temp;
 
       /* Regardless of whether we have a whole vector shift, if we're
          emulating the operation via tree-vect-generic, we don't want
          to use it.  Only the first round of the reduction is likely
          to still be profitable via emulation.  */
       /* ??? It might be better to emit a reduction tree code here, so that
          tree-vect-generic can expand the first round via bit tricks.  */
       if (!VECTOR_MODE_P (mode))
         reduce_with_shift = false;
       else
         {
           optab optab = optab_for_tree_code (code, vectype, optab_default);
@@ -4791,175 +4968,185 @@ vect_finalize_reduction:
 	    {
 	      if (!is_gimple_debug (USE_STMT (use_p)))
 		phis.safe_push (USE_STMT (use_p));
 	    }
           else
             {
               if (double_reduc && gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
                 {
                   tree phi_res = PHI_RESULT (USE_STMT (use_p));
 
                   FOR_EACH_IMM_USE_FAST (phi_use_p, phi_imm_iter, phi_res)
                     {
                       if (!flow_bb_inside_loop_p (loop,
                                              gimple_bb (USE_STMT (phi_use_p)))
 			  && !is_gimple_debug (USE_STMT (phi_use_p)))
                         phis.safe_push (USE_STMT (phi_use_p));
                     }
                 }
             }
         }
 
       FOR_EACH_VEC_ELT (phis, i, exit_phi)
         {
           /* Replace the uses:  */
           orig_name = PHI_RESULT (exit_phi);
           scalar_result = scalar_results[k];
           FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
             FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
               SET_USE (use_p, scalar_result);
         }
 
       phis.release ();
     }
 }
 
 
 /* Function vectorizable_reduction.
 
    Check if STMT performs a reduction operation that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
    stmt to replace it, put it in VEC_STMT, and insert it at GSI.
    Return FALSE if not a vectorizable STMT, TRUE otherwise.
 
    This function also handles reduction idioms (patterns) that have been
    recognized in advance during vect_pattern_recog.  In this case, STMT may be
    of this form:
      X = pattern_expr (arg0, arg1, ..., X)
    and it's STMT_VINFO_RELATED_STMT points to the last stmt in the original
    sequence that had been detected and replaced by the pattern-stmt (STMT).
 
+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+	 last = a[i];
+   This is handled by vectorizing the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.
+
    In some cases of reduction patterns, the type of the reduction variable X is
    different than the type of the other arguments of STMT.
    In such cases, the vectype that is used when transforming STMT into a vector
    stmt is different than the vectype that is used to determine the
    vectorization factor, because it consists of a different number of elements
    than the actual number of elements that are being operated upon in parallel.
 
    For example, consider an accumulation of shorts into an int accumulator.
    On some targets it's possible to vectorize this pattern operating on 8
    shorts at a time (hence, the vectype for purposes of determining the
    vectorization factor should be V8HI); on the other hand, the vectype that
    is used to create the vector form is actually V4SI (the type of the result).
 
    Upon entry to this function, STMT_VINFO_VECTYPE records the vectype that
    indicates what is the actual level of parallelism (V8HI in the example), so
    that the right vectorization factor would be derived.  This vectype
    corresponds to the type of arguments to the reduction stmt, and should *NOT*
    be used to create the vectorized stmt.  The right vectype for the vectorized
    stmt is obtained from the type of the result X:
         get_vectype_for_scalar_type (TREE_TYPE (X))
 
    This means that, contrary to "regular" reductions (or "regular" stmts in
    general), the following equation:
       STMT_VINFO_VECTYPE == get_vectype_for_scalar_type (TREE_TYPE (X))
    does *NOT* necessarily hold for reduction patterns.  */
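The condition-reduction scheme described in the comment above can be emulated on plain arrays. The sketch below assumes a 4-lane vector and an index biased by one so that 0 can mean "no match"; the helper name `cond_reduce_body` is invented for illustration and does not appear in the patch:

```cpp
#include <cstdint>

// One vectorized step over lanes i..i+3 of "if (a[i] < min_v) last = a[i];":
// each lane keeps the most recent a[] value that satisfied the condition,
// and the parallel index vector records that match's position plus one
// (0 meaning the lane never matched).
static void cond_reduce_body (const int32_t *a, int i, int32_t min_v,
                              int32_t data[4], uint32_t idx[4])
{
  for (int l = 0; l < 4; l++)
    if (a[i + l] < min_v)
      {
        data[l] = a[i + l];
        idx[l] = (uint32_t) (i + l) + 1;
      }
}
```

The function epilogue then max-reduces `idx` to find the last match and selects the corresponding lane of `data`.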
 
 bool
 vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
 			gimple **vec_stmt, slp_tree slp_node)
 {
   tree vec_dest;
   tree scalar_dest;
   tree loop_vec_def0 = NULL_TREE, loop_vec_def1 = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   tree vectype_in = NULL_TREE;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   enum tree_code code, orig_code, epilog_reduc_code;
   machine_mode vec_mode;
   int op_type;
   optab optab, reduc_optab;
   tree new_temp = NULL_TREE;
   tree def;
   gimple *def_stmt;
   enum vect_def_type dt;
   gphi *new_phi = NULL;
   tree scalar_type;
   bool is_simple_use;
   gimple *orig_stmt;
   stmt_vec_info orig_stmt_info;
   tree expr = NULL_TREE;
   int i;
   int ncopies;
   int epilog_copies;
   stmt_vec_info prev_stmt_info, prev_phi_info;
   bool single_defuse_cycle = false;
   tree reduc_def = NULL_TREE;
   gimple *new_stmt = NULL;
   int j;
   tree ops[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
   gimple *reduc_def_stmt = NULL;
   bool double_reduc = false, dummy;
   basic_block def_bb;
   struct loop * def_stmt_loop, *outer_loop = NULL;
   tree def_arg;
   gimple *def_arg_stmt;
   auto_vec<tree> vec_oprnds0;
   auto_vec<tree> vec_oprnds1;
   auto_vec<tree> vect_defs;
   auto_vec<gimple *> phis;
   int vec_num;
   tree def0, def1, tem, op0, op1 = NULL_TREE;
   bool first_p = true;
+  tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
 
   /* In case of reduction chain we switch to the first stmt in the chain, but
      we don't update STMT_INFO, since only the last stmt is marked as reduction
      and has reduction properties.  */
   if (GROUP_FIRST_ELEMENT (stmt_info)
       && GROUP_FIRST_ELEMENT (stmt_info) != stmt)
     {
       stmt = GROUP_FIRST_ELEMENT (stmt_info);
       first_p = false;
     }
 
   if (nested_in_vect_loop_p (loop, stmt))
     {
       outer_loop = loop;
       loop = loop->inner;
       nested_cycle = true;
     }
 
   /* 1. Is vectorizable reduction?  */
   /* Not supportable if the reduction variable is used in the loop, unless
      it's a reduction chain.  */
   if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer
       && !GROUP_FIRST_ELEMENT (stmt_info))
     return false;
 
   /* Reductions that are not used even in an enclosing outer-loop,
      are expected to be "live" (used out of the loop).  */
   if (STMT_VINFO_RELEVANT (stmt_info) == vect_unused_in_scope
       && !STMT_VINFO_LIVE_P (stmt_info))
     return false;
 
   /* Make sure it was already recognized as a reduction computation.  */
   if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_reduction_def
       && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_nested_cycle)
     return false;
 
   /* 2. Has this been recognized as a reduction pattern?
 
      Check if STMT represents a pattern that has been recognized
      in earlier analysis stages.  For stmts that represent a pattern,
      the STMT_VINFO_RELATED_STMT field records the last stmt in
      the original sequence that constitutes the pattern.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt));
   if (orig_stmt)
     {
       orig_stmt_info = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
       gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
     }
@@ -5035,134 +5222,141 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       /* The condition of COND_EXPR is checked in vectorizable_condition().  */
       if (i == 0 && code == COND_EXPR)
         continue;
 
       is_simple_use = vect_is_simple_use_1 (ops[i], stmt, loop_vinfo, NULL,
 					    &def_stmt, &def, &dt, &tem);
       if (!vectype_in)
 	vectype_in = tem;
       gcc_assert (is_simple_use);
 
       if (dt != vect_internal_def
 	  && dt != vect_external_def
 	  && dt != vect_constant_def
 	  && dt != vect_induction_def
           && !(dt == vect_nested_cycle && nested_cycle))
 	return false;
 
       if (dt == vect_nested_cycle)
         {
           found_nested_cycle_def = true;
           reduc_def_stmt = def_stmt;
           reduc_index = i;
         }
     }
 
   is_simple_use = vect_is_simple_use_1 (ops[i], stmt, loop_vinfo, NULL,
 					&def_stmt, &def, &dt, &tem);
   if (!vectype_in)
     vectype_in = tem;
   gcc_assert (is_simple_use);
   if (!found_nested_cycle_def)
     reduc_def_stmt = def_stmt;
 
   if (reduc_def_stmt && gimple_code (reduc_def_stmt) != GIMPLE_PHI)
     return false;
 
   if (!(dt == vect_reduction_def
 	|| dt == vect_nested_cycle
 	|| ((dt == vect_internal_def || dt == vect_external_def
 	     || dt == vect_constant_def || dt == vect_induction_def)
 	    && nested_cycle && found_nested_cycle_def)))
     {
       /* For pattern recognized stmts, orig_stmt might be a reduction,
 	 but some helper statements for the pattern might not, or
 	 might be COND_EXPRs with reduction uses in the condition.  */
       gcc_assert (orig_stmt);
       return false;
     }
 
-  gimple *tmp = vect_is_simple_reduction (loop_vinfo, reduc_def_stmt,
-					  !nested_cycle, &dummy, false);
+  gimple *tmp = vect_is_simple_reduction
+		  (loop_vinfo, reduc_def_stmt,
+		  !nested_cycle, &dummy, false,
+		  &STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info));
   if (orig_stmt)
     gcc_assert (tmp == orig_stmt
 		|| GROUP_FIRST_ELEMENT (vinfo_for_stmt (tmp)) == orig_stmt);
   else
     /* We changed STMT to be the first stmt in reduction chain, hence we
        check that in this case the first element in the chain is STMT.  */
     gcc_assert (stmt == tmp
 		|| GROUP_FIRST_ELEMENT (vinfo_for_stmt (tmp)) == stmt);
 
   if (STMT_VINFO_LIVE_P (vinfo_for_stmt (reduc_def_stmt)))
     return false;
 
   if (slp_node || PURE_SLP_STMT (stmt_info))
     ncopies = 1;
   else
     ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
                / TYPE_VECTOR_SUBPARTS (vectype_in));
 
   gcc_assert (ncopies >= 1);
 
   vec_mode = TYPE_MODE (vectype_in);
 
   if (code == COND_EXPR)
     {
+      /* Ensure we don't lose the type when calling vectorizable_condition.  */
+      enum stmt_vec_info_type backup_type = STMT_VINFO_TYPE (stmt_info);
+
       if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, NULL))
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "unsupported condition in reduction\n");
-
+	  STMT_VINFO_TYPE (stmt_info) = backup_type;
 	  return false;
         }
+
+      STMT_VINFO_TYPE (stmt_info) = backup_type;
     }
   else
     {
       /* 4. Supportable by target?  */
 
       if (code == LSHIFT_EXPR || code == RSHIFT_EXPR
 	  || code == LROTATE_EXPR || code == RROTATE_EXPR)
 	{
 	  /* Shifts and rotates are only supported by vectorizable_shifts,
 	     not vectorizable_reduction.  */
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "unsupported shift or rotation.\n");
 	  return false;
 	}
 
       /* 4.1. check support for the operation in the loop  */
       optab = optab_for_tree_code (code, vectype_in, optab_default);
       if (!optab)
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "no optab.\n");
 
           return false;
         }
 
       if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
         {
           if (dump_enabled_p ())
             dump_printf (MSG_NOTE, "op not supported by target.\n");
 
           if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
               || LOOP_VINFO_VECT_FACTOR (loop_vinfo)
 	          < vect_min_worthwhile_factor (code))
             return false;
 
           if (dump_enabled_p ())
   	    dump_printf (MSG_NOTE, "proceeding using word mode.\n");
         }
 
       /* Worthwhile without SIMD support?  */
       if (!VECTOR_MODE_P (TYPE_MODE (vectype_in))
           && LOOP_VINFO_VECT_FACTOR (loop_vinfo)
    	     < vect_min_worthwhile_factor (code))
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "not worthwhile without SIMD support.\n");
 
@@ -5190,166 +5384,219 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
              above we want to use 'widen_sum' in the loop, but 'plus' in the
              epilog.
           2. The type (mode) we use to check available target support
              for the vector operation to be created in the *epilog*, is
              determined by the type of the reduction variable (in the example
              above we'd check this: optab_handler (plus_optab, vect_int_mode])).
              However the type (mode) we use to check available target support
              for the vector operation to be created *inside the loop*, is
              determined by the type of the other arguments to STMT (in the
              example we'd check this: optab_handler (widen_sum_optab,
 	     vect_short_mode)).
 
           This is contrary to "regular" reductions, in which the types of all
           the arguments are the same as the type of the reduction variable.
           For "regular" reductions we can therefore use the same vector type
           (and also the same tree-code) when generating the epilog code and
           when generating the code inside the loop.  */
 
   if (orig_stmt)
     {
       /* This is a reduction pattern: get the vectype from the type of the
          reduction variable, and get the tree-code from orig_stmt.  */
       orig_code = gimple_assign_rhs_code (orig_stmt);
       gcc_assert (vectype_out);
       vec_mode = TYPE_MODE (vectype_out);
     }
   else
     {
       /* Regular reduction: use the same vectype and tree-code as used for
          the vector code inside the loop can be used for the epilog code. */
       orig_code = code;
     }
 
   if (nested_cycle)
     {
       def_bb = gimple_bb (reduc_def_stmt);
       def_stmt_loop = def_bb->loop_father;
       def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_stmt,
                                        loop_preheader_edge (def_stmt_loop));
       if (TREE_CODE (def_arg) == SSA_NAME
           && (def_arg_stmt = SSA_NAME_DEF_STMT (def_arg))
           && gimple_code (def_arg_stmt) == GIMPLE_PHI
           && flow_bb_inside_loop_p (outer_loop, gimple_bb (def_arg_stmt))
           && vinfo_for_stmt (def_arg_stmt)
           && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def_arg_stmt))
               == vect_double_reduction_def)
         double_reduc = true;
     }
 
   epilog_reduc_code = ERROR_MARK;
-  if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION)
     {
-      reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
+      if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code))
+	{
+	  reduc_optab = optab_for_tree_code (epilog_reduc_code, vectype_out,
                                          optab_default);
-      if (!reduc_optab)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no optab for reduction.\n");
-
-          epilog_reduc_code = ERROR_MARK;
-        }
-      else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
-        {
-          optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
-          if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
-            {
-              if (dump_enabled_p ())
-	        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "reduc op not supported by target.\n");
+	  if (!reduc_optab)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no optab for reduction.\n");
 
 	      epilog_reduc_code = ERROR_MARK;
 	    }
-        }
+	  else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+	    {
+	      optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+	      if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "reduc op not supported by target.\n");
+
+		  epilog_reduc_code = ERROR_MARK;
+		}
+	    }
+	}
+      else
+	{
+	  if (!nested_cycle || double_reduc)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "no reduc code for scalar code.\n");
+
+	      return false;
+	    }
+	}
     }
   else
     {
-      if (!nested_cycle || double_reduc)
-        {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "no reduc code for scalar code.\n");
+      int scalar_precision = GET_MODE_PRECISION (TYPE_MODE (scalar_type));
+      cr_index_scalar_type = make_unsigned_type (scalar_precision);
+      cr_index_vector_type = build_vector_type
+	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
 
-          return false;
-        }
+      epilog_reduc_code = REDUC_MAX_EXPR;
+      optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
+				   optab_default);
+      if (optab_handler (optab, TYPE_MODE (cr_index_vector_type))
+	  == CODE_FOR_nothing)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "reduc max op not supported by target.\n");
+	  return false;
+	}
     }
 
-  if (double_reduc && ncopies > 1)
+  if ((double_reduc
+       || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+      && ncopies > 1)
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "multiple types in double reduction\n");
-
+			 "multiple types in double reduction or condition "
+			 "reduction.\n");
       return false;
     }
 
  /* In case of widening multiplication by a constant, we update the type
      of the constant to be the type of the other operand.  We check that the
      constant fits the type in the pattern recognition pass.  */
   if (code == DOT_PROD_EXPR
       && !types_compatible_p (TREE_TYPE (ops[0]), TREE_TYPE (ops[1])))
     {
       if (TREE_CODE (ops[0]) == INTEGER_CST)
         ops[0] = fold_convert (TREE_TYPE (ops[1]), ops[0]);
       else if (TREE_CODE (ops[1]) == INTEGER_CST)
         ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
       else
         {
           if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "invalid types in dot-prod\n");
 
           return false;
         }
     }
 
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+    {
+      widest_int ni;
+
+      if (! max_loop_iterations (loop, &ni))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop count not known, cannot create cond "
+			     "reduction.\n");
+	  return false;
+	}
+      /* Convert backedges to iterations.  */
+      ni += 1;
+
+      /* The additional index will have the same type as the condition.  Check
+	 that the loop iteration count fits into this type less one, because
+	 the zero value is reserved for the case where nothing matched.  */
+      tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
+      if (wi::geu_p (ni, wi::to_widest (max_index)))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "loop size is greater than data size.\n");
+	  return false;
+	}
+    }
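The bound enforced above can be reproduced numerically; this is a sketch only, with `fits_cond_reduction` and the 64-bit cap being assumptions for illustration rather than code from the patch:

```cpp
#include <cstdint>

// Mirrors the check: with NI iterations (backedge count + 1) and an
// unsigned index type of PRECISION bits, NI must stay strictly below
// the type's maximum value, since 0 is reserved for "no match".
static bool fits_cond_reduction (uint64_t ni, unsigned precision)
{
  uint64_t max_index = (precision >= 64)
                       ? UINT64_MAX
                       : ((uint64_t) 1 << precision) - 1;
  return ni < max_index;  // wi::geu_p (ni, max_index) rejects ni >= max
}
```

For example, an 8-bit index type admits at most 254 iterations.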
+
   if (!vec_stmt) /* transformation not required.  */
     {
       if (first_p
 	  && !vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies,
 					 reduc_index))
         return false;
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
       return true;
     }
 
   /** Transform.  **/
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
 
   /* FORNOW: Multiple types are not supported for condition.  */
   if (code == COND_EXPR)
     gcc_assert (ncopies == 1);
 
   /* Create the destination vector  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
      more than one vector stmt - i.e - we need to "unroll" the
      vector stmt by a factor VF/nunits.  For more details see documentation
      in vectorizable_operation.  */
 
   /* If the reduction is used in an outer loop we need to generate
      VF intermediate results, like so (e.g. for ncopies=2):
 	r0 = phi (init, r0)
 	r1 = phi (init, r1)
 	r0 = x0 + r0;
         r1 = x1 + r1;
     (i.e. we generate VF results in 2 registers).
     In this case we have a separate def-use cycle for each copy, and therefore
     for each copy we get the vector def for the reduction variable from the
     respective phi node created for this copy.
 
     Otherwise (the reduction is unused in the loop nest), we can combine
     together intermediate results, like so (e.g. for ncopies=2):
 	r = phi (init, r)
 	r = x0 + r;
 	r = x1 + r;
    (i.e. we generate VF/2 results in a single register).
    In this case for each copy we get the vector def for the reduction variable
    from the vectorized reduction operation generated in the previous iteration.
   */
 
   if (STMT_VINFO_RELEVANT (stmt_info) == vect_unused_in_scope)
@@ -5472,111 +5719,201 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
             }
 
           def1 = ((op_type == ternary_op)
                   ? vec_oprnds1[i] : NULL);
           if (op_type == binary_op)
             {
               if (reduc_index == 0)
                 expr = build2 (code, vectype_out, reduc_def, def0);
               else
                 expr = build2 (code, vectype_out, def0, reduc_def);
             }
           else
             {
               if (reduc_index == 0)
                 expr = build3 (code, vectype_out, reduc_def, def0, def1);
               else
                 {
                   if (reduc_index == 1)
                     expr = build3 (code, vectype_out, def0, reduc_def, def1);
                   else
                     expr = build3 (code, vectype_out, def0, def1, reduc_def);
                 }
             }
 
           new_stmt = gimple_build_assign (vec_dest, expr);
           new_temp = make_ssa_name (vec_dest, new_stmt);
           gimple_assign_set_lhs (new_stmt, new_temp);
           vect_finish_stmt_generation (stmt, new_stmt, gsi);
 
           if (slp_node)
             {
               SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
               vect_defs.quick_push (new_temp);
             }
           else
             vect_defs[0] = new_temp;
         }
 
       if (slp_node)
         continue;
 
       if (j == 0)
 	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
       else
 	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
 
       prev_stmt_info = vinfo_for_stmt (new_stmt);
       prev_phi_info = vinfo_for_stmt (new_phi);
     }
 
+  tree indx_before_incr, indx_after_incr, cond_name = NULL;
+
   /* Finalize the reduction-phi (set its arguments) and create the
      epilog reduction code.  */
   if ((!single_defuse_cycle || code == COND_EXPR) && !slp_node)
     {
       new_temp = gimple_assign_lhs (*vec_stmt);
       vect_defs[0] = new_temp;
+
+      /* For cond reductions we want to create a new vector (INDEX_COND_EXPR)
+	 which is updated with the current index of the loop for every match of
+	 the original loop's cond_expr (VEC_STMT).  This results in a vector
+	 containing the last time the condition passed for that vector lane.
+	 Match indexes start at 1 so that 0 can be used for lanes that
+	 never match.  If there are no matches at all then the vector will
+	 be all zeroes.  */
+      if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+	{
+	  int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+	  int k;
+
+	  gcc_assert (gimple_assign_rhs_code (*vec_stmt) == VEC_COND_EXPR);
+
+	  /* First we create a simple vector induction variable which starts
+	     with the values {1,2,3,...} (SERIES_VECT) and increments by the
+	     vector size (STEP).  */
+
+	  /* Create a {1,2,3,...} vector.  */
+	  tree *vtemp = XALLOCAVEC (tree, nunits_out);
+	  for (k = 0; k < nunits_out; ++k)
+	    vtemp[k] = build_int_cst (cr_index_scalar_type, k + 1);
+	  tree series_vect = build_vector (cr_index_vector_type, vtemp);
+
+	  /* Create a vector of the step value.  */
+	  tree step = build_int_cst (cr_index_scalar_type, nunits_out);
+	  tree vec_step = build_vector_from_val (cr_index_vector_type, step);
+
+	  /* Create an induction variable.  */
+	  gimple_stmt_iterator incr_gsi;
+	  bool insert_after;
+	  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
+	  create_iv (series_vect, vec_step, NULL_TREE, loop, &incr_gsi,
+		     insert_after, &indx_before_incr, &indx_after_incr);
+
+	  /* Next create a new phi node vector (NEW_PHI_TREE) which starts
+	     filled with zeros (VEC_ZERO).  */
+
+	  /* Create a vector of 0s.  */
+	  tree zero = build_zero_cst (cr_index_scalar_type);
+	  tree vec_zero = build_vector_from_val (cr_index_vector_type, zero);
+
+	  /* Create a vector phi node.  */
+	  tree new_phi_tree = make_ssa_name (cr_index_vector_type);
+	  new_phi = create_phi_node (new_phi_tree, loop->header);
+	  set_vinfo_for_stmt (new_phi, new_stmt_vec_info (new_phi, loop_vinfo,
+							  NULL));
+	  add_phi_arg (new_phi, vec_zero, loop_preheader_edge (loop),
+		       UNKNOWN_LOCATION);
+
+	  /* Now take the condition from the loop's original cond_expr
+	     (VEC_STMT) and produce a new cond_expr (INDEX_COND_EXPR) which for
+	     every match uses values from the induction variable
+	     (INDEX_BEFORE_INCR) otherwise uses values from the phi node
+	     (NEW_PHI_TREE).
+	     Finally, we update the phi (NEW_PHI_TREE) to take the value of
+	     the new cond_expr (INDEX_COND_EXPR).  */
+
+	  /* Turn the condition from vec_stmt into an ssa name.  */
+	  gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt);
+	  tree ccompare = gimple_assign_rhs1 (*vec_stmt);
+	  tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
+	  gimple *ccompare_stmt = gimple_build_assign (ccompare_name,
+						       ccompare);
+	  gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT);
+	  gimple_assign_set_rhs1 (*vec_stmt, ccompare_name);
+	  update_stmt (*vec_stmt);
+
+	  /* Create a conditional, where the condition is taken from vec_stmt
+	     (CCOMPARE_NAME), the then clause is the induction index
+	     (INDEX_BEFORE_INCR) and the else clause is the phi (NEW_PHI_TREE).  */
+	  tree index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type,
+					 ccompare_name, indx_before_incr,
+					 new_phi_tree);
+	  cond_name = make_ssa_name (cr_index_vector_type);
+	  gimple *index_condition = gimple_build_assign (cond_name,
+							 index_cond_expr);
+	  gsi_insert_before (&incr_gsi, index_condition, GSI_SAME_STMT);
+	  stmt_vec_info index_vec_info = new_stmt_vec_info (index_condition,
+							    loop_vinfo, NULL);
+	  STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
+	  set_vinfo_for_stmt (index_condition, index_vec_info);
+
+	  /* Update the phi with the vec cond.  */
+	  add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
+		       UNKNOWN_LOCATION);
+	}
     }
 
   vect_create_epilog_for_reduction (vect_defs, stmt, epilog_copies,
                                     epilog_reduc_code, phis, reduc_index,
-                                    double_reduc, slp_node);
+				    double_reduc, slp_node, cond_name);
 
   return true;
 }
 
 /* Function vect_min_worthwhile_factor.
 
    For a loop where we could vectorize the operation indicated by CODE,
    return the minimum vectorization factor that makes it worthwhile
    to use generic vectors.  */
 int
 vect_min_worthwhile_factor (enum tree_code code)
 {
   switch (code)
     {
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
       return 4;
 
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
     case BIT_XOR_EXPR:
     case BIT_NOT_EXPR:
       return 2;
 
     default:
       return INT_MAX;
     }
 }
 
 
 /* Function vectorizable_induction
 
    Check if PHI performs an induction computation that can be vectorized.
    If VEC_STMT is also passed, vectorize the induction PHI: create a vectorized
    phi to replace it, put it in VEC_STMT, and add it to the same basic block.
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
 vectorizable_induction (gimple *phi,
 			gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED,
 			gimple **vec_stmt)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (phi);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
   tree vec_def;
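Before moving on to the tree-vect-stmts.c changes: the index-vector scheme that vectorizable_reduction builds above (SERIES_VECT, VEC_STEP, NEW_PHI_TREE, INDEX_COND_EXPR, then a REDUC_MAX_EXPR epilogue) can be sketched in plain C by modelling each vector register as an array of lanes.  This is an illustrative sketch only, not GCC code; N, LANES and the function body are assumptions made for the example, and N is assumed to be a multiple of LANES.

```c
#include <assert.h>

#define N 32
#define LANES 4

/* Hypothetical scalar model of the vectorized cond reduction: each
   array of LANES elements stands for one vector register.  */
static int
condition_reduction (int *a, int min_v)
{
  int data[LANES] = { 0, 0, 0, 0 };  /* vector of matched data values      */
  int idx[LANES]  = { 0, 0, 0, 0 };  /* NEW_PHI_TREE: last matching index  */
  int iv[LANES]   = { 1, 2, 3, 4 };  /* SERIES_VECT induction variable     */

  for (int i = 0; i < N; i += LANES)
    {
      for (int l = 0; l < LANES; l++)
	if (a[i + l] < min_v)        /* the original loop's cond_expr      */
	  {
	    data[l] = a[i + l];      /* VEC_COND_EXPR on the data          */
	    idx[l] = iv[l];          /* INDEX_COND_EXPR on the indexes     */
	  }
      for (int l = 0; l < LANES; l++)
	iv[l] += LANES;              /* add VEC_STEP each iteration        */
    }

  /* Epilogue: REDUC_MAX_EXPR over the index vector picks the lane of
     the overall last match; index 0 means no lane ever matched, and
     selects the default value 0.  */
  int max_idx = 0, lane = 0;
  for (int l = 0; l < LANES; l++)
    if (idx[l] > max_idx)
      {
	max_idx = idx[l];
	lane = l;
      }
  return max_idx ? data[lane] : 0;
}
```

Because matched indexes start at 1, a result of 0 from the max reduction cleanly selects the scalar loop's initial value, which is why the guard earlier in the patch insists the iteration count fit in the index type less one.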
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index cb9e7e8..7a7f15c 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7266,115 +7266,118 @@ vect_is_simple_cond (tree cond, gimple *stmt, loop_vec_info loop_vinfo,
     return false;
 
   *comp_vectype = vectype1 ? vectype1 : vectype2;
   return true;
 }
 
 /* vectorizable_condition.
 
    Check if STMT is conditional modify expression that can be vectorized.
    If VEC_STMT is also passed, vectorize the STMT: create a vectorized
    stmt using VEC_COND_EXPR  to replace it, put it in VEC_STMT, and insert it
    at GSI.
 
    When STMT is vectorized as nested cycle, REDUC_DEF is the vector variable
    to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in
    else clause if it is 2).
 
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
 vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
 			gimple **vec_stmt, tree reduc_def, int reduc_index,
 			slp_tree slp_node)
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
   tree cond_expr, then_clause, else_clause;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree comp_vectype = NULL_TREE;
   tree vec_cond_lhs = NULL_TREE, vec_cond_rhs = NULL_TREE;
   tree vec_then_clause = NULL_TREE, vec_else_clause = NULL_TREE;
   tree vec_compare, vec_cond_expr;
   tree new_temp;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   tree def;
   enum vect_def_type dt, dts[4];
   int ncopies;
   enum tree_code code;
   stmt_vec_info prev_stmt_info = NULL;
   int i, j;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec<tree> vec_oprnds0 = vNULL;
   vec<tree> vec_oprnds1 = vNULL;
   vec<tree> vec_oprnds2 = vNULL;
   vec<tree> vec_oprnds3 = vNULL;
   tree vec_cmp_type;
 
   if (reduc_index && STMT_SLP_TYPE (stmt_info))
     return false;
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-    return false;
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION)
+    {
+      if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+	return false;
 
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
-           && reduc_def))
-    return false;
+      if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+	  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	       && reduc_def))
+	return false;
 
-  /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "value used after loop.\n");
-      return false;
+      /* FORNOW: not yet supported.  */
+      if (STMT_VINFO_LIVE_P (stmt_info))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "value used after loop.\n");
+	  return false;
+	}
     }
 
   /* Is vectorizable conditional operation?  */
   if (!is_gimple_assign (stmt))
     return false;
 
   code = gimple_assign_rhs_code (stmt);
 
   if (code != COND_EXPR)
     return false;
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   if (slp_node || PURE_SLP_STMT (stmt_info))
     ncopies = 1;
   else
     ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
 
   gcc_assert (ncopies >= 1);
   if (reduc_index && ncopies > 1)
     return false; /* FORNOW */
 
   cond_expr = gimple_assign_rhs1 (stmt);
   then_clause = gimple_assign_rhs2 (stmt);
   else_clause = gimple_assign_rhs3 (stmt);
 
   if (!vect_is_simple_cond (cond_expr, stmt, loop_vinfo, bb_vinfo,
 			    &comp_vectype)
       || !comp_vectype)
     return false;
 
   if (TREE_CODE (then_clause) == SSA_NAME)
     {
       gimple *then_def_stmt = SSA_NAME_DEF_STMT (then_clause);
       if (!vect_is_simple_use (then_clause, stmt, loop_vinfo, bb_vinfo,
 			       &then_def_stmt, &def, &dt))
 	return false;
     }
   else if (TREE_CODE (then_clause) != INTEGER_CST
 	   && TREE_CODE (then_clause) != REAL_CST
 	   && TREE_CODE (then_clause) != FIXED_CST)
     return false;
 
   if (TREE_CODE (else_clause) == SSA_NAME)
     {
       gimple *else_def_stmt = SSA_NAME_DEF_STMT (else_clause);
       if (!vect_is_simple_use (else_clause, stmt, loop_vinfo, bb_vinfo,
 			       &else_def_stmt, &def, &dt))
 	return false;
@@ -7955,100 +7958,101 @@ vect_transform_stmt (gimple *stmt, gimple_stmt_iterator *gsi,
 
 void
 vect_remove_stores (gimple *first_stmt)
 {
   gimple *next = first_stmt;
   gimple *tmp;
   gimple_stmt_iterator next_si;
 
   while (next)
     {
       stmt_vec_info stmt_info = vinfo_for_stmt (next);
 
       tmp = GROUP_NEXT_ELEMENT (stmt_info);
       if (is_pattern_stmt_p (stmt_info))
 	next = STMT_VINFO_RELATED_STMT (stmt_info);
       /* Free the attached stmt_vec_info and remove the stmt.  */
       next_si = gsi_for_stmt (next);
       unlink_stmt_vdef (next);
       gsi_remove (&next_si, true);
       release_defs (next);
       free_stmt_vec_info (next);
       next = tmp;
     }
 }
 
 
 /* Function new_stmt_vec_info.
 
    Create and initialize a new stmt_vec_info struct for STMT.  */
 
 stmt_vec_info
 new_stmt_vec_info (gimple *stmt, loop_vec_info loop_vinfo,
                    bb_vec_info bb_vinfo)
 {
   stmt_vec_info res;
   res = (stmt_vec_info) xcalloc (1, sizeof (struct _stmt_vec_info));
 
   STMT_VINFO_TYPE (res) = undef_vec_info_type;
   STMT_VINFO_STMT (res) = stmt;
   STMT_VINFO_LOOP_VINFO (res) = loop_vinfo;
   STMT_VINFO_BB_VINFO (res) = bb_vinfo;
   STMT_VINFO_RELEVANT (res) = vect_unused_in_scope;
   STMT_VINFO_LIVE_P (res) = false;
   STMT_VINFO_VECTYPE (res) = NULL;
   STMT_VINFO_VEC_STMT (res) = NULL;
   STMT_VINFO_VECTORIZABLE (res) = true;
   STMT_VINFO_IN_PATTERN_P (res) = false;
   STMT_VINFO_RELATED_STMT (res) = NULL;
   STMT_VINFO_PATTERN_DEF_SEQ (res) = NULL;
   STMT_VINFO_DATA_REF (res) = NULL;
+  STMT_VINFO_VEC_REDUCTION_TYPE (res) = TREE_CODE_REDUCTION;
 
   STMT_VINFO_DR_BASE_ADDRESS (res) = NULL;
   STMT_VINFO_DR_OFFSET (res) = NULL;
   STMT_VINFO_DR_INIT (res) = NULL;
   STMT_VINFO_DR_STEP (res) = NULL;
   STMT_VINFO_DR_ALIGNED_TO (res) = NULL;
 
   if (gimple_code (stmt) == GIMPLE_PHI
       && is_loop_header_bb_p (gimple_bb (stmt)))
     STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
   else
     STMT_VINFO_DEF_TYPE (res) = vect_internal_def;
 
   STMT_VINFO_SAME_ALIGN_REFS (res).create (0);
   STMT_SLP_TYPE (res) = loop_vect;
   GROUP_FIRST_ELEMENT (res) = NULL;
   GROUP_NEXT_ELEMENT (res) = NULL;
   GROUP_SIZE (res) = 0;
   GROUP_STORE_COUNT (res) = 0;
   GROUP_GAP (res) = 0;
   GROUP_SAME_DR_STMT (res) = NULL;
 
   return res;
 }
 
 
 /* Create a hash table for stmt_vec_info. */
 
 void
 init_stmt_vec_info_vec (void)
 {
   gcc_assert (!stmt_vec_info_vec.exists ());
   stmt_vec_info_vec.create (50);
 }
 
 
 /* Free hash table for stmt_vec_info. */
 
 void
 free_stmt_vec_info_vec (void)
 {
   unsigned int i;
   vec_void_p info;
   FOR_EACH_VEC_ELT (stmt_vec_info_vec, i, info)
     if (info != NULL)
       free_stmt_vec_info (STMT_VINFO_STMT ((stmt_vec_info) info));
   gcc_assert (stmt_vec_info_vec.exists ());
   stmt_vec_info_vec.release ();
 }
 
@@ -8207,102 +8211,102 @@ get_same_sized_vectype (tree scalar_type, tree vector_type)
 
 /* Function vect_is_simple_use.
 
    Input:
    LOOP_VINFO - the vect info of the loop that is being vectorized.
    BB_VINFO - the vect info of the basic block that is being vectorized.
    OPERAND - operand of STMT in the loop or bb.
    DEF - the defining stmt in case OPERAND is an SSA_NAME.
 
    Returns whether a stmt with OPERAND can be vectorized.
    For loops, supportable operands are constants, loop invariants, and operands
    that are defined by the current iteration of the loop.  Unsupportable
    operands are those that are defined by a previous iteration of the loop (as
    is the case in reduction/induction computations).
    For basic blocks, supportable operands are constants and bb invariants.
    For now, operands defined outside the basic block are not supported.  */
 
 bool
 vect_is_simple_use (tree operand, gimple *stmt, loop_vec_info loop_vinfo,
                     bb_vec_info bb_vinfo, gimple **def_stmt,
 		    tree *def, enum vect_def_type *dt)
 {
   *def_stmt = NULL;
   *def = NULL_TREE;
   *dt = vect_unknown_def_type;
 
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location,
                        "vect_is_simple_use: operand ");
       dump_generic_expr (MSG_NOTE, TDF_SLIM, operand);
       dump_printf (MSG_NOTE, "\n");
     }
 
   if (CONSTANT_CLASS_P (operand))
     {
       *dt = vect_constant_def;
       return true;
     }
 
   if (is_gimple_min_invariant (operand))
     {
       *def = operand;
       *dt = vect_external_def;
       return true;
     }
 
   if (TREE_CODE (operand) != SSA_NAME)
     {
       if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "not ssa-name.\n");
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not ssa-name.\n");
       return false;
     }
 
   if (SSA_NAME_IS_DEFAULT_DEF (operand))
     {
       *def = operand;
       *dt = vect_external_def;
       return true;
     }
 
   *def_stmt = SSA_NAME_DEF_STMT (operand);
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location, "def_stmt: ");
       dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
     }
 
   basic_block bb = gimple_bb (*def_stmt);
   if ((loop_vinfo && !flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo), bb))
       || (bb_vinfo
 	  && (bb != BB_VINFO_BB (bb_vinfo)
 	      || gimple_code (*def_stmt) == GIMPLE_PHI)))
     *dt = vect_external_def;
   else
     {
       stmt_vec_info stmt_vinfo = vinfo_for_stmt (*def_stmt);
       if (bb_vinfo && !STMT_VINFO_VECTORIZABLE (stmt_vinfo))
 	*dt = vect_external_def;
       else
 	*dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
     }
 
   if (dump_enabled_p ())
     {
       dump_printf_loc (MSG_NOTE, vect_location, "type of def: ");
       switch (*dt)
 	{
 	case vect_uninitialized_def:
 	  dump_printf (MSG_NOTE, "uninitialized\n");
 	  break;
 	case vect_constant_def:
 	  dump_printf (MSG_NOTE, "constant\n");
 	  break;
 	case vect_external_def:
 	  dump_printf (MSG_NOTE, "external\n");
 	  break;
 	case vect_internal_def:
 	  dump_printf (MSG_NOTE, "internal\n");
 	  break;
 	case vect_induction_def:
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index e4d1feb..e25d00f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -13,100 +13,106 @@ GCC is distributed in the hope that it will be useful, but WITHOUT ANY
 WARRANTY; without even the implied warranty of MERCHANTABILITY or
 FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 for more details.
 
 You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #ifndef GCC_TREE_VECTORIZER_H
 #define GCC_TREE_VECTORIZER_H
 
 #include "tree-data-ref.h"
 #include "target.h"
 
 /* Used for naming of new temporaries.  */
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
   vect_scalar_var
 };
 
 /* Defines type of operation.  */
 enum operation_type {
   unary_op = 1,
   binary_op,
   ternary_op
 };
 
 /* Define type of available alignment support.  */
 enum dr_alignment_support {
   dr_unaligned_unsupported,
   dr_unaligned_supported,
   dr_explicit_realign,
   dr_explicit_realign_optimized,
   dr_aligned
 };
 
 /* Define type of def-use cross-iteration cycle.  */
 enum vect_def_type {
   vect_uninitialized_def = 0,
   vect_constant_def = 1,
   vect_external_def,
   vect_internal_def,
   vect_induction_def,
   vect_reduction_def,
   vect_double_reduction_def,
   vect_nested_cycle,
   vect_unknown_def_type
 };
 
+/* Define type of reduction.  */
+enum vect_reduction_type {
+  TREE_CODE_REDUCTION,
+  COND_REDUCTION
+};
+
 #define VECTORIZABLE_CYCLE_DEF(D) (((D) == vect_reduction_def)           \
                                    || ((D) == vect_double_reduction_def) \
                                    || ((D) == vect_nested_cycle))
 
 /* Structure to encapsulate information about a group of like
    instructions to be presented to the target cost model.  */
 struct stmt_info_for_cost {
   int count;
   enum vect_cost_for_stmt kind;
   gimple *stmt;
   int misalign;
 };
 
 
 typedef vec<stmt_info_for_cost> stmt_vector_for_cost;
 
 static inline void
 add_stmt_info_to_vec (stmt_vector_for_cost *stmt_cost_vec, int count,
 		      enum vect_cost_for_stmt kind, gimple *stmt, int misalign)
 {
   stmt_info_for_cost si;
   si.count = count;
   si.kind = kind;
   si.stmt = stmt;
   si.misalign = misalign;
   stmt_cost_vec->safe_push (si);
 }
 
 /************************************************************************
   SLP
  ************************************************************************/
 typedef struct _slp_tree *slp_tree;
 
 /* A computation tree of an SLP instance.  Each node corresponds to a group of
    stmts to be packed in a SIMD stmt.  */
 struct _slp_tree {
   /* Nodes that contain def-stmts of this node statements operands.  */
   vec<slp_tree> children;
   /* A group of scalar stmts to be vectorized together.  */
   vec<gimple *> stmts;
   /* Load permutation relative to the stores, NULL if there is no
      permutation.  */
   vec<unsigned> load_permutation;
   /* Vectorized stmt/s.  */
   vec<gimple *> vec_stmts;
   /* Number of vector stmts that are created to replace the group of scalar
      stmts. It is calculated during the transformation phase as the number of
      scalar elements in one scalar iteration (GROUP_SIZE) multiplied by VF
      divided by vector size.  */
   unsigned int vec_stmts_size;
@@ -607,116 +613,121 @@ typedef struct _stmt_vec_info {
      for linear arguments (pair of NULLs for other arguments).  */
   vec<tree> simd_clone_info;
 
   /* Classify the def of this stmt.  */
   enum vect_def_type def_type;
 
   /*  Whether the stmt is SLPed, loop-based vectorized, or both.  */
   enum slp_vect_type slp_type;
 
   /* Interleaving and reduction chains info.  */
   /* First element in the group.  */
   gimple *first_element;
   /* Pointer to the next element in the group.  */
   gimple *next_element;
   /* For data-refs, in case that two or more stmts share data-ref, this is the
      pointer to the previously detected stmt with the same dr.  */
   gimple *same_dr_stmt;
   /* The size of the group.  */
   unsigned int size;
   /* For stores, number of stores from this group seen. We vectorize the last
      one.  */
   unsigned int store_count;
   /* For loads only, the gap from the previous load. For consecutive loads, GAP
      is 1.  */
   unsigned int gap;
 
   /* The minimum negative dependence distance this stmt participates in
      or zero if none.  */
   unsigned int min_neg_dist;
 
   /* Not all stmts in the loop need to be vectorized. e.g, the increment
      of the loop induction variable and computation of array indexes. relevant
      indicates whether the stmt needs to be vectorized.  */
   enum vect_relevant relevant;
 
   /* The bb_vec_info with respect to which STMT is vectorized.  */
   bb_vec_info bb_vinfo;
 
   /* Is this statement vectorizable or should it be skipped in (partial)
      vectorization.  */
   bool vectorizable;
 
   /* For loads if this is a gather, for stores if this is a scatter.  */
   bool gather_scatter_p;
 
   /* True if this is an access with loop-invariant stride.  */
   bool strided_p;
 
   /* For both loads and stores.  */
   bool simd_lane_access_p;
+
+  /* For reduction loops, this is the type of reduction.  */
+  enum vect_reduction_type v_reduc_type;
+
 } *stmt_vec_info;
 
 /* Access Functions.  */
 #define STMT_VINFO_TYPE(S)                 (S)->type
 #define STMT_VINFO_STMT(S)                 (S)->stmt
 #define STMT_VINFO_LOOP_VINFO(S)           (S)->loop_vinfo
 #define STMT_VINFO_BB_VINFO(S)             (S)->bb_vinfo
 #define STMT_VINFO_RELEVANT(S)             (S)->relevant
 #define STMT_VINFO_LIVE_P(S)               (S)->live
 #define STMT_VINFO_VECTYPE(S)              (S)->vectype
 #define STMT_VINFO_VEC_STMT(S)             (S)->vectorized_stmt
 #define STMT_VINFO_VECTORIZABLE(S)         (S)->vectorizable
 #define STMT_VINFO_DATA_REF(S)             (S)->data_ref_info
 #define STMT_VINFO_GATHER_SCATTER_P(S)	   (S)->gather_scatter_p
 #define STMT_VINFO_STRIDED_P(S)	   	   (S)->strided_p
 #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
+#define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
 
 #define STMT_VINFO_DR_BASE_ADDRESS(S)      (S)->dr_base_address
 #define STMT_VINFO_DR_INIT(S)              (S)->dr_init
 #define STMT_VINFO_DR_OFFSET(S)            (S)->dr_offset
 #define STMT_VINFO_DR_STEP(S)              (S)->dr_step
 #define STMT_VINFO_DR_ALIGNED_TO(S)        (S)->dr_aligned_to
 
 #define STMT_VINFO_IN_PATTERN_P(S)         (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S)         (S)->related_stmt
 #define STMT_VINFO_PATTERN_DEF_SEQ(S)      (S)->pattern_def_seq
 #define STMT_VINFO_SAME_ALIGN_REFS(S)      (S)->same_align_refs
 #define STMT_VINFO_SIMD_CLONE_INFO(S)	   (S)->simd_clone_info
 #define STMT_VINFO_DEF_TYPE(S)             (S)->def_type
 #define STMT_VINFO_GROUP_FIRST_ELEMENT(S)  (S)->first_element
 #define STMT_VINFO_GROUP_NEXT_ELEMENT(S)   (S)->next_element
 #define STMT_VINFO_GROUP_SIZE(S)           (S)->size
 #define STMT_VINFO_GROUP_STORE_COUNT(S)    (S)->store_count
 #define STMT_VINFO_GROUP_GAP(S)            (S)->gap
 #define STMT_VINFO_GROUP_SAME_DR_STMT(S)   (S)->same_dr_stmt
 #define STMT_VINFO_GROUPED_ACCESS(S)      ((S)->first_element != NULL && (S)->data_ref_info)
 #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
 #define STMT_VINFO_MIN_NEG_DIST(S)	(S)->min_neg_dist
 
 #define GROUP_FIRST_ELEMENT(S)          (S)->first_element
 #define GROUP_NEXT_ELEMENT(S)           (S)->next_element
 #define GROUP_SIZE(S)                   (S)->size
 #define GROUP_STORE_COUNT(S)            (S)->store_count
 #define GROUP_GAP(S)                    (S)->gap
 #define GROUP_SAME_DR_STMT(S)           (S)->same_dr_stmt
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_scope)
 
 #define HYBRID_SLP_STMT(S)                ((S)->slp_type == hybrid)
 #define PURE_SLP_STMT(S)                  ((S)->slp_type == pure_slp)
 #define STMT_SLP_TYPE(S)                   (S)->slp_type
 
 struct dataref_aux {
   int misalignment;
   /* If true the alignment of base_decl needs to be increased.  */
   bool base_misaligned;
   /* If true we know the base is at least vector element alignment aligned.  */
   bool base_element_aligned;
   tree base_decl;
 };
 
 #define DR_VECT_AUX(dr) ((dataref_aux *)(dr)->aux)
 
 #define VECT_MAX_COST 1000
 
 /* The maximum number of intermediate steps required in multi-step type
-- 
1.9.3 (Apple Git-50)
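As a footnote on the max_loop_iterations guard added at the top of vectorizable_reduction: it can be modelled as a simple bounds check.  The sketch below is illustrative only (the function name and example values are assumptions, not GCC code); it mirrors the patch's wi::geu_p test, which rejects the reduction when the iteration count reaches the index type's maximum value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the COND_REDUCTION iteration-count guard: the
   index vector uses the condition's scalar type, and the zero slot is
   reserved for "no match", so the iteration count must stay strictly
   below the type's maximum value.  */
static bool
cond_reduction_index_fits (uint64_t max_backedges, uint64_t type_max)
{
  uint64_t ni = max_backedges + 1;  /* convert backedges to iterations  */
  return ni < type_max;             /* mirrors !wi::geu_p (ni, max)     */
}
```

For example, a loop over at most 253 backedges (254 iterations) fits in an 8-bit index type with maximum 255, while 254 backedges (255 iterations) does not.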

