Vector Comparison patch

public inbox for gcc-patches@gcc.gnu.org

* Vector Comparison patch
@ 2011-08-12  7:04 Artem Shinkarov
  2011-08-15 15:25 ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-12  7:04 UTC
  To: gcc-patches, Richard Guenther, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 2621 bytes --]

Hi

Here is a completed version of the vector comparison patch we
discussed a long time ago here:
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html

The patch implements vector comparison according to the OpenCL
standard: the result of comparing two vectors is a vector of signed
integers in which -1 represents true and 0 represents false.
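Below is a minimal sketch of these semantics. It assumes a compiler that accepts comparison operators on GCC-style `vector_size` types, which is what this patch proposes; current GCC and Clang both accept them. `ref_greater` is a hypothetical scalar reference written only for illustration. `native_greater` uses the vector comparison directly. The two should agree lane by lane.

```c
/* Four 32-bit ints in a 16-byte vector (GCC vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical scalar reference for "a > b" under the OpenCL
   convention: a lane is -1 (all bits set) when true, 0 when false.  */
static v4si
ref_greater (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

/* The same comparison written with the vector operator itself;
   the result is a vector of signed ints of the same element size.  */
static v4si
native_greater (v4si a, v4si b)
{
  return a > b;
}
```

For example, with `a = {1, 2, 3, 4}` and `b = {3, 2, 1, 4}`, both functions return `{0, 0, -1, 0}`.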

The patch also implements the vector conditional res = VCOND<V1 ? V2 : V3>,
which is expanded into:
foreach (i in length (V1)) res[i] = V1[i] == 0 ? V3[i] : V2[i].
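The expansion above can be sketched in C as follows. `vcond_piecewise` and `vcond_bitwise` are hypothetical helpers, not functions from the patch. They use only vector-extension subscripting and bitwise operators.

The second variant shows why -1 is a convenient encoding of true. When every lane of the condition is 0 or -1, as produced by a vector comparison, the selection reduces to `(mask & V2) | (~mask & V3)`. That equivalence holds only for such masks, not for arbitrary non-zero lanes.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Piecewise form of res = VCOND<V1 ? V2 : V3>: a zero lane of V1
   selects V3, any other value selects V2.  */
static v4si
vcond_piecewise (v4si v1, v4si v2, v4si v3)
{
  v4si res;
  for (int i = 0; i < 4; i++)
    res[i] = v1[i] == 0 ? v3[i] : v2[i];
  return res;
}

/* Bitwise blend; only equivalent to the piecewise form when every
   lane of MASK is 0 or -1 (all bits set).  */
static v4si
vcond_bitwise (v4si mask, v4si v2, v4si v3)
{
  return (mask & v2) | (~mask & v3);
}
```

With the mask `{0, -1, -1, 0}` (that is, `a >= b` for `a = {1,2,3,4}` and `b = {3,2,1,7}`), `c = {2,3,4,5}` and `d = {6,7,8,9}`, both helpers yield `{6, 3, 4, 9}`.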

ChangeLog

2011-08-12  Artjoms Sinkarovs  <artyom.shinkaroff@gmail.com>

       gcc/
       * targhooks.c (default_builtin_vec_compare): New hook.
       * targhooks.h (default_builtin_vec_compare): New definition.
       * target.def (builtin_vec_compare): New hook.
       * target.h: New include (gimple.h).
       * fold-const.c
       (fold_comparison): Adjust x <cmp> x vector operations.
       * c-typeck.c (build_binary_op): Allow vector comparison.
       (c_objc_common_truthvalue_conversion): Deny vector comparison
       inside an if statement.
       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
       * tree-vect-generic.c (do_compare): Helper function.
       (expand_vector_comparison): Check if hardware comparison
       is available, if not expand comparison piecewise.
       (expand_vector_operation): Handle vector comparison
       expressions separately.
       (earlyexpand_vec_cond_expr): Expand vector comparison
       piecewise.
       * Makefile.in: New dependencies.
       * tree-cfg.c (verify_gimple_comparison): Allow vector
       comparison operations in gimple.
       * c-parser.c (c_parser_conditional_expression): Adjust
       to handle VEC_COND_EXPR.
       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
       * config/i386/i386.c (vector_fp_compare): Build hardware
       specific code for floating point vector comparison.
       (vector_int_compare): Build hardware specific code for
       integer vector comparison.
       (ix86_vectorize_builtin_vec_compare): Implementation of
       builtin_vec_compare hook.

       gcc/testsuite/
       * gcc.c-torture/execute/vector-vcond-1.c: New test.
       * gcc.c-torture/execute/vector-vcond-2.c: New test.
       * gcc.c-torture/execute/vector-compare-1.c: New test.
       * gcc.c-torture/execute/vector-compare-2.c: New test.
       * gcc.dg/vector-compare-1.c: New test.
       * gcc.dg/vector-compare-2.c: New test.

       gcc/doc/
       * extend.texi: Adjust.
       * tm.texi: Adjust.
       * tm.texi.in: Adjust.
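The fold_comparison change folds `x <cmp> x` on vectors to a constant vector. The sketch below shows the values that such folding must preserve. It uses hypothetical helpers and assumes compiler support for vector comparisons plus default floating-point flags (no `-ffast-math`).

Integer lanes always give -3 for the sum used in vector-compare-2.c. A NaN float lane gives -1 instead. That is why the float cases can only be folded when NaNs are not honored.

```c
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Mirrors foo () in vector-compare-2.c: per integer lane this is
   (-1) + 0 + 0 + 0 + (-1) + (-1) = -3.  */
static v4si
sum_self_compares_int (v4si x)
{
  return (x == x) + (x != x) + (x > x) + (x < x) + (x >= x) + (x <= x);
}

/* Same sum for float lanes.  A NaN lane is unordered with itself, so
   ==, >= and <= give 0 and != gives -1 there: the lane sums to -1.  */
static v4si
sum_self_compares_float (v4sf x)
{
  return (x == x) + (x != x) + (x > x) + (x < x) + (x >= x) + (x <= x);
}
```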


Bootstrapped and tested on x86_64-unknown-linux.


Thanks,
Artem Shinkarov.

[-- Attachment #2: vector-compare-vcond-3.diff --]
[-- Type: text/plain, Size: 53342 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators
+@code{==, !=, <, <=, >, >=}. Both integer-type and real-type vectors
+can be compared, but only with vectors of the same type. The result of
+the comparison is a signed integer-type vector whose elements have the
+same size as the elements of the compared vectors. Comparison is
+performed element by element. The false value is 0 and the true
+value is -1 (a constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+whose condition is a vector of signed integers. In that case the result
+of the condition is used as a mask that selects, element by element,
+from either the first operand or the second. Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The number of elements in the condition must be the same as the number of
+elements in both operands, and the same holds for the size of the element
+type. The type of the vector conditional is determined by the types of
+the operands, which must be the same. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can also be a plain
+vector of signed integer type. In that case the vector is implicitly
+compared with a vector of zeros. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional whose operands are vectors and whose
+condition is a scalar integer works in the standard way: it returns the first
+operand if the condition is true and the second otherwise. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;  
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a; 
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using the hardware-specific instructions and return the resulting tree. The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default implementation of TARGET_VECTORIZE_BUILTIN_VEC_COMPARE: no
+   hardware vector comparison is available, so always return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using the hardware-specific instructions and return the resulting \
+tree. The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
-      switch (code)
+      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
 	{
-	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
-	  break;
+	  tree el_type = TREE_TYPE (TREE_TYPE (arg0));
+	  switch (code)
+	    {
+	    case EQ_EXPR:
+	    case GE_EXPR:
+	    case LE_EXPR:
+	      if (!FLOAT_TYPE_P (el_type) 
+		  || !HONOR_NANS (TYPE_MODE (el_type)))
+		return build_vector_from_val
+			  (type, build_int_cst (TREE_TYPE (type), -1));
+	      break;
+	    case NE_EXPR:
+	      if (FLOAT_TYPE_P (el_type)
+		  && HONOR_NANS (TYPE_MODE (el_type)))
+		break;
+	    /* ... fall through ...  */
+	    case GT_EXPR:
+	    case LT_EXPR:
+	      return build_vector_from_val 
+			  (type, build_int_cst (TREE_TYPE (type), 0));
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	switch (code)
+	  {
+	  case EQ_EXPR:
+	    if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
+		|| ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      return constant_boolean_node (1, type);
+	    break;
 
-	case GE_EXPR:
-	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
-	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
+	  case GE_EXPR:
+	  case LE_EXPR:
+	    if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
+		|| ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      return constant_boolean_node (1, type);
+	    return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
-	case NE_EXPR:
-	  /* For NE, we can only do this simplification if integer
-	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    break;
-	  /* ... fall through ...  */
-	case GT_EXPR:
-	case LT_EXPR:
-	  return constant_boolean_node (0, type);
-	default:
-	  gcc_unreachable ();
-	}
+	  case NE_EXPR:
+	    /* For NE, we can only do this simplification if integer
+	       or we don't honor IEEE floating point NaNs.  */
+	    if (FLOAT_TYPE_P (TREE_TYPE (arg0))
+		&& HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      break;
+	    /* ... fall through ...  */
+	  case GT_EXPR:
+	  case LT_EXPR:
+	    return constant_boolean_node (0, type);
+	  default:
+	    gcc_unreachable ();
+	  }
     }
 
   /* If we are comparing an expression that just has comparisons
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,77 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != ((" fmt1 " " #op " " fmt1 ") ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%li", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%li");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){0,0}), d0, d1, !=, "%f", "%li");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){0,0}), l0, l1, !=, "%li", "%li");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){0,0,0,0}), f0, f1, !=, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){0,0,0,0}), i0, i1, !=, "%i", "%i");
+
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt ") ? -1 : 0)", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  q4 ? p4 : r4;	    /* { dg-error "vector comparison must be of signed integer vector type" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */   
+
+/* Test if C_MAYBE_CONST are folded correctly when 
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4058,6 +4058,93 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      bool maybe_const = true;
+      tree sc;
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
+        {
+          error_at (colon_loc, "vector comparison must be of signed "
+			       "integer vector type");
+          return error_mark_node;
+        }
+
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (type1 != type2)
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+      sc = c_fully_fold (ifexp, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	ifexp = c_wrap_maybe_const (sc, true);
+      else
+	ifexp = sc;
+      
+      sc = c_fully_fold (op1, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op1 = c_wrap_maybe_const (sc, true);
+      else
+	op1 = sc;
+      
+      sc = c_fully_fold (op2, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op2 = c_wrap_maybe_const (sc, true);
+      else
+	op2 = sc;
+
+      /* Currently the expansion of VEC_COND_EXPR does not allow
+	 expressions where the type of vectors you compare differs
+	 from the type of vectors you select from.  For the time
+	 being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)
+	   && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+	  || TREE_TYPE (ifexp) != type1)
+	{
+	  tree comp_type = COMPARISON_CLASS_P (ifexp)
+			   ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+			   : TREE_TYPE (ifexp);
+	  tree vcond;
+	  
+	  op1 = convert (comp_type, op1);
+	  op2 = convert (comp_type, op2);
+	  vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+	  vcond = convert (type1, vcond);
+	  return vcond;
+	}
+      else
+	return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9993,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10128,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10558,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* Vector comparisons are valid gimple expressions
+		     which can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -125,6 +126,21 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, inner_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +349,21 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try a hardware hook for vector comparison or 
+   extract comparison piecewise.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +406,24 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
@@ -432,6 +479,64 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
+static tree
+earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree inner_type = TREE_TYPE (type);
+  tree var, new_rhs;
+  gimple new_stmt;
+
+  VEC(constructor_elt,gc) *v;
+  tree part_width = TYPE_SIZE (inner_type);
+  tree index = bitsize_int (0);
+  int nunits = TYPE_VECTOR_SUBPARTS (type);
+  int i;
+
+  /* Ensure that we will be able to expand vector comparison
+     in case it is not supported by the architecture.  */
+  gcc_assert (COMPARISON_CLASS_P (cond));
+  
+  /* Check if we need to expand vector condition inside of
+     VEC_COND_EXPR.  */
+  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+                                      TREE_OPERAND (cond, 0),
+				      TREE_OPERAND (cond, 1),
+                                      TREE_CODE (cond));
+  new_stmt = gimple_build_assign (var, new_rhs);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  update_stmt (gsi_stmt (*gsi));
+
+  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
+  v = VEC_alloc(constructor_elt, gc, nunits);
+  for (i = 0; i < nunits;
+       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
+    {
+      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
+      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
+      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
+      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond, 
+                                    build_int_cst (inner_type ,0));
+
+      tree result =  gimplify_build3 (gsi, COND_EXPR, inner_type, rcond, a, b);
+     
+      constructor_elt *ce = VEC_quick_push (constructor_elt, v, NULL);
+      ce->index = NULL_TREE;
+      ce->value = result;
+    }
+
+  return build_constructor (type, v);
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +556,33 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  /* Check if VEC_COND_EXPR is supported in hardware within the
+     given types.  */
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
+	 change it to A != {0,0,...} ? V0 : V1  */
+      if (!COMPARISON_CLASS_P (cond))
+	TREE_OPERAND (exp, 0) = 
+	    build2 (NE_EXPR, TREE_TYPE (cond), cond,
+		    build_vector_from_val (TREE_TYPE (cond),
+		      build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
+
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = earlyexpand_vec_cond_expr (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
 
   gcc_assert (code != CONVERT_EXPR);
 
+  
   /* The signedness is determined from input argument.  */
   if (code == VEC_UNPACK_FLOAT_HI_EXPR
       || code == VEC_UNPACK_FLOAT_LO_EXPR)
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (op0_type) == VECTOR_TYPE 
+      && TREE_CODE (op1_type) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TYPE_VECTOR_SUBPARTS (op0_type) != TYPE_VECTOR_SUBPARTS (op1_type))
+        {
+          error ("invalid vector comparison, number of elements do not match");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
+        {
+          error ("invalid vector comparison, vector element type mismatch");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          && TYPE_PRECISION (TREE_TYPE (op0_type)) 
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
   if (c_parser_next_token_is (parser, CPP_COLON))
     {
       tree eptype = NULL_TREE;
-
+      
       middle_loc = c_parser_peek_token (parser)->location;
+      
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                               "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -32827,6 +32828,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for the vector comparison of
+   real-type vectors V0 and V1.  Return a variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for the vector comparison of
+   integer-type vectors V0 and V1.  Return a variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* t = tmp1 ^ {-1, -1,...}  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of comparison. Returns NULL_TREE
+   when it is impossible to find a target specific sequence.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* Packed unsigned integers can only be compared with
+     EQ_EXPR or NE_EXPR.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35541,11 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

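A note for reviewers of the i386 hook: on the integer path, vector_int_compare builds every predicate from just two primitives, an element-wise equality compare and a signed greater-than compare (the pcmpeq*/pcmpgt* builtins). The scalar C sketch below emulates those two primitives on four int32 lanes and derives the remaining predicates with the same identities the patch uses: NE as EQ(v0, v1) XOR EQ(v0, v0), LT as GT with swapped operands, and GE/LE as GT OR EQ. The helper names and the fixed LANES count are hypothetical and only model the instruction semantics; they are not GCC internals or intrinsics.

```c
#include <stdint.h>

#define LANES 4 /* hypothetical: models one V4SImode register */

/* Emulates pcmpeqd: -1 (all bits set) where lanes are equal, else 0.  */
static void cmpeq(const int32_t *a, const int32_t *b, int32_t *r)
{
    for (int i = 0; i < LANES; i++)
        r[i] = (a[i] == b[i]) ? -1 : 0;
}

/* Emulates pcmpgtd: signed greater-than, -1 where true, else 0.  */
static void cmpgt(const int32_t *a, const int32_t *b, int32_t *r)
{
    for (int i = 0; i < LANES; i++)
        r[i] = (a[i] > b[i]) ? -1 : 0;
}

/* NE: EQ (a, b) XOR EQ (a, a); the second mask is all-ones.  */
static void cmpne(const int32_t *a, const int32_t *b, int32_t *r)
{
    int32_t t1[LANES], t2[LANES];
    cmpeq(a, b, t1);
    cmpeq(a, a, t2);
    for (int i = 0; i < LANES; i++)
        r[i] = t1[i] ^ t2[i];
}

/* LT: GT with the operands swapped.  */
static void cmplt(const int32_t *a, const int32_t *b, int32_t *r)
{
    cmpgt(b, a, r);
}

/* GE: GT (a, b) OR EQ (a, b).  */
static void cmpge(const int32_t *a, const int32_t *b, int32_t *r)
{
    int32_t t1[LANES], t2[LANES];
    cmpgt(a, b, t1);
    cmpeq(a, b, t2);
    for (int i = 0; i < LANES; i++)
        r[i] = t1[i] | t2[i];
}

/* LE: GT (b, a) OR EQ (a, b).  */
static void cmple(const int32_t *a, const int32_t *b, int32_t *r)
{
    int32_t t1[LANES], t2[LANES];
    cmpgt(b, a, t1);
    cmpeq(a, b, t2);
    for (int i = 0; i < LANES; i++)
        r[i] = t1[i] | t2[i];
}
```

Because the only ordering primitive is a signed greater-than, these identities give wrong answers for unsigned lanes (0x80000000 reads as INT_MIN and would compare below 1), which is why ix86_vectorize_builtin_vec_compare returns NULL_TREE for unsigned element types unless the code is EQ_EXPR or NE_EXPR.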

* Re: Vector Comparison patch
  2011-08-12  7:04 Vector Comparison patch Artem Shinkarov
@ 2011-08-15 15:25 ` Richard Guenther
  2011-08-15 17:53   ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-15 15:25 UTC
  To: Artem Shinkarov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> Here is a completed version of the vector comparison patch we
> discussed a long time ago here:
> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>
> The patch implements vector comparison according to the OpenCL
> standard, when the result of the comparison of two vectors is vector
> of signed integers, where -1 represents true and 0 false.
>
> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
> which is expanded into:
> foreach (i in length (V1)) res[i] = V1 == 0 ? V3[i] : V2[i].

Some comments on the patch below.  First, in general I don't see
why you need a new target hook to specify whether to "vectorize"
a comparison.  Why are the existing hooks used by the vectorizer
not enough?

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c    (revision 177665)
+++ gcc/fold-const.c    (working copy)
@@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
-      switch (code)
+      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
        {
-       case EQ_EXPR:
-         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-           return constant_boolean_node (1, type);

I think this change should go in a separate patch for improved
constant folding.  It shouldn't be necessary for enabling vector compares, no?
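For context on why the scalar fold is guarded: an element-wise x == x can only be folded to an all-true mask (every lane -1) when NaNs need not be honored, because a NaN lane compares unequal to itself. A minimal scalar C sketch of the lane-wise result, assuming float lanes and the -1/0 encoding from the patch description; self_eq_mask is a hypothetical helper, not part of the patch.

```c
#include <math.h>
#include <stdint.h>

/* Lane-wise x == x with the OpenCL encoding (-1 true, 0 false).
   The result is 0 exactly in the lanes that hold a NaN, so the
   whole mask cannot be folded to all -1 when NaNs are honored.  */
static void self_eq_mask(const float *x, int32_t *r, int n)
{
    for (int i = 0; i < n; i++)
        r[i] = (x[i] == x[i]) ? -1 : 0;
}
```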

+      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
+        {
+          error_at (colon_loc, "vector comparison must be of signed "
+                              "integer vector type");
+          return error_mark_node;
+        }

why that?

+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }

I don't see a check that type1 and type2 are vector types; or is that done
elsewhere?  I think type1 and type2 are already verified to be
compatible (but you might double-check).  At least the above would be
redundant with

+      if (type1 != type2)
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }

Joseph may have comments about the fully-fold stuff that follows.

+      /* Currently the expansion of VEC_COND_EXPR does not allow
+        expressions where the type of vectors you compare differs
+        from the type of vectors you select from. For the time
+        being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)

Why only for comparison-class?

+          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+         || TREE_TYPE (ifexp) != type1)
+       {
+         tree comp_type = COMPARISON_CLASS_P (ifexp)
+                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+                          : TREE_TYPE (ifexp);
+         tree vcond;
+
+         op1 = convert (comp_type, op1);
+         op2 = convert (comp_type, op2);
+         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+         vcond = convert (type1, vcond);
+         return vcond;
+       }
+      else
+       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);

In the end we of course will try to fix the middle-end/backends to
allow mixed types here as the current restriction doesn't really make sense.

     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }

as above - compatibility should already be ensured, thus type0 == type1
here?

+/* Try a hardware hook for vector comparison or
+   extract comparison piecewise.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)

comments should mention and describe all function arguments.

+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
+static tree
+earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)

that would be expand_vec_cond_expr_piecewise, no?

+  /* Ensure that we will be able to expand vector comparison
+     in case it is not supported by the architecture.  */
+  gcc_assert (COMPARISON_CLASS_P (cond));

that looks dangerous to me - did you try

 vec = v1 <= v2;
 vec2 = vec ? v1 : v2;

without optimization?

+  /* Check if we need to expand vector condition inside of
+     VEC_COND_EXPR.  */
+  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+                                      TREE_OPERAND (cond, 0),
+                                     TREE_OPERAND (cond, 1),
+                                      TREE_CODE (cond));

That unconditionally expands, so no need for "Check".

+  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
+  v = VEC_alloc(constructor_elt, gc, nunits);
+  for (i = 0; i < nunits;
+       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
+    {
+      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
+      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
+      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
+      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
+                                    build_int_cst (inner_type ,0));

I seriously doubt that, when expanding this part piecewise, expanding
the mask first (in either way) is going to be beneficial.  Instead I would
suggest "inlining" the comparison here.  Thus instead of

 mask =
         = { mask[0] != 0 ? ... }

do

          = { c0[0] < c1[0] ? ..., }

or even expand the ? : using mask operations if we efficiently can
create that mask.
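To make the suggestion concrete, here is a scalar C sketch of both shapes for res = c0 < c1 ? a : b over int32 lanes, assuming the -1/0 mask encoding. The mask-first version mirrors what earlyexpand_vec_cond_expr does (materialize the mask, then test each lane against zero); the inlined version evaluates each lane's comparison directly in the select. The function names and the 64-lane bound are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LANES 64 /* hypothetical upper bound on the lane count */

/* Mask-first shape, as in the patch: build {c0[i] < c1[i] ? -1 : 0},
   then select on mask[i] != 0.  */
static void select_mask_first(const int32_t *c0, const int32_t *c1,
                              const int32_t *a, const int32_t *b,
                              int32_t *res, int n)
{
    int32_t mask[MAX_LANES];
    assert(n <= MAX_LANES);
    for (int i = 0; i < n; i++)
        mask[i] = (c0[i] < c1[i]) ? -1 : 0;
    for (int i = 0; i < n; i++)
        res[i] = (mask[i] != 0) ? a[i] : b[i];
}

/* Inlined shape: compare and select in one step per lane,
   with no intermediate mask.  */
static void select_inlined(const int32_t *c0, const int32_t *c1,
                           const int32_t *a, const int32_t *b,
                           int32_t *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (c0[i] < c1[i]) ? a[i] : b[i];
}
```

Both produce the same lanes; the inlined form simply never stores the intermediate mask.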


+  /* Check if VEC_COND_EXPR is supported in hardware within the
+     given types.  */
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+
+      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
+        change it to A != {0,0,...} ? V0 : V1  */
+      if (!COMPARISON_CLASS_P (cond))
+       TREE_OPERAND (exp, 0) =
+           build2 (NE_EXPR, TREE_TYPE (cond), cond,
+                   build_vector_from_val (TREE_TYPE (cond),
+                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));

That looks inefficient as well.  Iff we know that the mask is always
either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
bitwise operations (see what the i?86 expander does, for example).
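When every mask lane is known to be either all ones or all zeros, the ?: reduces to three bitwise operations per lane, (a & m) | (b & ~m). A minimal sketch, again modeling the lanes as a C array. `select_bitwise` is a hypothetical name, and this is not the i?86 expander itself.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 4 /* hypothetical lane count: 4 x int32 */

/* Valid only if every mask lane is -1 (all bits set) or 0:
   a lane of -1 keeps the bits of A, a lane of 0 keeps the bits of B.  */
void select_bitwise (const int32_t mask[NLANES], const int32_t a[NLANES],
                     const int32_t b[NLANES], int32_t res[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    res[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}
```

The identity relies on the strict -1/0 encoding; a lane holding 1 would mix bits from both operands.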

@@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_

   gcc_assert (code != CONVERT_EXPR);

+
   /* The signedness is determined from input argument.  */
   if (code == VEC_UNPACK_FLOAT_HI_EXPR
       || code == VEC_UNPACK_FLOAT_LO_EXPR)

spurious whitespace change.

Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c      (revision 177665)
+++ gcc/tree-cfg.c      (working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }

+  if (TREE_CODE (op0_type) == VECTOR_TYPE
+      && TREE_CODE (op1_type) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
+    {

this should check TREE_CODE (type) == VECTOR_TYPE only
and then verify the comparison operands are actually vectors.

+      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
+        {
+          error ("invalid vector comparison, vector element type mismatch");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }

this needs to use code similar to the scalar variant,

          !useless_type_conversion_p (op0_type, op1_type)
          && !useless_type_conversion_p (op1_type, op0_type)

which also makes the first TYPE_VECTOR_SUBPARTS redundant.

+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          && TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }

I think you can drop the TYPE_PRECISION check.  We might want to
assert that a vector element type's precision always matches its
mode precision (in make_vector_type).
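The restructured check suggested above (test only that the result type is a vector, then verify that the operands are vectors with compatible types, then compare lane counts, with no TYPE_PRECISION test) can be sketched as follows. Everything here is hypothetical: `struct toy_type`, `toy_compatible` and `toy_verify_vector_comparison` are simplified stand-ins, not GCC's tree API, and `toy_compatible` reduces useless_type_conversion_p to pointer identity purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for GCC type nodes.  */
enum toy_kind { TOY_INTEGER, TOY_REAL, TOY_VECTOR };

struct toy_type
{
  enum toy_kind kind;
  const struct toy_type *elem; /* element type (vectors only) */
  unsigned nunits;             /* number of lanes (vectors only) */
};

/* Hypothetical stand-in for useless_type_conversion_p: here two
   types are compatible only if they are the same node.  */
static bool toy_compatible (const struct toy_type *a, const struct toy_type *b)
{
  return a == b;
}

/* Checks a comparison whose result TYPE is already known to be a vector.
   Like the GCC verifiers, it returns true when the statement is INVALID.  */
bool toy_verify_vector_comparison (const struct toy_type *type,
                                   const struct toy_type *op0,
                                   const struct toy_type *op1)
{
  assert (type->kind == TOY_VECTOR);

  /* The operands must be vectors too.  */
  if (op0->kind != TOY_VECTOR || op1->kind != TOY_VECTOR)
    return true;

  /* Operand types must be compatible in at least one direction.  */
  if (!toy_compatible (op0, op1) && !toy_compatible (op1, op0))
    return true;

  /* One result lane per operand lane.  */
  if (type->nunits != op0->nunits)
    return true;

  return false;
}
```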

Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c      (revision 177665)
+++ gcc/c-parser.c      (working copy)
@@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
   if (c_parser_next_token_is (parser, CPP_COLON))
     {
       tree eptype = NULL_TREE;
-
+
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)

watch out for whitespace changes - you add a trailing tab here.

+/* Find target specific sequence for vector comparison of
+   real-type vectors V0 and V1.  Returns a variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;

is there a reason we need this and cannot simply provide expanders
for the named patterns?  We'd need to give them semantics of
producing all-ones / all-zero masks of course.  Richard, do you think
that's sensible?  That way we'd avoid the new target hook and could
simply do optab queries.

Thanks,
Richard.

> ChangeLog
>
> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>
>       gcc/
>       * targhooks.c (default_builtin_vec_compare): New hook.
>       * targhooks.h (default_builtin_vec_compare): New definition.
>       * target.def (builtin_vec_compare): New hook.
>       * target.h: New include (gimple.h).
>       * fold-const.c
>       (fold_comparison): Adjust x <cmp> x vector operations.
>       * c-typeck.c (build_binary_op): Allow vector comparison.
>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>       inside of if statement.
>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>       * tree-vect-generic.c (do_compare): Helper function.
>       (expand_vector_comparison): Check if hardware comparison
>       is available, if not expand comparison piecewise.
>       (expand_vector_operation): Handle vector comparison
>       expressions separately.
>       (earlyexpand_vec_cond_expr): Expand vector comparison
>       piecewise.
>       * Makefile.in: New dependencies.
>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>       comparison operations in gimple.
>       * c-parser.c (c_parser_conditional_expression): Adjust
>       to handle VEC_COND_EXPR.
>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>       * config/i386/i386.c (vector_fp_compare): Build hardware
>       specific code for floating point vector comparison.
>       (vector_int_compare): Build hardware specific code for
>       integer vector comparison.
>       (ix86_vectorize_builtin_vec_compare): Implementation of
>       builtin_vec_compare hook.
>
>       gcc/testsuite/
>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>       * gcc.dg/vector-compare-1.c: New test.
>       * gcc.dg/vector-compare-2.c: New test.
>
>       gcc/doc
>       * extend.texi: Adjust.
>       * tm.texi: Adjust.
>       * tm.texi.in: Adjust.
>
>
> bootstrapped and tested on x86_64-unknown-linux.
>
>
> Thanks,
> Artem Shinkarov.
>


* Re: Vector Comparison patch
  2011-08-15 15:25 ` Richard Guenther
@ 2011-08-15 17:53   ` Artem Shinkarov
  2011-08-16 16:39     ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-15 17:53 UTC
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi
>>
>> Here is a completed version of the vector comparison patch we
>> discussed a long time ago here:
>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>
>> The patch implements vector comparison according to the OpenCL
>> standard, when the result of the comparison of two vectors is vector
>> of signed integers, where -1 represents true and 0 false.
>>
>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>> which is expanded into:
>> foreach (i in length (V1)) res[i] = V1[i] == 0 ? V3[i] : V2[i].
>
> Some comments on the patch below.  First, in general I don't see
> why you need a new target hook to specify whether to "vectorize"
> a comparison.  Why are the existing hooks used by the vectorizer
> not enough?
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    (revision 177665)
> +++ gcc/fold-const.c    (working copy)
> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>      floating-point, we can only do some of these simplifications.)  */
>   if (operand_equal_p (arg0, arg1, 0))
>     {
> -      switch (code)
> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>        {
> -       case EQ_EXPR:
> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
> -           return constant_boolean_node (1, type);
>
> I think this change should go in a separate patch for improved
> constant folding.  It shouldn't be necessary for enabling vector compares, no?

Unfortunately it is necessary: this case must be covered here, otherwise
the x != x condition fails.
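The semantics such a fold has to preserve can be modeled lane by lane. For integer lanes, x == x is always the all-ones vector and x != x the all-zeros vector, but for floating-point lanes a NaN makes x != x true, so folding to a constant is only valid when NaNs are not honored (compare the HONOR_NANS test in the quoted hunk). A sketch with hypothetical function names, using OpenCL-style -1/0 lane results:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define NLANES 4 /* hypothetical lane count: 4 x float */

/* Lane-wise x == x: -1 where the lane equals itself, 0 otherwise.  */
void vec_eq_self (const float x[NLANES], int32_t res[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    res[i] = x[i] == x[i] ? -1 : 0;
}

/* Lane-wise x != x: only a NaN lane yields -1.  */
void vec_ne_self (const float x[NLANES], int32_t res[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    res[i] = x[i] != x[i] ? -1 : 0;
}
```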

>
> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
> +        {
> +          error_at (colon_loc, "vector comparison must be of signed "
> +                              "integer vector type");
> +          return error_mark_node;
> +        }
>
> why that?

Well, later on I rely on this fact.  OpenCL says that true should be
-1, in the sense that all bits are set.  I could support unsigned masks
as well, but wouldn't that just introduce a source of possible errors?
The natural choice for true and false is 0 and 1, not 0 and -1.
Anyway, I don't have a strong opinion there, and I could easily adjust
it if we decide that we want it.
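The -1 versus 0/1 question can be illustrated with a small sketch, assuming 32-bit lanes and using hypothetical helper names: a signed -1 lane and an unsigned all-ones lane have the same bit pattern, so a bitwise select works for either signedness, while a 0/1 encoding does not select whole lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: encode a scalar truth value as a 32-bit lane
   mask with "all bits set" for true, in signed and unsigned flavours.  */
int32_t lane_mask_s32 (int cond) { return cond ? -1 : 0; }
uint32_t lane_mask_u32 (int cond) { return cond ? UINT32_MAX : 0u; }

/* Bitwise select on one lane: keeps the bits of A where M is set,
   the bits of B elsewhere.  */
uint32_t select_lane (uint32_t m, uint32_t a, uint32_t b)
{
  return (a & m) | (b & ~m);
}
```

With m == 1 (the 0/1 convention) the select mixes the low bit of a into b instead of picking a whole lane.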

>
> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
> +             != TYPE_VECTOR_SUBPARTS (type1))
> +        {
> +          error_at (colon_loc, "vectors of different length found in "
> +                               "vector comparison");
> +          return error_mark_node;
> +        }
>
> I miss verification that type1 and type2 are vector types, or is that done
> elsewhere?  I think type1 and type2 are already verified to be
> compatible (but you might double-check).  At least the above would be
> redundant with

Thanks, the check that type1 and type2 are both vectors is missing; it
will be added in the new version of the patch.
>
> +      if (type1 != type2)
> +        {
> +          error_at (colon_loc, "vectors of different types involved in "
> +                               "vector comparison");
> +          return error_mark_node;
> +        }

You are right; what I meant here is TREE_TYPE (type1) != TREE_TYPE
(type2), because vector (4, int) has the same number of elements as
vector (4, float).  This will be fixed in the new version.

>
> Joseph may have comments about the fully-fold stuff that follows.
>
> +      /* Currently the expansion of VEC_COND_EXPR does not allow
> +        expressions where the type of vectors you compare differs
> +        from the type of vectors you select from. For the time
> +        being we insert implicit conversions.  */
> +      if ((COMPARISON_CLASS_P (ifexp)
>
> Why only for comparison-class?
Not only: there is an || involved:
(COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
|| TREE_TYPE (ifexp) != type1

So if this is a comparison, we check the type of its first operand,
because the result type of the comparison may fit while the operand
types may not.  If we have a plain expression of signed vector type, we
know that we will transform it into exp != {0,0,...} in
tree-vect-generic.c, but if the types of the operands do not match we
convert them.
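Why mixing the compared type and the selected type need not affect code generation when the lane widths match can be sketched as follows: the mask comes from an int32 comparison, and float lanes are selected by reinterpreting their bits. The `memcpy` calls stand in for a same-size reinterpretation; this is an illustration under that assumption, not what the patch emits, and `select_float_by_int_cmp` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NLANES 4 /* hypothetical lane count */

/* The bit reinterpretation below needs float lanes as wide as int32 lanes.  */
_Static_assert (sizeof (float) == sizeof (int32_t), "float must be 32 bits");

void select_float_by_int_cmp (const int32_t c0[NLANES], const int32_t c1[NLANES],
                              const float a[NLANES], const float b[NLANES],
                              float res[NLANES])
{
  int32_t ai[NLANES], bi[NLANES], ri[NLANES];

  /* Reinterpret the float lanes as int32 bit patterns.  */
  memcpy (ai, a, sizeof ai);
  memcpy (bi, b, sizeof bi);

  for (int i = 0; i < NLANES; i++)
    {
      /* OpenCL-style mask from the int32 comparison: -1 true, 0 false.  */
      int32_t m = c0[i] > c1[i] ? -1 : 0;
      /* The bitwise select works on raw bits, whatever the lane type.  */
      ri[i] = (ai[i] & m) | (bi[i] & ~m);
    }

  /* Reinterpret the selected bits as floats again.  */
  memcpy (res, ri, sizeof ri);
}
```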

>
> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
> +         || TREE_TYPE (ifexp) != type1)
> +       {
> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
> +                          : TREE_TYPE (ifexp);
> +         tree vcond;
> +
> +         op1 = convert (comp_type, op1);
> +         op2 = convert (comp_type, op2);
> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
> +         vcond = convert (type1, vcond);
> +         return vcond;
> +       }
> +      else
> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>
> In the end we of course will try to fix the middle-end/backends to
> allow mixed types here as the current restriction doesn't really make sense.

Yes, that would be nice, but these conversions do not really affect
the code generation, so for the time being I think it is fine to have
them.

>
>     case EQ_EXPR:
>     case NE_EXPR:
> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
> +        {
> +          tree intt;
> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
> +            {
> +              error_at (location, "comparing vectors with different "
> +                                  "element types");
> +              return error_mark_node;
> +            }
> +
> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
> +            {
> +              error_at (location, "comparing vectors with different "
> +                                  "number of elements");
> +              return error_mark_node;
> +            }
>
> as above - compatibility should already be ensured, thus type0 == type1
> here?

Yeah, we know that they are both vector types, but that is about all
we know. Anyhow, all these errors are reachable. As an example see
vector-compare-1.c:
r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
r8 == r4; /* { dg-error "comparing vectors with different number of elements"} */

>
> +/* Try a hardware hook for vector comparison or
> +   extract comparison piecewise.  */
> +static tree
> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
> +                          tree op1, enum tree_code code)
>
> comments should mention and describe all function arguments.

Ok, coming in the new version of the patch.

> +/* Expand vector condition EXP which should have the form
> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
> +   vector:
> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
> +static tree
> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>
> that would be expand_vec_cond_expr_piecewise, no?

Adjusted.

>
> +  /* Ensure that we will be able to expand vector comparison
> +     in case it is not supported by the architecture.  */
> +  gcc_assert (COMPARISON_CLASS_P (cond));
>
> that looks dangerous to me - did you try
>
>  vec = v1 <= v2;
>  vec2 = vec ? v1 : v2;
>
> without optimization?

Sure, the tests should cover this case.
I have this assertion there because only two cases are possible:
1) it is a comparison, or
2) the caller has already converted expr to expr != {0,0,...}.
So we should be perfectly fine.

>
> +  /* Check if we need to expand vector condition inside of
> +     VEC_COND_EXPR.  */
> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
> +                                      TREE_OPERAND (cond, 0),
> +                                     TREE_OPERAND (cond, 1),
> +                                      TREE_CODE (cond));
>
> That unconditionally expands, so no need for "Check".

Ok.

>
> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
> +  v = VEC_alloc(constructor_elt, gc, nunits);
> +  for (i = 0; i < nunits;
> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
> +    {
> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
> +                                    build_int_cst (inner_type ,0));
>
> I seriously doubt that, when expanding this part piecewise, expanding
> the mask first in either way is going to be beneficial.  Instead I would
> suggest "inlining" the comparison here.  Thus instead of

Well, the thing is that if expand_vector_comparison inserts a builtin
there rather than expanding the code piecewise, I'll have to do the
comparison with 0 anyway, because true is expressed as -1 there.

Well, I would hope that in case we have:
c_0 = a_0 > b_0;
d_0 = c_0 != 0;

{d_0, d_1,...}

all the d_n should be constant-folded, or should I call fold explicitly here?

1) I construct the mask
>
>  mask =
>         = { mask[0] != 0 ? ... }
>
> do
>
>          = { c0[0] < c1[0] ? ..., }
>
> or even expand the ? : using mask operations if we efficiently can
> create that mask.
>

I assume that if we cannot expand VEC_COND_EXPR, then masking the
elements is a problem for us.  Otherwise VEC_COND_EXPR expansion has a
bug somewhere.  Or am I wrong somewhere?

>
> +  /* Check if VEC_COND_EXPR is supported in hardware within the
> +     given types.  */
> +  if (code == VEC_COND_EXPR)
> +    {
> +      tree exp = gimple_assign_rhs1 (stmt);
> +      tree cond = TREE_OPERAND (exp, 0);
> +
> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
> +        change it to A != {0,0,...} ? V0 : V1  */
> +      if (!COMPARISON_CLASS_P (cond))
> +       TREE_OPERAND (exp, 0) =
> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
> +                   build_vector_from_val (TREE_TYPE (cond),
> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>
> That looks inefficient as well.  Iff we know that the mask is always
> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
> bitwise operations (see what the i?86 expander does, for example).

This is a requirement of VEC_COND_EXPR: I need to pass 4 parameters,
not 3, which is why I introduce this fake {0,0,...} here.
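A model of a conditional whose condition must always be an explicit comparison (a comparison code plus two comparison operands) shows why a bare mask needs a dummy zero vector: A ? V0 : V1 becomes (A != {0,0,...}) ? V0 : V1. `enum cmp_code`, `lane_compare` and `vcond4` are hypothetical stand-ins, not GCC's tree codes or optab interface.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 4 /* hypothetical lane count: 4 x int32 */

/* Hypothetical comparison codes standing in for the tree codes.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_EQ, CMP_NE };

int lane_compare (enum cmp_code code, int32_t x, int32_t y)
{
  switch (code)
    {
    case CMP_LT: return x < y;
    case CMP_LE: return x <= y;
    case CMP_EQ: return x == y;
    case CMP_NE: return x != y;
    }
  return 0;
}

/* res[i] = (c0[i] CODE c1[i]) ? a[i] : b[i]; the condition is always
   given as a code plus two comparison operands.  */
void vcond4 (enum cmp_code code, const int32_t c0[NLANES],
             const int32_t c1[NLANES], const int32_t a[NLANES],
             const int32_t b[NLANES], int32_t res[NLANES])
{
  for (int i = 0; i < NLANES; i++)
    res[i] = lane_compare (code, c0[i], c1[i]) ? a[i] : b[i];
}
```

A bare mask A is passed as vcond4 (CMP_NE, A, zero, V0, V1, res) with an all-zero vector as the second comparison operand.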

>
> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>
>   gcc_assert (code != CONVERT_EXPR);
>
> +
>   /* The signedness is determined from input argument.  */
>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>
> spurious whitespace change.

Fixed.
>
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      (revision 177665)
> +++ gcc/tree-cfg.c      (working copy)
> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>       return true;
>     }
>
> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
> +      && TREE_CODE (op1_type) == VECTOR_TYPE
> +      && TREE_CODE (type) == VECTOR_TYPE)
> +    {
>
> this should check TREE_CODE (type) == VECTOR_TYPE only
> and then verify the comparison operands are actually vectors.

Yes, you are right, adjusted.

>
> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
> +        {
> +          error ("invalid vector comparison, vector element type mismatch");
> +          debug_generic_expr (op0_type);
> +          debug_generic_expr (op1_type);
> +          return true;
> +        }
>
> this needs to use code similar to the scalar variant,
>
>          !useless_type_conversion_p (op0_type, op1_type)
>          && !useless_type_conversion_p (op1_type, op0_type)
>
> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>

Fixed.

> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
> +             != TYPE_PRECISION (TREE_TYPE (type)))
> +        {
> +          error ("invalid vector comparison resulting type");
> +          debug_generic_expr (type);
> +          return true;
> +        }
>
> I think you can drop the TYPE_PRECISION check.  We might want to
> assert that a vector element type's precision always matches its
> mode precision (in make_vector_type).

I would rather leave it for a while.  During optimisation you can
construct some strange things, so I would rather make the verifier
resistant to all kinds of stuff.

>
> Index: gcc/c-parser.c
> ===================================================================
> --- gcc/c-parser.c      (revision 177665)
> +++ gcc/c-parser.c      (working copy)
> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>   if (c_parser_next_token_is (parser, CPP_COLON))
>     {
>       tree eptype = NULL_TREE;
> -
> +
>       middle_loc = c_parser_peek_token (parser)->location;
> +
> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>
> watch out for whitespace changes - you add a trailing tab here.

Fixed.

>
> +/* Find target specific sequence for vector comparison of
> +   real-type vectors V0 and V1.  Returns a variable containing the
> +   result of the comparison, or NULL_TREE otherwise.  */
> +static tree
> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
> +                   enum machine_mode mode, tree v0, tree v1,
> +                   enum tree_code code)
> +{
> +  enum ix86_builtins fcode;
>
> is there a reason we need this and cannot simply provide expanders
> for the named patterns?  We'd need to give them semantics of
> producing all-ones / all-zero masks of course.  Richard, do you think
> that's sensible?  That way we'd avoid the new target hook and could
> simply do optab queries.

I don't think I really understand the idea.  How are we going to
represent the fact that we need to convert a given node to a given
machine instruction?  Maybe you could point me to where a similar
technique is already used.


The new patch will be tested and submitted here soon.


Thanks,
Artem.

>
> Thanks,
> Richard.
>
>


* Re: Vector Comparison patch
  2011-08-15 17:53   ` Artem Shinkarov
@ 2011-08-16 16:39     ` Richard Guenther
  2011-08-16 17:01       ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-16 16:39 UTC
  To: Artem Shinkarov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Mon, Aug 15, 2011 at 6:58 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi
>>>
>>> Here is a completed version of the vector comparison patch we
>>> discussed a long time ago here:
>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>>
>>> The patch implements vector comparison according to the OpenCL
>>> standard, when the result of the comparison of two vectors is vector
>>> of signed integers, where -1 represents true and 0 false.
>>>
>>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>>> which is expanded into:
>>> foreach (i in length (V1)) res[i] = V1[i] == 0 ? V3[i] : V2[i].
>>
>> Some comments on the patch below.  First, in general I don't see
>> why you need a new target hook to specify whether to "vectorize"
>> a comparison.  Why are the existing hooks used by the vectorizer
>> not enough?
>>
>> Index: gcc/fold-const.c
>> ===================================================================
>> --- gcc/fold-const.c    (revision 177665)
>> +++ gcc/fold-const.c    (working copy)
>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>      floating-point, we can only do some of these simplifications.)  */
>>   if (operand_equal_p (arg0, arg1, 0))
>>     {
>> -      switch (code)
>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>        {
>> -       case EQ_EXPR:
>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>> -           return constant_boolean_node (1, type);
>>
>> I think this change should go in a separate patch for improved
>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>
> Unfortunately it is necessary: this case must be covered here, otherwise
> the x != x condition fails.

How does it fail?

>>
>> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
>> +        {
>> +          error_at (colon_loc, "vector comparison must be of signed "
>> +                              "integer vector type");
>> +          return error_mark_node;
>> +        }
>>
>> why that?
>
> Well, later on I rely on this fact.  OpenCL says that true should be
> -1, in the sense that all bits are set.  I could support unsigned masks
> as well, but wouldn't that just introduce a source of possible errors?
> The natural choice for true and false is 0 and 1, not 0 and -1.
> Anyway, I don't have a strong opinion there, and I could easily adjust
> it if we decide that we want it.

I think we want to allow both signed and unsigned masks.

>>
>> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
>> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
>> +             != TYPE_VECTOR_SUBPARTS (type1))
>> +        {
>> +          error_at (colon_loc, "vectors of different length found in "
>> +                               "vector comparison");
>> +          return error_mark_node;
>> +        }
>>
>> I miss verification that type1 and type2 are vector types, or is that done
>> elsewhere?  I think type1 and type2 are already verified to be
>> compatible (but you might double-check).  At least the above would be
>> redundant with
>
> Thanks, the check that type1 and type2 are both vectors is missing; it
> will be added in the new version of the patch.
>>
>> +      if (type1 != type2)
>> +        {
>> +          error_at (colon_loc, "vectors of different types involved in "
>> +                               "vector comparison");
>> +          return error_mark_node;
>> +        }
>
> You are right; what I meant here is TREE_TYPE (type1) != TREE_TYPE
> (type2), because vector (4, int) has the same number of elements as
> vector (4, float).  This will be fixed in the new version.
>
>>
>> Joseph may have comments about the fully-fold stuff that follows.
>>
>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>> +        expressions where the type of vectors you compare differs
>> +        from the type of vectors you select from. For the time
>> +        being we insert implicit conversions.  */
>> +      if ((COMPARISON_CLASS_P (ifexp)
>>
>> Why only for comparison-class?
> Not only: there is an || involved:
> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
> || TREE_TYPE (ifexp) != type1
>
> So if this is a comparison, we check the type of its first operand,
> because the result type of the comparison may fit while the operand
> types may not.  If we have a plain expression of signed vector type, we
> know that we will transform it into exp != {0,0,...} in
> tree-vect-generic.c, but if the types of the operands do not match we
> convert them.

Hm, ok ... let's hope we can sort out the backend issues before this
patch goes in so we can remove this converting stuff.

>>
>> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>> +         || TREE_TYPE (ifexp) != type1)
>> +       {
>> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
>> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>> +                          : TREE_TYPE (ifexp);
>> +         tree vcond;
>> +
>> +         op1 = convert (comp_type, op1);
>> +         op2 = convert (comp_type, op2);
>> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>> +         vcond = convert (type1, vcond);
>> +         return vcond;
>> +       }
>> +      else
>> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>>
>> In the end we of course will try to fix the middle-end/backends to
>> allow mixed types here as the current restriction doesn't really make sense.
>
> Yes, that would be nice, but these conversions do not really affect
> the code generation, so for the time being I think it is fine to have
> them.
>
>>
>>     case EQ_EXPR:
>>     case NE_EXPR:
>> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
>> +        {
>> +          tree intt;
>> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
>> +            {
>> +              error_at (location, "comparing vectors with different "
>> +                                  "element types");
>> +              return error_mark_node;
>> +            }
>> +
>> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
>> +            {
>> +              error_at (location, "comparing vectors with different "
>> +                                  "number of elements");
>> +              return error_mark_node;
>> +            }
>>
>> as above - compatibility should already be ensured, thus type0 == type1
>> here?
>
> Yeah, we know that they are both vector types, but that is about all
> we know. Anyhow, all these errors are reachable. As an example see
> vector-compare-1.c:
> r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
> r8 == r4; /* { dg-error "comparing vectors with different number of elements"} */

Ok, I see.

>>
>> +/* Try a hardware hook for vector comparison or
>> +   extract comparison piecewise.  */
>> +static tree
>> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
>> +                          tree op1, enum tree_code code)
>>
>> comments should mention and describe all function arguments.
>
> Ok, coming in the new version of the patch.
>
>> +/* Expand vector condition EXP which should have the form
>> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
>> +   vector:
>> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
>> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
>> +static tree
>> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>>
>> that would be expand_vec_cond_expr_piecewise, no?
>
> Adjusted.
>
>>
>> +  /* Ensure that we will be able to expand vector comparison
>> +     in case it is not supported by the architecture.  */
>> +  gcc_assert (COMPARISON_CLASS_P (cond));
>>
>> that looks dangerous to me - did you try
>>
>>  vec = v1 <= v2;
>>  vec2 = vec ? v1 : v2;
>>
>> without optimization?
>
> Sure, the tests should cover this case.
> I have this assertion there because only two cases are possible:
> 1) it is a comparison, or
> 2) the caller has already converted expr to expr != {0,0,...}.
> So we should be perfectly fine.
>
>>
>> +  /* Check if we need to expand vector condition inside of
>> +     VEC_COND_EXPR.  */
>> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
>> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
>> +                                      TREE_OPERAND (cond, 0),
>> +                                     TREE_OPERAND (cond, 1),
>> +                                      TREE_CODE (cond));
>>
>> That unconditionally expands, so no need for "Check".
>
> Ok.
>
>>
>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>> +  for (i = 0; i < nunits;
>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>> +    {
>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>> +                                    build_int_cst (inner_type ,0));
>>
>> I seriously doubt that, when expanding this part piecewise, expanding
>> the mask first in either way is going to be beneficial.  Instead I would
>> suggest "inlining" the comparison here.  Thus instead of
>
> Well, the thing is that if expand_vector_comparison inserted a
> builtin there rather than expanding the code piecewise, I would have
> to do the comparison with 0 anyway, because true is expressed as -1
> there.
>
> Well, I would hope that in case we have:
> c_0 = a_0 > b_0;
> d_0 = c_0 != 0;
>
> {d_0, d_1,...}
>
> all the d_n should be constant-folded, or should I call fold explicitly here?
>
> 1) I construct the mask
>>
>>  mask =
>>         = { mask[0] != 0 ? ... }
>>
>> do
>>
>>          = { c0[0] < c1[0] ? ..., }
>>
>> or even expand the ? : using mask operations if we can efficiently
>> create that mask.
>>
>
> I assume that if we cannot expand VEC_COND_EXPR, then masking the
> elements is a problem for us. Otherwise VEC_COND_EXPR expansion has a
> bug somewhere. Or am I wrong somewhere?

I think we can always do bitwise operations, so if we can get at the
mask vector we are fine.

I was thinking about how the case of explicitly computing the value of
v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
be handled.  If the target produces a mask of condition codes for
a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
A similar case is for AMD XOP which can expand mask ? v1 : v2
with a single instruction (so even without seeing a comparison).

Basically, if we can get at the mask we should use that to do the
vector selection in parallel via (v1 & mask) | (v2 & ~mask).

If we cannot even get at the mask then we can build the result
vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.
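A minimal sketch of the two fallbacks described above, using the GCC/Clang generic vector extension with four int lanes. The helper names are hypothetical and are not GCC internals. The bitwise select is only valid when every mask lane is all ones (-1) or zero, which is what a vector comparison produces; the piecewise version inlines the comparison in each lane.

```c
#include <assert.h>

/* Four 32-bit int lanes via the GCC/Clang generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Bitwise select: v1[i] where mask[i] == -1, v2[i] where mask[i] == 0.
   Correct only for canonical all-ones / all-zeros mask lanes. */
static v4si select_bitwise(v4si mask, v4si v1, v4si v2)
{
  return (v1 & mask) | (v2 & ~mask);
}

/* Piecewise fallback with the comparison inlined per lane:
   { v1[0] < v2[0] ? v1[0] : v2[0], ... }.  */
static v4si min_piecewise(v4si v1, v4si v2)
{
  v4si r = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    r[i] = v1[i] < v2[i] ? v1[i] : v2[i];
  return r;
}

/* Same result via the mask: the vector comparison yields -1/0 lanes. */
static v4si min_bitwise(v4si v1, v4si v2)
{
  v4si mask = v1 < v2;
  return select_bitwise(mask, v1, v2);
}
```

Both forms should agree lane by lane. The bitwise one needs only AND, OR and NOT on the vector mode once the mask is available.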

>>
>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>> +     given types.  */
>> +  if (code == VEC_COND_EXPR)
>> +    {
>> +      tree exp = gimple_assign_rhs1 (stmt);
>> +      tree cond = TREE_OPERAND (exp, 0);
>> +
>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>> +        change it to A != {0,0,...} ? V0 : V1  */
>> +      if (!COMPARISON_CLASS_P (cond))
>> +       TREE_OPERAND (exp, 0) =
>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>> +                   build_vector_from_val (TREE_TYPE (cond),
>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>
>> That looks inefficient as well.  Iff we know that the mask is always
>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>> bitwise operations (see what the i?86 expander does, for example).
>
> This is a requirement of VEC_COND_EXPR: I need to pass 4 parameters,
> not 3, which is why I introduce this fake {0,0,...} here.

Sure, but if you look at expand_vec_cond_expr_p then you don't need
that, and this fake comparison should instead be produced by the
expander (or really avoided by maybe splitting up the named pattern
into two).

It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
makes it less efficient - with just the mask it can do the bitwise
fallback easily).

>>
>> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>>
>>   gcc_assert (code != CONVERT_EXPR);
>>
>> +
>>   /* The signedness is determined from input argument.  */
>>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>>
>> spurious whitespace change.
>
> Fixed.
>>
>> Index: gcc/tree-cfg.c
>> ===================================================================
>> --- gcc/tree-cfg.c      (revision 177665)
>> +++ gcc/tree-cfg.c      (working copy)
>> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>>       return true;
>>     }
>>
>> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
>> +      && TREE_CODE (op1_type) == VECTOR_TYPE
>> +      && TREE_CODE (type) == VECTOR_TYPE)
>> +    {
>>
>> this should check TREE_CODE (type) == VECTOR_TYPE only
>> and then verify the comparison operands are actually vectors.
>
> Yes, you are right, adjusted.
>
>>
>> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
>> +        {
>> +          error ("invalid vector comparison, vector element type mismatch");
>> +          debug_generic_expr (op0_type);
>> +          debug_generic_expr (op1_type);
>> +          return true;
>> +        }
>>
>> this needs to use code similar to the scalar variant,
>>
>>          !useless_type_conversion_p (op0_type, op1_type)
>>          && !useless_type_conversion_p (op1_type, op0_type)
>>
>> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>>
>
> Fixed.
>
>> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
>> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
>> +             != TYPE_PRECISION (TREE_TYPE (type)))
>> +        {
>> +          error ("invalid vector comparison resulting type");
>> +          debug_generic_expr (type);
>> +          return true;
>> +        }
>>
>> I think you can drop the TYPE_PRECISION check.  We might want to
>> assert that a vector element types precision always matches its
>> mode precision (in make_vector_type).
>
> I would rather leave it for a while. During optimisation you can
> construct some strange things, so I would rather make the verifier
> resistant to all kinds of stuff.

Ok.

>>
>> Index: gcc/c-parser.c
>> ===================================================================
>> --- gcc/c-parser.c      (revision 177665)
>> +++ gcc/c-parser.c      (working copy)
>> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>>   if (c_parser_next_token_is (parser, CPP_COLON))
>>     {
>>       tree eptype = NULL_TREE;
>> -
>> +
>>       middle_loc = c_parser_peek_token (parser)->location;
>> +
>> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>>
>> watch out for whitespace changes - you add a trailing tab here.
>
> Fixed.
>
>>
>> +/* Find target specific sequence for vector comparison of
>> +   real-type vectors V0 and V1. Returns variable containing
>> +   result of the comparison or NULL_TREE in other case.  */
>> +static tree
>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>> +                   enum machine_mode mode, tree v0, tree v1,
>> +                   enum tree_code code)
>> +{
>> +  enum ix86_builtins fcode;
>>
>> is there a reason we need this and cannot simply provide expanders
>> for the named patterns?  We'd need to give them semantics of
>> producing all-ones / all-zero masks of course.  Richard, do you think
>> that's sensible?  That way we'd avoid the new target hook and could
>> simply do optab queries.
>
> I think I don't really understand the idea. How are we going to
> represent the fact that we need to convert a given node to the given
> machine instruction? Maybe you could point to where a similar
> technique is already used.

In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
would get you the instruction sequence for a comparison of V4SImode
vectors.  What it should return isn't properly defined yet.

Otherwise I'd say we should ask the target to expand
v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
as well special-case the -1 and 0 result vectors (and maybe it already
does).
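To illustrate the equivalence proposed here, the following is a small C model of VEC_COND_EXPR (x < y, a, b) evaluated lane by lane. It is an illustration only and is not how a target expander is written; vcond_lt and mask_via_vcond are hypothetical names. Choosing a = {-1, ...} and b = {0, ...} should reproduce exactly the mask that x < y yields under the OpenCL-style semantics of the patch.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Scalar model of VEC_COND_EXPR (x < y, a, b); hypothetical helper, not a
   GCC API.  Lane i is a[i] if x[i] < y[i], otherwise b[i]. */
static v4si vcond_lt(v4si x, v4si y, v4si a, v4si b)
{
  v4si r = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    r[i] = x[i] < y[i] ? a[i] : b[i];
  return r;
}

/* The comparison mask expressed as VEC_COND_EXPR (x < y, -1, 0). */
static v4si mask_via_vcond(v4si x, v4si y)
{
  const v4si all_ones = {-1, -1, -1, -1};
  const v4si zeros = {0, 0, 0, 0};
  return vcond_lt(x, y, all_ones, zeros);
}
```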

Richard.
>
> The new patch will be tested and submitted here soon.
>
>
> Thanks,
> Artem.
>
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog
>>>
>>> 2011-08-12 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>
>>>       gcc/
>>>       * targhooks.c (default_builtin_vec_compare): New hook.
>>>       * targhooks.h (default_builtin_vec_compare): New definition.
>>>       * target.def (builtin_vec_compare): New hook.
>>>       * target.h: New include (gimple.h).
>>>       * fold-const.c
>>>       (fold_comparison): Adjust x <cmp> x vector operations.
>>>       * c-typeck.c (build_binary_op): Allow vector comparison.
>>>       (c_obj_common_truthvalue_conversion): Deny vector comparison
>>>       inside of if statement.
>>>       (build_conditional_expr): Adjust to build VEC_COND_EXPR.
>>>       * tree-vect-generic.c (do_compare): Helper function.
>>>       (expand_vector_comparison): Check if hardware comparison
>>>       is available, if not expand comparison piecewise.
>>>       (expand_vector_operation): Handle vector comparison
>>>       expressions separately.
>>>       (earlyexpand_vec_cond_expr): Expand vector comparison
>>>       piecewise.
>>>       * Makefile.in: New dependencies.
>>>       * tree-cfg.c (verify_gimple_comparison): Allow vector
>>>       comparison operations in gimple.
>>>       * c-parser.c (c_parser_conditional_expression): Adjust
>>>       to handle VEC_COND_EXPR.
>>>       * gimplify.c (gimplify_expr): Adjust to handle VEC_COND_EXPR.
>>>       * config/i386/i386.c (vector_fp_compare): Build hardware
>>>       specific code for floating point vector comparison.
>>>       (vector_int_compare): Build hardware specific code for
>>>       integer vector comparison.
>>>       (ix86_vectorize_builtin_vec_compare): Implementation of
>>>       builtin_vec_compare hook.
>>>
>>>       gcc/testsuite/
>>>       * gcc.c-torture/execute/vector-vcond-1.c: New test.
>>>       * gcc.c-torture/execute/vector-vcond-2.c: New test.
>>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>>       * gcc.dg/vector-compare-1.c: New test.
>>>       * gcc.dg/vector-compare-2.c: New test.
>>>
>>>       gcc/doc
>>>       * extend.texi: Adjust.
>>>       * tm.texi: Adjust.
>>>       * tm.texi.in: Adjust.
>>>
>>>
>>> bootstrapped and tested on x86_64-unknown-linux.
>>>
>>>
>>> Thanks,
>>> Artem Shinkarov.
>>>
>>
>


* Re: Vector Comparison patch
  2011-08-16 16:39     ` Richard Guenther
@ 2011-08-16 17:01       ` Artem Shinkarov
  2011-08-16 21:48         ` Artem Shinkarov
  2011-08-17 12:49         ` Richard Guenther
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-16 17:01 UTC
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Tue, Aug 16, 2011 at 4:28 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 15, 2011 at 6:58 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 15, 2011 at 3:24 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, Aug 12, 2011 at 4:03 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Hi
>>>>
>>>> Here is a completed version of the vector comparison patch we
>>>> discussed a long time ago here:
>>>> http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01184.html
>>>>
>>>> The patch implements vector comparison according to the OpenCL
>>>> standard, where the result of comparing two vectors is a vector
>>>> of signed integers, with -1 representing true and 0 false.
>>>>
>>>> The patch implements vector conditional res = VCOND<V1 ? V2 : V3>
>>>> which is expanded into:
>>>> foreach (i in length (V1)) res[i] = V1[i] == 0 ? V3[i] : V2[i].
>>>
>>> Some comments on the patch below.  First, in general I don't see
>>> why you need a new target hook to specify whether to "vectorize"
>>> a comparison.  Why are the existing hooks used by the vectorizer
>>> not enough?
>>>
>>> Index: gcc/fold-const.c
>>> ===================================================================
>>> --- gcc/fold-const.c    (revision 177665)
>>> +++ gcc/fold-const.c    (working copy)
>>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>>      floating-point, we can only do some of these simplifications.)  */
>>>   if (operand_equal_p (arg0, arg1, 0))
>>>     {
>>> -      switch (code)
>>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>>        {
>>> -       case EQ_EXPR:
>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>> -           return constant_boolean_node (1, type);
>>>
>>> I think this change should go in a separate patch for improved
>>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>>
>> Unfortunately it is necessary: this case must be covered here,
>> otherwise the x != x condition fails.
>
> How does it fail?

When I have x > x, x == x, and so on, fold-const.c triggers
operand_equal_p (arg0, arg1, 0), which returns true, and then it calls
constant_boolean_node (<val>, type). But the problem is that the
result of the comparison is a vector, not a boolean. So we hit an
assertion failure:
test.c: In function ‘foo’:
test.c:9:3: internal compiler error: in build_int_cst_wide, at tree.c:1222
Please submit a full bug report,
with preprocessed source if appropriate.
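A sketch of what the vector case of fold_comparison needs to produce instead, assuming integer elements. x <cmp> x should fold to a vector constant whose lanes are all -1 (for ==, <=, >=) or all 0 (for !=, <, >), rather than to a scalar boolean node. The enum and helpers below are hypothetical stand-ins for the tree codes. With floating-point elements that may be NaN, only < and > can be folded (to 0), since a NaN lane compares unequal to itself.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-ins for EQ_EXPR, NE_EXPR, LT_EXPR, ... */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Lane value of x <cmp> x for integer elements (no NaNs): reflexive
   comparisons are true (-1); != and the strict ones are false (0). */
static int self_compare_lane(enum cmp_code code)
{
  switch (code)
    {
    case CMP_EQ: case CMP_LE: case CMP_GE:
      return -1;
    case CMP_NE: case CMP_LT: case CMP_GT:
    default:
      return 0;
    }
}

/* Fold x <cmp> x to a vector constant with every lane set to the
   lane value, instead of a scalar boolean node. */
static v4si fold_self_compare(enum cmp_code code)
{
  int lane = self_compare_lane(code);
  v4si r = {lane, lane, lane, lane};
  return r;
}
```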

>
>>>
>>> +      if (TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (ifexp))))
>>> +        {
>>> +          error_at (colon_loc, "vector comparison must be of signed "
>>> +                              "integer vector type");
>>> +          return error_mark_node;
>>> +        }
>>>
>>> why that?
>>
>> Well, later on I rely on this fact. I mean, OpenCL says that it should
>> return -1 in the sense that all bits are set. I don't really know; I can
>> support unsigned masks as well, but wouldn't that just introduce a
>> source of possible errors? I mean that the natural choice for true and
>> false is 0 and 1, not 0 and -1. Anyway, I don't have a strong opinion
>> there, and I could easily adjust it if we decide that we want it.
>
> I think we want to allow both signed and unsigned masks.

Ok, I'll adjust.
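One lane is enough to see why both signed and unsigned masks can work while a 0/1 encoding cannot. Signed -1 converts to the all-ones pattern of an unsigned lane, and the bitwise select (a & m) | (b & ~m) needs every bit of m set. The following is a minimal sketch, assuming 32-bit lanes; select_lane and true_mask_from_signed are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the bitwise select (a & m) | (b & ~m). */
static uint32_t select_lane(uint32_t m, uint32_t a, uint32_t b)
{
  return (a & m) | (b & ~m);
}

/* OpenCL-style "true" is all bits set: signed -1 converts to the same
   all-ones pattern in an unsigned 32-bit lane. */
static uint32_t true_mask_from_signed(void)
{
  int32_t t = -1;
  return (uint32_t) t;
}
```

With m = 1 (a 0/1 "true"), only bit 0 of a survives and the rest of the lane comes from b, so the select silently returns the wrong value.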

>
>>>
>>> +      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
>>> +          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
>>> +             != TYPE_VECTOR_SUBPARTS (type1))
>>> +        {
>>> +          error_at (colon_loc, "vectors of different length found in "
>>> +                               "vector comparison");
>>> +          return error_mark_node;
>>> +        }
>>>
>>> I miss verification that type1 and type2 are vector types, or is that done
>>> elsewhere?  I think type1 and type2 are already verified to be
>>> compatible (but you might double-check).  At least the above would be
>>> redundant with
>>
>> Thanks, the check that type1 and type2 are both vectors is missing; it is
>> going to be added in the new version of the patch.
>>>
>>> +      if (type1 != type2)
>>> +        {
>>> +          error_at (colon_loc, "vectors of different types involved in "
>>> +                               "vector comparison");
>>> +          return error_mark_node;
>>> +        }
>>
>> You are right, what I meant here is TREE_TYPE (type1) != TREE_TYPE
>> (type2), because vector (4, int) has the same number of elements as
>> vector (4, float). This will be fixed in the new version.
>>
>>>
>>> Joseph may have comments about the fully-fold stuff that follows.
>>>
>>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>>> +        expressions where the type of vectors you compare differs
>>> +        from the type of vectors you select from. For the time
>>> +        being we insert implicit conversions.  */
>>> +      if ((COMPARISON_CLASS_P (ifexp)
>>>
>>> Why only for comparison-class?
>> Not only, there is || involved:
>> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>> || TREE_TYPE (ifexp) != type1
>>
>> So if this is a comparison class, we check the first operand, because
>> the result of the comparison fits, but the operands might not. In
>> case we have an expression of signed vector type, we know that we would
>> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
>> types of the operands do not match we convert them.
>
> Hm, ok ... let's hope we can sort-out the backend issues before this
> patch goes in so we can remove this converting stuff.

Hm, I would hope that we could commit this patch even with this issue,
because my feeling is that this case would produce errors on all the
other architectures as well, as VEC_COND_EXPR is a feature heavily
used in the auto-vectorizer. That means all the backends would have to
be fixed. Another argument is that this conversion is harmless.

So I really hope that someone could shed some light on this issue or
help me with it, but even if not, I think that the current conversion
is ok. However, I don't have any architectures other than x86.
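As a rough illustration of why such a conversion can be cheap when the compared and selected elements have the same width, the sketch below uses the GCC/Clang vector extension; select_min_float is a hypothetical name. It compares float lanes, gets an int-lane mask back, and selects float values by casting them to int vectors. It assumes, as both compilers implement it, that a cast between same-size vector types reinterprets the bits rather than converting values (value conversion is what __builtin_convertvector is for).

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));
typedef float v4sf __attribute__((vector_size(16)));

/* Compare float lanes, select float lanes.  The mask comes back with int
   lanes, so the float operands are viewed as v4si for the bitwise select
   and the result is viewed back as v4sf.  The casts only reinterpret
   bits; no int/float value conversion takes place. */
static v4sf select_min_float(v4sf a, v4sf b)
{
  v4si mask = a < b;          /* -1 / 0 per lane, int element type */
  v4si ai = (v4si) a;         /* bit reinterpretation */
  v4si bi = (v4si) b;
  return (v4sf) ((ai & mask) | (bi & ~mask));
}
```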

>
>>>
>>> +          && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>>> +         || TREE_TYPE (ifexp) != type1)
>>> +       {
>>> +         tree comp_type = COMPARISON_CLASS_P (ifexp)
>>> +                          ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>>> +                          : TREE_TYPE (ifexp);
>>> +         tree vcond;
>>> +
>>> +         op1 = convert (comp_type, op1);
>>> +         op2 = convert (comp_type, op2);
>>> +         vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>>> +         vcond = convert (type1, vcond);
>>> +         return vcond;
>>> +       }
>>> +      else
>>> +       return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
>>>
>>> In the end we of course will try to fix the middle-end/backends to
>>> allow mixed types here as the current restriction doesn't really make sense.
>>
>> Yes, that would be nice, but these conversions do not really affect
>> the code generation, so for the time being I think it is fine to have
>> them.
>>
>>>
>>>     case EQ_EXPR:
>>>     case NE_EXPR:
>>> +      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
>>> +        {
>>> +          tree intt;
>>> +          if (TREE_TYPE (type0) != TREE_TYPE (type1))
>>> +            {
>>> +              error_at (location, "comparing vectors with different "
>>> +                                  "element types");
>>> +              return error_mark_node;
>>> +            }
>>> +
>>> +          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
>>> +            {
>>> +              error_at (location, "comparing vectors with different "
>>> +                                  "number of elements");
>>> +              return error_mark_node;
>>> +            }
>>>
>>> as above - compatibility should already be ensured, thus type0 == type1
>>> here?
>>
>> Yeah, we know that they are both vector types, but that is about all
>> we know. Anyhow, all these errors are reachable. As an example see
>> vector-compare-1.c:
>> r4 = x > y;  /* { dg-error "comparing vectors with different element types" } */
>> r8 == r4; /* { dg-error "comparing vectors with different number of elements"} */
>
> Ok, I see.
>
>>>
>>> +/* Try a hardware hook for vector comparison or
>>> +   extract comparison piecewise.  */
>>> +static tree
>>> +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
>>> +                          tree op1, enum tree_code code)
>>>
>>> comments should mention and describe all function arguments.
>>
>> Ok, coming in the new version of the patch.
>>
>>> +/* Expand vector condition EXP which should have the form
>>> +   VEC_COND_EXPR<cond, vec0, vec1> into the following
>>> +   vector:
>>> +     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
>>> +   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
>>> +static tree
>>> +earlyexpand_vec_cond_expr (gimple_stmt_iterator *gsi, tree exp)
>>>
>>> that would be expand_vec_cond_expr_piecewise, no?
>>
>> Adjusted.
>>
>>>
>>> +  /* Ensure that we will be able to expand vector comparison
>>> +     in case it is not supported by the architecture.  */
>>> +  gcc_assert (COMPARISON_CLASS_P (cond));
>>>
>>> that looks dangerous to me - did you try
>>>
>>>  vec = v1 <= v2;
>>>  vec2 = vec ? v1 : v2;
>>>
>>> without optimization?
>>
>> Sure, tests should cover this case.
>> I have this assertion there because only two cases are possible:
>> 1) it is a comparison
>> 2) function callee converted expr to expr != {0,0,...}
>> So we should be perfectly fine.
>>
>>>
>>> +  /* Check if we need to expand vector condition inside of
>>> +     VEC_COND_EXPR.  */
>>> +  var = create_tmp_reg (TREE_TYPE (cond), "cond");
>>> +  new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
>>> +                                      TREE_OPERAND (cond, 0),
>>> +                                     TREE_OPERAND (cond, 1),
>>> +                                      TREE_CODE (cond));
>>>
>>> That unconditionally expands, so no need for "Check".
>>
>> Ok.
>>
>>>
>>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>>> +  for (i = 0; i < nunits;
>>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>>> +    {
>>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>>> +                                    build_int_cst (inner_type ,0));
>>>
>>> I seriously doubt that, when expanding this part piecewise, expanding
>>> the mask first in either way is going to be beneficial.  Instead I would
>>> suggest "inlining" the comparison here.  Thus instead of
>>
>> Well, the thing is that if expand_vector_comparison inserted a
>> builtin there rather than expanding the code piecewise, I would have
>> to do the comparison with 0 anyway, because true is expressed as -1
>> there.
>>
>> Well, I would hope that in case we have:
>> c_0 = a_0 > b_0;
>> d_0 = c_0 != 0;
>>
>> {d_0, d_1,...}
>>
>> all the d_n should be constant-folded, or should I call fold explicitly here?
>>
>> 1) I construct the mask
>>>
>>>  mask =
>>>         = { mask[0] != 0 ? ... }
>>>
>>> do
>>>
>>>          = { c0[0] < c1[0] ? ..., }
>>>
>>> or even expand the ? : using mask operations if we can efficiently
>>> create that mask.
>>>
>>
>> I assume that if we cannot expand VEC_COND_EXPR, then masking the
>> elements is a problem for us. Otherwise VEC_COND_EXPR expansion has a
>> bug somewhere. Or am I wrong somewhere?
>
> I think we can always do bitwise operations, so if we can get at the
> mask vector we are fine.
>
> I was thinking about how the case of explicitly computing the value of
> v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
> be handled.  If the target produces a mask of condition codes for
> a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
> It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
> A similar case is for AMD XOP which can expand mask ? v1 : v2
> with a single instruction (so even without seeing a comparison).
>
> Basically, if we can get at the mask we should use that to do the
> vector selection in parallel via (v1 & mask) | (v2 & ~mask).
>
> If we cannot even get at the mask then we can build the result
> vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.

Ok, I am perfectly fine with constructing (v1 & mask) | (v2 & ~mask);
the question is: do I need to check that (v1 & mask) and (v2 & ~mask)
can be expanded, or can I just blindly insert them? The problem is that
we have a single veclower pass, so if I insert something that needs
expansion, we would not get a second chance to expand it.

I'll adjust the patch.

>>>
>>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>>> +     given types.  */
>>> +  if (code == VEC_COND_EXPR)
>>> +    {
>>> +      tree exp = gimple_assign_rhs1 (stmt);
>>> +      tree cond = TREE_OPERAND (exp, 0);
>>> +
>>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>>> +        change it to A != {0,0,...} ? V0 : V1  */
>>> +      if (!COMPARISON_CLASS_P (cond))
>>> +       TREE_OPERAND (exp, 0) =
>>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>>> +                   build_vector_from_val (TREE_TYPE (cond),
>>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>>
>>> That looks inefficient as well.  Iff we know that the mask is always
>>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>>> bitwise operations (see what the i?86 expander does, for example).
>>
>> This is a requirement of VEC_COND_EXPR: I need to pass 4 parameters,
>> not 3, which is why I introduce this fake {0,0,...} here.
>
> Sure, but if you look at expand_vec_cond_expr_p then you don't need
> that, and this fake comparison should instead be produced by the
> expander (or really avoided by maybe splitting up the named pattern
> into two).
>
> It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
> makes it less efficient - with just the mask it can do the bitwise
> fallback easily).

Richard, let me give you an example:
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

int
foo (vector (4, int) i0, vector (4, int) i1, int x)
{
  i0 = i0 ? i1 : i0;
  return i0[x];
}

When we optimize i0 ? i1 : i0, expand_vec_cond_expr_p happily accepts
it and says that it can expand this expression. Now, after the veclower
pass has run, expand_vec_cond_expr calls vector_compare_rtx
(op0, unsignedp, icode), which has an assertion:
gcc_assert (COMPARISON_CLASS_P (cond));
and of course it fails.

So someone needs to insert the != {0,0...} expression. I do it in
tree-vect-generic, but it could be done in expand_vec_cond_expr. The
question is: where should it go?

I agree with you that I don't need to build this mask when I expand
vcond piecewise, and I will adjust that, but it does not actually make
much of a difference if the expansion uses (v0 & mask) |
(v1 & ~mask).

Am I wrong somewhere?
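The example above is where the != {0,0,...} step matters. i0 is an arbitrary vector, not a canonical -1/0 mask, so feeding it straight into (v1 & mask) | (v2 & ~mask) would be wrong for a lane holding, say, 5. The following sketch uses the GCC/Clang vector extension and compares the normalised form with the raw form, following the patch's rule res[i] = c[i] == 0 ? b[i] : a[i]. vcond_any and vcond_raw are hypothetical names.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* c ? a : b with an arbitrary (non-comparison) condition vector c.
   Normalising through c != {0,...} yields a canonical -1/0 mask,
   after which the bitwise select is valid. */
static v4si vcond_any(v4si c, v4si a, v4si b)
{
  const v4si zero = {0, 0, 0, 0};
  v4si mask = c != zero;
  return (a & mask) | (b & ~mask);
}

/* Using c directly as the mask: only correct when every lane of c is
   already -1 or 0. */
static v4si vcond_raw(v4si c, v4si a, v4si b)
{
  return (a & c) | (b & ~c);
}
```

When the condition is already a comparison result, the normalisation is redundant, which is the case where skipping the fake comparison pays off.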
>
>>>
>>> @@ -471,6 +603,7 @@ expand_vector_operations_1 (gimple_stmt_
>>>
>>>   gcc_assert (code != CONVERT_EXPR);
>>>
>>> +
>>>   /* The signedness is determined from input argument.  */
>>>   if (code == VEC_UNPACK_FLOAT_HI_EXPR
>>>       || code == VEC_UNPACK_FLOAT_LO_EXPR)
>>>
>>> spurious whitespace change.
>>
>> Fixed.
>>>
>>> Index: gcc/tree-cfg.c
>>> ===================================================================
>>> --- gcc/tree-cfg.c      (revision 177665)
>>> +++ gcc/tree-cfg.c      (working copy)
>>> @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
>>>       return true;
>>>     }
>>>
>>> +  if (TREE_CODE (op0_type) == VECTOR_TYPE
>>> +      && TREE_CODE (op1_type) == VECTOR_TYPE
>>> +      && TREE_CODE (type) == VECTOR_TYPE)
>>> +    {
>>>
>>> this should check TREE_CODE (type) == VECTOR_TYPE only
>>> and then verify the comparison operands are actually vectors.
>>
>> Yes, you are right, adjusted.
>>
>>>
>>> +      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
>>> +        {
>>> +          error ("invalid vector comparison, vector element type mismatch");
>>> +          debug_generic_expr (op0_type);
>>> +          debug_generic_expr (op1_type);
>>> +          return true;
>>> +        }
>>>
>>> this needs to use code similar to the scalar variant,
>>>
>>>          !useless_type_conversion_p (op0_type, op1_type)
>>>          && !useless_type_conversion_p (op1_type, op0_type)
>>>
>>> which also makes the first TYPE_VECTOR_SUBPARTS redundant.
>>>
>>
>> Fixed.
>>
>>> +      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
>>> +          && TYPE_PRECISION (TREE_TYPE (op0_type))
>>> +             != TYPE_PRECISION (TREE_TYPE (type)))
>>> +        {
>>> +          error ("invalid vector comparison resulting type");
>>> +          debug_generic_expr (type);
>>> +          return true;
>>> +        }
>>>
>>> I think you can drop the TYPE_PRECISION check.  We might want to
>>> assert that a vector element types precision always matches its
>>> mode precision (in make_vector_type).
>>
>> I would rather leave it for a while. During optimisation you can
>> construct some strange things, so I would rather make the verifier
>> resistant to all kinds of stuff.
>
> Ok.
>
>>>
>>> Index: gcc/c-parser.c
>>> ===================================================================
>>> --- gcc/c-parser.c      (revision 177665)
>>> +++ gcc/c-parser.c      (working copy)
>>> @@ -5337,8 +5337,17 @@ c_parser_conditional_expression (c_parse
>>>   if (c_parser_next_token_is (parser, CPP_COLON))
>>>     {
>>>       tree eptype = NULL_TREE;
>>> -
>>> +
>>>       middle_loc = c_parser_peek_token (parser)->location;
>>> +
>>> +      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
>>>
>>> watch out for whitespace changes - you add a trailing tab here.
>>
>> Fixed.
>>
>>>
>>> +/* Find target specific sequence for vector comparison of
>>> +   real-type vectors V0 and V1. Returns variable containing
>>> +   result of the comparison or NULL_TREE in other case.  */
>>> +static tree
>>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>>> +                   enum machine_mode mode, tree v0, tree v1,
>>> +                   enum tree_code code)
>>> +{
>>> +  enum ix86_builtins fcode;
>>>
>>> is there a reason we need this and cannot simply provide expanders
>>> for the named patterns?  We'd need to give them semantics of
>>> producing all-ones / all-zero masks of course.  Richard, do you think
>>> that's sensible?  That way we'd avoid the new target hook and could
>>> simply do optab queries.
>>
>> I think I don't really understand the idea. How are we going to
>> represent the fact that we need to convert a given node to the given
>> machine instruction? Maybe you could point to where a similar
>> technique is already used.
>
> In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
> We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
> would get you the instruction sequence for a comparison of V4SImode
> vectors.  What it should return isn't properly defined yet.
>
> Otherwise I'd say we should ask the target to expand
> v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
> as well special-case the -1 and 0 result vectors (and maybe it already
> does).

Ok, I can adjust the optab checking for the mode, but I recall that
we introduced the hook exactly because optabs did not return anything
sensible. It was your idea :)

Also, I don't like the idea of expanding every comparison to
VEC_COND_EXPR (v1 < v2, -1, 0). Look at expand_vec_cond_expr: it would
do the job only if the architecture has a vcond instruction, since it
checks for direct_optab_handler (vcond_optab, mode). But using vcond is
not necessarily as efficient as using comparison instructions. Also, we
could run into a situation where vcond is not supported but comparison
is, couldn't we?

Anyhow, I think we want to keep vcond and comparison separate.


Artem.

>
> Richard.
>>
>> The new patch will be tested and submitted here soon.
>>
>>
>> Thanks,
>> Artem.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>
>>
>


* Re: Vector Comparison patch
  2011-08-16 17:01       ` Artem Shinkarov
@ 2011-08-16 21:48         ` Artem Shinkarov
  2011-08-17 12:58           ` Richard Guenther
  2011-08-17 12:49         ` Richard Guenther
  1 sibling, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-16 21:48 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

Hi, here is a new version of the patch with the adjustments.

Two important comments.
1) When I expand the expression mask ? vec0 : vec1, I replace mask
with (mask == {-1,-1,..}). The first reason is that
expand_vec_cond_expr requires its first operand to be a comparison.
The second reason is that a mask such as {3, 4, -1, 5} must be
transformed into {0,0,-1,0} in order to simulate vcond as
((vec0 & mask) | (vec1 & ~mask)). So in both cases we need this
adjustment.

2) Vector comparison through optab.
Since I have only adjusted expand_vector_operation in
tree-vect-generic.c, it is called only when there is no suitable
optab; this is checked in expand_vector_operations_1. So the only
place where I try to find an optab for the comparison is
expand_vec_cond_expr_piecewise, which I adjusted.

As for the vector hook, it is triggered only when we don't have
an appropriate optab.
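The mask normalisation in 1) can be illustrated with GCC-style vector extensions, assuming a compiler that already accepts vector == in C. normalize_mask and vcond_bitwise are illustrative names; the sketch shows why {3, 4, -1, 5} has to become {0, 0, -1, 0} before the bitwise select is correct:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Normalise an arbitrary integer mask: lanes equal to -1 stay -1,
   every other lane becomes 0, i.e. mask == {-1,-1,-1,-1}.  */
static v4si
normalize_mask (v4si mask)
{
  const v4si all_ones = { -1, -1, -1, -1 };
  return mask == all_ones;
}

/* Emulate VEC_COND_EXPR <mask, vec0, vec1> with bitwise operations.
   This is only correct once every lane of the mask is 0 or -1,
   hence the normalisation first.  */
static v4si
vcond_bitwise (v4si mask, v4si vec0, v4si vec1)
{
  v4si m = normalize_mask (mask);
  return (vec0 & m) | (vec1 & ~m);
}
```

Applied to the raw mask instead, (vec0 & mask) | (vec1 & ~mask) mixes bits from both operands in lanes such as 3, giving values that belong to neither vector.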

bootstrapped and tested on x86_64-unknown-linux-gnu.
Anything else?


Artem.

[-- Attachment #2: vector-compare-vcond-4.diff --]
[-- Type: text/plain, Size: 52967 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators
+@code{==, !=, <, <=, >, >=}.  Both integer-type and real-type vectors
+can be compared, but only if they have the same type.  The result of the
+comparison is a signed integer-type vector whose elements have the same
+size as the elements of the compared vectors.  Comparison happens
+element by element.  False is 0 and true is -1 (the constant of the
+appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+where the condition is a vector of signed integers.  In that case the
+condition is used as a mask that selects, element by element, either from
+the first operand or from the second.  Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The condition must have the same number of elements as both operands,
+and its elements must have the same size as the operands' elements.
+The type of the vector conditional is determined by the types of the
+operands, which must be the same.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can also be just a
+vector of signed integer type.  In that case the vector is implicitly
+compared with a vector of zeroes.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional whose operands are vectors and whose
+condition is a scalar integer works in the standard way: it returns the
+first operand if the condition is true and the second otherwise.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;  
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a; 
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
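The examples documented in the extend.texi hunk above can be checked with a sketch like the following, assuming a compiler that implements vector comparison in C as described. To avoid relying on C support for a vector ?:, the selection is written as (x & mask) | (y & ~mask); select_si and select_f are hypothetical helpers, and select_f uses same-size vector casts, which reinterpret bits rather than convert values:

```c
#include <assert.h>

typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4f  __attribute__ ((vector_size (16)));

/* Stand-in for "mask ? x : y" on int lanes.  MASK comes from a
   comparison, so every lane is already exactly 0 or -1.  */
static v4si
select_si (v4si mask, v4si x, v4si y)
{
  return (x & mask) | (y & ~mask);
}

/* The same selection for float lanes.  A cast between vector types of
   the same size reinterprets the bits, so the float values pass
   through the integer masking unchanged.  */
static v4f
select_f (v4si mask, v4f x, v4f y)
{
  return (v4f) (((v4si) x & mask) | ((v4si) y & ~mask));
}
```

The assertions reproduce the results given in the documentation comments.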
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using hardware-specific instructions and return the resulting tree.  The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default implementation of the builtin_vec_compare hook: no
+   target-specific expansion is available, so return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using hardware-specific instructions and return the \
+resulting tree.  The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
-      switch (code)
+      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
 	{
-	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
-	  break;
+	  tree el_type = TREE_TYPE (TREE_TYPE (arg0));
+	  switch (code)
+	    {
+	    case EQ_EXPR:
+	    case GE_EXPR:
+	    case LE_EXPR:
+	      if (!FLOAT_TYPE_P (el_type)
+		  || !HONOR_NANS (TYPE_MODE (el_type)))
+		return build_vector_from_val
+			  (type, build_int_cst (TREE_TYPE (type), -1));
+	      break;
+	    case NE_EXPR:
+	      if (FLOAT_TYPE_P (el_type)
+		  && HONOR_NANS (TYPE_MODE (el_type)))
+		break;
+	    /* ... fall through ...  */
+	    case GT_EXPR:
+	    case LT_EXPR:
+	      return build_vector_from_val
+			  (type, build_int_cst (TREE_TYPE (type), 0));
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	switch (code)
+	  {
+	  case EQ_EXPR:
+	    if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
+		|| ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      return constant_boolean_node (1, type);
+	    break;
 
-	case GE_EXPR:
-	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
-	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
+	  case GE_EXPR:
+	  case LE_EXPR:
+	    if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
+		|| ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      return constant_boolean_node (1, type);
+	    return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
-	case NE_EXPR:
-	  /* For NE, we can only do this simplification if integer
-	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    break;
-	  /* ... fall through ...  */
-	case GT_EXPR:
-	case LT_EXPR:
-	  return constant_boolean_node (0, type);
-	default:
-	  gcc_unreachable ();
-	}
+	  case NE_EXPR:
+	    /* For NE, we can only do this simplification if integer
+	       or we don't honor IEEE floating point NaNs.  */
+	    if (FLOAT_TYPE_P (TREE_TYPE (arg0))
+		&& HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	      break;
+	    /* ... fall through ...  */
+	  case GT_EXPR:
+	  case LT_EXPR:
+	    return constant_boolean_node (0, type);
+	  default:
+	    gcc_unreachable ();
+	  }
     }
 
   /* If we are comparing an expression that just has comparisons
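The vector branch of fold_comparison may only fold x <cmp> x to a constant for floating-point elements when NaNs are not honoured, just like the scalar branch. A quick check of why, assuming GCC-style vector comparisons and default floating-point semantics (no -ffast-math); self_eq and self_ne are illustrative names:

```c
#include <assert.h>
#include <math.h>

typedef float v4f  __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Compare a vector with itself.  A NaN lane compares unequal to
   everything, including itself, so neither result is a constant.  */
static v4si
self_eq (v4f x)
{
  return x == x;
}

static v4si
self_ne (v4f x)
{
  return x != x;
}
```

Only the NaN lane differs from the integer case, and it is exactly the lane that a fold to an all -1 or all 0 constant would get wrong.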
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,78 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != (" fmt1 " " #op " " fmt1 " ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%i", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%i");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){-1,-1}), d0, d1, ==, "%f", "%i");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){-1,-1}), l0, l1, ==, "%i", "%i");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){-1,-1,-1,-1}), 
+		 f0, f1, ==, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){-1,-1,-1,-1}), 
+		 i0, i1, ==, "%i", "%i");
+
+  return 0;
+}
+
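vector-vcond-2.c selects floating-point lanes with an integer condition and vice versa, which only works because the lanes have equal width. A sketch of the 64-bit case, assuming GCC-style vector extensions; select_df is a hypothetical helper, and long long is used for the mask lanes because it is 64 bits on common ILP32 and LP64 targets, whereas long is 32 bits on ILP32:

```c
#include <assert.h>

typedef long long v2di __attribute__ ((vector_size (16)));
typedef double    v2df __attribute__ ((vector_size (16)));

/* A vector condition only makes sense when the mask lanes are as wide
   as the selected lanes.  */
_Static_assert (sizeof (long long) == sizeof (double),
                "mask and data lanes must have the same width");

/* Emulate "mask ? x : y" for double lanes by reinterpreting the
   doubles as 64-bit integers (same-size vector casts copy bits).  */
static v2df
select_df (v2di mask, v2df x, v2df y)
{
  return (v2df) (((v2di) x & mask) | ((v2di) y & ~mask));
}
```

The mask may come from an integer comparison or from a double comparison; in both cases its lanes are 64-bit signed integers holding 0 or -1.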
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
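As the result variables in vector-compare-1.c rely on, the comparison always yields a signed integer vector, even for unsigned operands. The same bit patterns can still compare differently as signed and unsigned lanes. A small sketch assuming GCC-style vector extensions and 32-bit int (lt_signed and lt_unsigned are illustrative names), using the int/unsigned test data with argc == 1:

```c
#include <assert.h>

typedef int          v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Signed ordering of the lanes.  */
static v4si
lt_signed (v4si a, v4si b)
{
  return a < b;
}

/* Unsigned ordering; the result lanes are still signed: -1 or 0.  */
static v4si
lt_unsigned (v4su a, v4su b)
{
  return a < b;
}
```

Only the last lane differs: 10 < -23 is false for signed lanes, but once -23 is reinterpreted as the unsigned value 4294967273, 10 is smaller.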
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
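When the target has neither a comparison nor a vcond instruction, the conditional has to be expanded piecewise. The following array-based sketch shows the idea for int lanes; cmp_code, compare_lane and vcond_piecewise are hypothetical names and do not correspond to GCC internals:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparison codes mirroring GT/GE/LT/LE/EQ/NE.  */
enum cmp_code { CMP_GT, CMP_GE, CMP_LT, CMP_LE, CMP_EQ, CMP_NE };

static int
compare_lane (enum cmp_code code, int a, int b)
{
  switch (code)
    {
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    }
  return 0;   /* not reached for valid codes */
}

/* Piecewise lowering of res = (v0 <code> v1) ? c0 : c1: one scalar
   comparison and one scalar select per lane.  */
static void
vcond_piecewise (size_t n, enum cmp_code code,
                 const int *v0, const int *v1,
                 const int *c0, const int *c1, int *res)
{
  for (size_t i = 0; i < n; i++)
    res[i] = compare_lane (code, v0[i], v1[i]) ? c0[i] : c1[i];
}
```

With the int data from vector-vcond-1.c and argc == 1, the > case yields {1, 3, 1, 10}.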
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
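The expected value -3 in vector-compare-2.c follows from the -1/0 convention: for identical integer operands, three of the six comparisons (==, >=, <=) are true. A sketch contrasting this with scalar C, where true is 1 and the same expression gives 3, assuming GCC-style vector extensions (function names are illustrative):

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Sum of all six comparisons of X and Y.  With X == Y, the lanes of
   x == y, x >= y and x <= y are -1 and the other three are 0.  */
static v4si
compare_sum (v4si x, v4si y)
{
  return (x == y) + (x != y) + (x > y) + (x < y) + (x >= y) + (x <= y);
}

/* The scalar equivalent, where a true comparison is 1 rather than -1.  */
static int
compare_sum_scalar (int x, int y)
{
  return (x == y) + (x != y) + (x > y) + (x < y) + (x >= y) + (x <= y);
}
```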
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  
+  /* Unsigned integer conditions must not trigger this error.  */
+  q4 ? p4 : r4;	    /* { dg-bogus "vector comparison must be of signed integer vector type" } */
+}
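The order of the checks that build_conditional_expr performs for a vector condition can be modelled as follows. This is a simplified, hypothetical sketch: struct vec_desc and check_vec_cond are not GCC code, and element types are identified by an integer tag rather than by tree identity. The messages are the ones exercised by gcc.dg/vector-compare-1.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified description of a vector type.  ELEM_ID distinguishes
   element types of equal size (e.g. int vs float).  */
struct vec_desc
{
  bool is_vector;
  bool int_elems;
  int elem_id;
  size_t elem_size;
  size_t lanes;
};

/* Return the diagnostic for COND ? OP1 : OP2, or NULL if the operands
   are accepted.  The checks run in the same order as in the patch.  */
static const char *
check_vec_cond (struct vec_desc cond, struct vec_desc op1,
                struct vec_desc op2)
{
  if (!op1.is_vector || !op2.is_vector)
    return "vector comparison arguments must be of type vector";
  if (!cond.int_elems)
    return "non-integer type in vector condition";
  if (op1.lanes != op2.lanes || cond.lanes != op1.lanes)
    return "vectors of different length found in vector comparison";
  if (op1.elem_id != op2.elem_id)
    return "vectors of different types involved in vector comparison";
  if (cond.elem_size != op1.elem_size)
    return "vector-condition element type must be the same as "
           "result vector element type";
  return NULL;
}
```

Because the checks run in a fixed order, an expression with several problems reports only the first one.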
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */   
+
+/* Test if C_MAYBE_CONST are folded correctly when 
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4058,6 +4058,94 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      bool maybe_const = true;
+      tree sc;
+      
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+      sc = c_fully_fold (ifexp, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	ifexp = c_wrap_maybe_const (sc, true);
+      else
+	ifexp = sc;
+      
+      sc = c_fully_fold (op1, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op1 = c_wrap_maybe_const (sc, true);
+      else
+	op1 = sc;
+      
+      sc = c_fully_fold (op2, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op2 = c_wrap_maybe_const (sc, true);
+      else
+	op2 = sc;
+
+      /* Currently the expansion of VEC_COND_EXPR does not allow
+	 expressions where the type of the vectors being compared differs
+	 from the type of the vectors being selected from.  For the time
+	 being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)
+	   && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+	  || TREE_TYPE (ifexp) != type1)
+	{
+	  tree comp_type = COMPARISON_CLASS_P (ifexp)
+			   ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+			   : TREE_TYPE (ifexp);
+	  tree vcond;
+	  
+	  op1 = convert (comp_type, op1);
+	  op2 = convert (comp_type, op2);
+	  vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+	  vcond = convert (type1, vcond);
+	  return vcond;
+	}
+      else
+	return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9994,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10129,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10559,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
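The element-type rule that both vector branches of build_binary_op implement can be modelled without any compiler vector extension. In this sketch plain C arrays stand in for vector types, and the function names (vcmp_gt_i32, vcmp_lt_f32, vcmp_eq_u16) are hypothetical. Whatever the operand element type, each result lane is a signed integer of the same width as an operand lane, set to -1 for true and 0 for false.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of OpenCL-style vector comparison: the result
   has as many lanes as the operands, each lane is a signed integer of the
   same width as an operand lane, and true/false are -1/0.  */

/* 32-bit signed lanes -> 32-bit signed result lanes.  */
void vcmp_gt_i32 (const int32_t *a, const int32_t *b, int32_t *res, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = a[i] > b[i] ? -1 : 0;
}

/* 32-bit float lanes -> 32-bit signed result lanes.  */
void vcmp_lt_f32 (const float *a, const float *b, int32_t *res, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = a[i] < b[i] ? -1 : 0;
}

/* 16-bit unsigned lanes -> 16-bit signed result lanes.  */
void vcmp_eq_u16 (const uint16_t *a, const uint16_t *b, int16_t *res,
                  size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = a[i] == b[i] ? -1 : 0;
}
```

For example, comparing {1, 5, 3, 7} > {2, 4, 3, 8} yields {0, -1, 0, 0}.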
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* A vector comparison is a valid GIMPLE expression
+		     that can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -125,6 +126,21 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, inner_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +349,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand the vector comparison OP0 CODE OP1 using the
+   builtin_vec_compare target hook.  If the target does not
+   support a comparison of type TYPE, expand the comparison
+   piecewise.  GSI is the insertion point for the statements
+   that either expansion creates.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +409,24 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
@@ -432,6 +482,50 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   where i ranges from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)) - 1.  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+
+  /* Ensure that we will be able to expand vector comparison
+     in case it is not supported by the architecture.  */
+  gcc_assert (COMPARISON_CLASS_P (cond));
+  
+  /* Expand vector condition inside of VEC_COND_EXPR.  */
+  op = optab_for_tree_code (TREE_CODE (cond), type, optab_default);
+  if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+    {
+      var = create_tmp_reg (TREE_TYPE (cond), "cond");
+      new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+					  TREE_OPERAND (cond, 0),
+					  TREE_OPERAND (cond, 1),
+					  TREE_CODE (cond));
+      new_stmt = gimple_build_assign (var, new_rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (gsi_stmt (*gsi));
+    }
+  else
+    var = cond;
+    
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  return gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +545,33 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  /* Check if VEC_COND_EXPR is supported in hardware within the
+     given types.  */
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
+	 change it to A != {0,0,...} ? V0 : V1.  */
+      if (!COMPARISON_CLASS_P (cond))
+	TREE_OPERAND (exp, 0) =
+	  build2 (NE_EXPR, TREE_TYPE (cond), cond,
+		  build_vector_from_val (TREE_TYPE (cond),
+		  build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
+   
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
@@ -470,7 +591,6 @@ expand_vector_operations_1 (gimple_stmt_
     return;
 
   gcc_assert (code != CONVERT_EXPR);
-
   /* The signedness is determined from input argument.  */
   if (code == VEC_UNPACK_FLOAT_HI_EXPR
       || code == VEC_UNPACK_FLOAT_LO_EXPR)
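The generic lowering in tree-vect-generic.c can be sketched on int32 lanes as follows. This is a hypothetical scalar model, not GCC code: compare_piecewise plays the role of do_compare when the builtin_vec_compare hook returns NULL_TREE, normalize_mask models rewriting a plain condition A as A != {0,0,...} as described in expand_vector_operations_1, and select_by_mask performs the (v0 & mask) | (v1 & ~mask) selection used by expand_vec_cond_expr_piecewise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical comparison codes for the model.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* One lane of the piecewise fallback: (a CODE b) ? -1 : 0.  */
static int32_t lane_cmp (enum cmp_code code, int32_t a, int32_t b)
{
  int r;
  switch (code)
    {
    case CMP_EQ: r = a == b; break;
    case CMP_NE: r = a != b; break;
    case CMP_LT: r = a < b; break;
    case CMP_LE: r = a <= b; break;
    case CMP_GT: r = a > b; break;
    default:     r = a >= b; break;   /* CMP_GE */
    }
  return r ? -1 : 0;
}

/* Expand the whole comparison lane by lane into a -1/0 mask.  */
void compare_piecewise (enum cmp_code code, const int32_t *a,
                        const int32_t *b, int32_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = lane_cmp (code, a[i], b[i]);
}

/* Turn an arbitrary condition vector into a proper mask (A != 0).  */
void normalize_mask (const int32_t *cond, int32_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = cond[i] != 0 ? -1 : 0;
}

/* Bitwise select; only correct when every mask lane is -1 or 0.  */
void select_by_mask (const int32_t *mask, const int32_t *v0,
                     const int32_t *v1, int32_t *res, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (mask[i] == 0 || mask[i] == -1);
      res[i] = (v0[i] & mask[i]) | (v1[i] & ~mask[i]);
    }
}
```

The bitwise select is why a non-comparison condition has to be normalised first: a lane such as 7 is "true" for ?: but would mix bits of v0 and v1 if used directly as a mask.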
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
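The vector branch added to verify_gimple_comparison can be summarised as a predicate over lane counts and lane precisions. In this hypothetical model the result is assumed to be a vector, operand compatibility is reduced to equal lane count and precision (the real check uses useless_type_conversion_p and also distinguishes element types), and the result type is rejected whenever its lane count or its lane precision differs from that of the operands.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a type: whether it is a vector, its lane count
   (TYPE_VECTOR_SUBPARTS) and its lane precision in bits.  */
struct vtype
{
  bool is_vector;
  unsigned nunits;
  unsigned precision;
};

/* Model of the vector branch of verify_gimple_comparison.  Like the GCC
   verifier it returns true when the statement is INVALID.  */
bool vector_comparison_invalid (struct vtype result, struct vtype op0,
                                struct vtype op1)
{
  if (!op0.is_vector || !op1.is_vector)
    return true;   /* "non-vector operands in vector comparison" */
  if (op0.nunits != op1.nunits || op0.precision != op1.precision)
    return true;   /* "type mismatch in vector comparison" */
  if (result.nunits != op0.nunits || result.precision != op0.precision)
    return true;   /* "invalid vector comparison resulting type" */
  return false;
}
```

Under this rule a result with four 16-bit lanes is rejected for operands with four 32-bit lanes, even though the lane counts agree.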
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -32827,6 +32828,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find target specific sequence for vector comparison of 
+   real-type vectors V0 and V1.  Returns a variable containing
+   the result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find target specific sequence for vector comparison of 
+   integer-type vectors V0 and V1.  Returns a variable containing
+   the result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* t = tmp1 ^ {-1, -1,...}  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of comparison. Returns NULL_TREE
+   when it is impossible to find a target specific sequence.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* PCMPGT* compares signed values, so packed unsigned integers
+     can only be compared for equality (EQ or NE).  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35541,11 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

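The i386 hook in the patch above builds every signed integer comparison from just two primitives per mode, PCMPEQ* and PCMPGT*. The following scalar sketch (hypothetical names; int32 lanes; arrays in place of SSE registers; MAX_LANES is an assumed bound) mirrors the identities used in vector_int_compare: NE is EQ (a, b) XOR EQ (a, a), LT swaps the operands of GT, and GE/LE OR a GT result with EQ. Because PCMPGT* compares signed values, the hook gives up on ordered comparisons of unsigned lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-ins for PCMPEQ* and PCMPGT*: lane-wise, signed, -1/0.  */
static void pcmpeq (const int32_t *a, const int32_t *b, int32_t *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = a[i] == b[i] ? -1 : 0;
}

static void pcmpgt (const int32_t *a, const int32_t *b, int32_t *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? -1 : 0;
}

enum icmp { ICMP_EQ, ICMP_NE, ICMP_GT, ICMP_LT, ICMP_GE, ICMP_LE };

#define MAX_LANES 16   /* hypothetical upper bound on the lane count */

/* Hypothetical model of vector_int_compare: every signed comparison is
   synthesised from the two primitives above.  */
void int_vector_compare (enum icmp code, const int32_t *a, const int32_t *b,
                         int32_t *res, size_t n)
{
  int32_t t1[MAX_LANES], t2[MAX_LANES];
  assert (n <= MAX_LANES);

  switch (code)
    {
    case ICMP_EQ:
      pcmpeq (a, b, res, n);
      break;
    case ICMP_GT:
      pcmpgt (a, b, res, n);
      break;
    case ICMP_LT:                 /* a < b is b > a */
      pcmpgt (b, a, res, n);
      break;
    case ICMP_NE:                 /* EQ (a, b) ^ EQ (a, a); the latter is all -1 */
      pcmpeq (a, b, t1, n);
      pcmpeq (a, a, t2, n);
      for (size_t i = 0; i < n; i++)
        res[i] = t1[i] ^ t2[i];
      break;
    case ICMP_GE:                 /* GT (a, b) | EQ (a, b) */
      pcmpgt (a, b, t1, n);
      pcmpeq (a, b, t2, n);
      for (size_t i = 0; i < n; i++)
        res[i] = t1[i] | t2[i];
      break;
    case ICMP_LE:                 /* GT (b, a) | EQ (a, b) */
      pcmpgt (b, a, t1, n);
      pcmpeq (a, b, t2, n);
      for (size_t i = 0; i < n; i++)
        res[i] = t1[i] | t2[i];
      break;
    }
}
```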

* Re: Vector Comparison patch
  2011-08-16 17:01       ` Artem Shinkarov
  2011-08-16 21:48         ` Artem Shinkarov
@ 2011-08-17 12:49         ` Richard Guenther
  2011-08-20 11:22           ` Uros Bizjak
  1 sibling, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-17 12:49 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: gcc-patches, Joseph S. Myers, Richard Henderson, Uros Bizjak

On Tue, Aug 16, 2011 at 6:35 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 16, 2011 at 4:28 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>>>> Index: gcc/fold-const.c
>>>> ===================================================================
>>>> --- gcc/fold-const.c    (revision 177665)
>>>> +++ gcc/fold-const.c    (working copy)
>>>> @@ -9073,34 +9073,61 @@ fold_comparison (location_t loc, enum tr
>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>     {
>>>> -      switch (code)
>>>> +      if (TREE_CODE (TREE_TYPE (arg0)) == VECTOR_TYPE)
>>>>        {
>>>> -       case EQ_EXPR:
>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>> -           return constant_boolean_node (1, type);
>>>>
>>>> I think this change should go in a separate patch for improved
>>>> constant folding.  It shouldn't be necessary for enabling vector compares, no?
>>>
>>> Unfortunately no, this case must be covered here, otherwise x != x
>>> condition fails.
>>
>> How does it fail?
>
> When I have x > x, x == x, and so on, fold-const.c calls
> operand_equal_p (arg0, arg1, 0), which returns true, and then it calls
> constant_boolean_node (<val>, type). But the problem is that the
> result of the comparison is a vector, not a boolean. So we get an
> assertion failure:
> test.c: In function ‘foo’:
> test.c:9:3: internal compiler error: in build_int_cst_wide, at tree.c:1222
> Please submit a full bug report,
> with preprocessed source if appropriate.

Ok, so we have to either avoid folding it (which would be a shame), or
define how true / false look like for vector typed comparison results.

The documentation above the tree code definitions for comparisons in
tree.def needs updating then, with something like

  and the value is either the type used by the language for booleans
  or an integer vector type of the same size and with the same number
  of elements as the comparison operands.  True for a vector of
  comparison results has all bits set while false is equal to zero.

or some better wording.

Then changing constant_boolean_node to return a proper true/false
vector would be the fix for your problem.
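If comparison results are defined that way, folding x CMP x for a vector x reduces to filling a constant vector with all-ones or zero lanes. The sketch below is hypothetical (it is not the fold-const.c implementation, and it assumes int32 result lanes). It follows the scalar rules for x CMP x, declining to fold EQ, LE, GE and NE when the lanes are floating point and NaNs are honoured, because a NaN lane compares unequal to itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cmp { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Fold x CODE x into RES (N lanes of -1 or 0).  Returns false when the
   fold is not valid; HONOR_NANS models float lanes that may hold NaN.  */
bool fold_self_compare (enum cmp code, bool honor_nans, int32_t *res,
                        size_t n)
{
  int32_t lane;
  switch (code)
    {
    case CMP_EQ:
    case CMP_LE:
    case CMP_GE:
      if (honor_nans)
        return false;   /* false, not true, when x is NaN */
      lane = -1;        /* "true": all bits set */
      break;
    case CMP_NE:
      if (honor_nans)
        return false;   /* true, not false, when x is NaN */
      lane = 0;
      break;
    default:            /* CMP_LT, CMP_GT: false even for NaN */
      lane = 0;
      break;
    }
  for (size_t i = 0; i < n; i++)
    res[i] = lane;
  return true;
}
```

For integer lanes honor_nans is simply false, so every x CMP x folds.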

>>>> +      /* Currently the expansion of VEC_COND_EXPR does not allow
>>>> +        expressions where the type of the vectors being compared
>>>> +        differs from the type of the vectors being selected from.
>>>> +        For the time being we insert implicit conversions.  */
>>>> +      if ((COMPARISON_CLASS_P (ifexp)
>>>>
>>>> Why only for comparison-class?
>>> Not only, there is || involved:
>>> (COMPARISON_CLASS_P (ifexp)  && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
>>> || TREE_TYPE (ifexp) != type1
>>>
>>> So if this is a comparison, we check the first operand, because
>>> the result type of the comparison fits but the operand types may not.
>>> If we have an expression of signed vector type, we know that we will
>>> transform it into exp != {0,0,...} in tree-vect-generic.c, but if the
>>> types of the operands do not match we convert them.
>>
>> Hm, ok ... let's hope we can sort-out the backend issues before this
>> patch goes in so we can remove this converting stuff.
>
> Hm, I would hope that we could commit this patch even with this issue,
> because my feeling is that this case would produce errors on all the
> other architectures as well, as VEC_COND_EXPR is a feature heavily
> used in the auto-vectorizer, which means that all the backends would
> have to be fixed. Another argument is that this conversion is harmless.

It shouldn't be hard to fix all the backends.  And if we don't do it now
it will never happen.  I would expect that the codegen part of the
backends doesn't need any adjustments, just the patterns that
match what is supported.

Uros, can you convert x86 as an example?  Thus, for

(define_expand "vcond<mode>"
  [(set (match_operand:VF 0 "register_operand" "")
        (if_then_else:VF
          (match_operator 3 ""
            [(match_operand:VF 4 "nonimmediate_operand" "")
             (match_operand:VF 5 "nonimmediate_operand" "")])
          (match_operand:VF 1 "general_operand" "")
          (match_operand:VF 2 "general_operand" "")))]
  "TARGET_SSE"
{
  bool ok = ix86_expand_fp_vcond (operands);
  gcc_assert (ok);

allow any vector mode of the same size (and same number of elements?)
for the vcond mode and operand 1 and 2?  Thus, only restrict the
embedded comparison to VF?

> So I really hope that someone could shed some light or help me with
> this issue, but even if not I think that the current conversion is ok.
> However, I don't have access to any architecture other than x86.
[...]
>>>>
>>>> +  /* Expand VEC_COND_EXPR into a vector of scalar COND_EXPRs.  */
>>>> +  v = VEC_alloc(constructor_elt, gc, nunits);
>>>> +  for (i = 0; i < nunits;
>>>> +       i += 1, index = int_const_binop (PLUS_EXPR, index, part_width))
>>>> +    {
>>>> +      tree tcond = tree_vec_extract (gsi, inner_type, var, part_width, index);
>>>> +      tree a = tree_vec_extract (gsi, inner_type, vec0, part_width, index);
>>>> +      tree b = tree_vec_extract (gsi, inner_type, vec1, part_width, index);
>>>> +      tree rcond = gimplify_build2 (gsi, NE_EXPR, boolean_type_node, tcond,
>>>> +                                    build_int_cst (inner_type, 0));
>>>>
>>>> I seriously doubt that when expanding this part piecewise expanding
>>>> the mask first in either way is going to be beneficial.  Instead I would
>>>> suggest to "inline" the comparison here.  Thus instead of
>>>
>>> Well, the thing is that if expand_vector_comparison inserts a
>>> builtin there rather than expanding the code piecewise, I'll have to
>>> do the comparison with 0 anyway, because true is expressed as -1
>>> there.
>>>
>>> Well, I would hope that in case we have:
>>> c_0 = a_0 > b_0;
>>> d_0 = c_0 != 0;
>>>
>>> {d_0, d_1,...}
>>>
>>> all the d_n should be constant-folded, or should I call fold explicitly here?
>>>
>>> 1) I construct the mask
>>>>
>>>>  mask =
>>>>         = { mask[0] != 0 ? ... }
>>>>
>>>> do
>>>>
>>>>          = { c0[0] < c1[0] ? ..., }
>>>>
>>>> or even expand the ? : using mask operations if we efficiently can
>>>> create that mask.
>>>>
>>>
>>> I assume that if we cannot expand VEC_COND_EXPR, then masking the
>>> elements is a problem for us. Otherwise VEC_COND_EXPR expansion has a
>>> bug somewhere. Or am I wrong somewhere?
>>
>> I think we can always do bitwise operations, so if we can get at the
>> mask vector we are fine.
>>
>> I was thinking about how the case of explicitly computing the value of
>> v1 < v2 into a vector vs. a condition inside a VEC_COND_EXPR should
>> be handled.  If the target produces a mask of condition codes for
>> a comparison then it might be able to efficiently expand a VEC_COND_EXPR.
>> It could as well generate a mask via expanding v1 < v2 ? -1 : 0 then.
>> A similar case is for AMD XOP which can expand mask ? v1 : v2
>> with a single instruction (so even without seeing a comparison).
>>
>> Basically, if we can get at the mask we should use that to do the
>> vector selection in parallel via (v1 & mask) | (v2 & ~mask).
>>
>> If we cannot even get at the mask then we can build the result
>> vector piecewise as { v1[0] < v2[0] ? v1[0] : v2[0], .... } etc.
>
> Ok, I am perfectly fine with constructing (v1 & mask) | (v2 & ~mask); the
> question is whether I need to check (v1 & mask) and (v2 & ~mask) or can
> just blindly insert them. The problem is that we have a single veclower
> pass, so if I insert something that needs expansion, we would not have
> a second chance to expand it.

I think you need to re-lower them, thus, insert a stmt and then immediately
lower it.
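A minimal sketch of the two fallbacks discussed above, written with GCC's generic vector extensions rather than as veclower code. The helper names select_bitwise and select_piecewise are hypothetical, and the bitwise form assumes every mask element is already -1 or 0.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Bitwise selection: correct only when each mask element is -1 or 0.  */
v4si
select_bitwise (v4si mask, v4si v1, v4si v2)
{
  return (v1 & mask) | (v2 & ~mask);
}

/* Piecewise fallback when no mask is available:
   { a[0] < b[0] ? v1[0] : v2[0], ... }.  */
v4si
select_piecewise (v4si a, v4si b, v4si v1, v4si v2)
{
  v4si res = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    res[i] = a[i] < b[i] ? v1[i] : v2[i];
  return res;
}
```

With mask = a < b, both helpers return the same vector; the bitwise form simply avoids per-element branches.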

> I'll adjust the patch.
>
>>>>
>>>> +  /* Check if VEC_COND_EXPR is supported in hardware within the
>>>> +     given types.  */
>>>> +  if (code == VEC_COND_EXPR)
>>>> +    {
>>>> +      tree exp = gimple_assign_rhs1 (stmt);
>>>> +      tree cond = TREE_OPERAND (exp, 0);
>>>> +
>>>> +      /* If VEC_COND_EXPR is presented as A ? V0 : V1, we
>>>> +        change it to A != {0,0,...} ? V0 : V1  */
>>>> +      if (!COMPARISON_CLASS_P (cond))
>>>> +       TREE_OPERAND (exp, 0) =
>>>> +           build2 (NE_EXPR, TREE_TYPE (cond), cond,
>>>> +                   build_vector_from_val (TREE_TYPE (cond),
>>>> +                     build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
>>>>
>>>> That looks inefficient as well.  Iff we know that the mask is always
>>>> either {-1, -1 ..} or {0, 0 ...} then we can expand the ? : using
>>>> bitwise operations (see what the i?86 expander does, for example).
>>>
>>> This is a requirement of VEC_COND_EXPR, I need to pass 4 parameters,
>>> not 3, that is why I introduce this fake {0,0,..} here.
>>
>> Sure, but if you look at expand_vec_cond_expr_p then you don't need
>> that, and this fake comparison should instead be produced by the
>> expander (or really avoided by maybe splitting up the named pattern
>> into two).
>>
>> It's for sure not necessary for earlyexpand_vec_cond_expr (but instead
>> makes it less efficient - with just the mask it can do the bitwise
>> fallback easily).
>
> Richard, let me give you an example:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> int
> foo (vector (4, int) i0, vector (4, int) i1, int x)
> {
>  i0 = i0 ? i1 : i0;
>  return i0[x];
> }
>
> when we optimize i0 ? i1 : i0, expand_vec_cond_expr_p  happily accepts
> that and says that it can expand this expression. Now after the
> veclowering is done, expand_vec_cond_expr calls vector_compare_rtx
> (op0, unsignedp, icode), which has an assertion:
> gcc_assert (COMPARISON_CLASS_P (cond));
> and of course it fails.

Yes, it's totally non-robust ;)  Specifically tailored for the
vectorizer so far.

> So someone needs to insert the != {0,0...} expression. I do it in
> tree-vect-generic, but it could be done in expand_vec_cond_expr. The
> question is: where?

For this obvious case in expand_vec_cond_expr.  It should handle
what expand_vec_cond_expr_p claims to handle ;)
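At the source level, the normalization under discussion (a bare mask A in A ? V0 : V1 treated as A != {0,0,...}) can be modelled as below. This is an illustrative sketch of the semantics, not of expand_vec_cond_expr; vcond_implicit_ne0 is a hypothetical name.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Models A ? V0 : V1 with a non-comparison condition A by first
   normalizing it to A != {0,0,0,0}, which yields a canonical -1/0 mask.  */
v4si
vcond_implicit_ne0 (v4si a, v4si v0, v4si v1)
{
  v4si zero = { 0, 0, 0, 0 };
  v4si mask = a != zero;        /* each element is -1 or 0 */
  return (v0 & mask) | (v1 & ~mask);
}
```

Artem's foo above, i0 = i0 ? i1 : i0, would correspond to vcond_implicit_ne0 (i0, i1, i0).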

> I agree with you that I don't need to put this mask in case I
> expand vcond piecewise, and I will adjust that, but it does not actually
> make much of a difference if the expansion uses (v0 & mask) |
> (v1 & ~mask).
>
> Am I wrong somewhere?

Just in the place that should need fixing.

>>>> +/* Find target specific sequence for vector comparison of
>>>> +   real-type vectors V0 and V1. Returns variable containing
>>>> +   result of the comparison or NULL_TREE in other case.  */
>>>> +static tree
>>>> +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype,
>>>> +                   enum machine_mode mode, tree v0, tree v1,
>>>> +                   enum tree_code code)
>>>> +{
>>>> +  enum ix86_builtins fcode;
>>>>
>>>> is there a reason we need this and cannot simply provide expanders
>>>> for the named patterns?  We'd need to give them semantics of
>>>> producing all-ones / all-zero masks of course.  Richard, do you think
>>>> that's sensible?  That way we'd avoid the new target hook and could
>>>> simply do optab queries.
>>>
>>> I think I don't really understand the idea. How are we going to
>>> represent the fact that we need to convert a given node to a given
>>> machine instruction? Maybe you could point to where a similar
>>> technique is already used.
>>
>> In all places we check optab_handler (op, mode) != CODE_FOR_nothing.
>> We have eq_optab for example, so optab_handler (eq_optab, V4SImode)
>> would get you the instruction sequence for a comparison of V4SImode
>> vectors.  What it should return isn't properly defined yet.
>>
>> Otherwise I'd say we should ask the target to expand
>> v1 < v2 as VEC_COND_EXPR (v1 < v2, -1, 0) instead.  That one could
>> as well special-case the -1 and 0 result vectors (and maybe it already
>> does).
>
> Ok, I can adjust the optab  checking for the mode, but I recall that
> we introduced the hook exactly because optabs did not return anything
> sensible. It was your idea :)

Heh, I don't remember ...

Still it's probably easiest (for now) to handle

 vec = v1 < v2;

the same as we would handle

 vec = v1 < v2 ? {-1,...} : {0,...};

during lowering (and even for expanding).  I tried to convince the vectorizer
to create a vectorized stand-alone comparison but failed, so it's probably
un-tested territory anyway.

At least the above would reduce the patch size considerably.
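Since true is all ones, selecting between {-1,...} and {0,...} with the comparison mask gives back the mask itself, which is why the two forms should compute the same value. A small sketch with hypothetical function names, using GCC vector extensions:

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* vec = v1 < v2  */
v4si
cmp_direct (v4si v1, v4si v2)
{
  return v1 < v2;
}

/* vec = v1 < v2 ? {-1,...} : {0,...}, with the ?: done bitwise.  */
v4si
cmp_via_vcond (v4si v1, v4si v2)
{
  v4si ones = { -1, -1, -1, -1 };
  v4si zeros = { 0, 0, 0, 0 };
  v4si mask = v1 < v2;
  return (ones & mask) | (zeros & ~mask);
}
```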

> Also, I don't like the idea of expanding any comparison to VEC_COND_EXPR
> (v1 < v2, -1, 0). Look at expand_vec_cond_expr: it does the job
> only if there is a vcond instruction in the architecture, since it checks
> for direct_optab_handler (vcond_optab, mode). But it is not
> necessarily the case that using vcond is as efficient as using
> comparison instructions. Also, we could run into a situation where
> vcond is not supported but the comparison is, couldn't we?

That's unlikely as the vcond pattern is required by the vectorizer and
all targets implement it if they can handle the comparison part.  The
rest is just emulated by the target using the bitwise operation trick.

> Anyhow, I would think that we want to keep vcond and comparison separately.

Sure, but I think as the vcond case is in place already we can optimize
the comparison case separately if needed.  I would expect that the
code generated with your patch for

v = v1 < v2;
v = v1 < v2 ? {-1,...} : {0,...};

should be the same?

Thanks,
Richard.


* Re: Vector Comparison patch
  2011-08-16 21:48         ` Artem Shinkarov
@ 2011-08-17 12:58           ` Richard Guenther
  2011-08-17 15:27             ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-17 12:58 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Tue, Aug 16, 2011 at 11:12 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi, here is a new version of the patch with the adjustments.
>
> Two important comments.
> 1) At the moment, when I expand the expression mask ? vec0 : vec1, I
> replace mask with (mask == {-1,-1,..}). The first reason is that
> expand_vec_cond_expr requires its first operand to be a comparison. The
> second reason is that a mask {3, 4, -1, 5} should be transformed into
> {0,0,-1,0} in order to simulate vcond as ((vec0 & mask) | (vec1 &
> ~mask)). So in both cases we need this adjustment.

Well.  From a middle-end view I'd say that mask ? vec0 : vec1
should return (vec0 & mask) | (vec1 & ~mask) which is what
the XOP vcond instructions do, btw.  Only by defining
v1 < v2 to return a mask constrained to {-1|0, -1|0, ...} the
combination v1 < v2 ? vec0 : vec1 gets its vector element
selection semantic (instead of being just a bitwise selection,
which it really is).

So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
(that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).
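To see what the purely bitwise reading does with a mask that is not canonical, here is a sketch using the {3, 4, -1, 5} example from this exchange; bitwise_select is a hypothetical name and the operand values are arbitrary.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* mask ? vec0 : vec1 read as a pure bitwise selection.  */
v4si
bitwise_select (v4si mask, v4si vec0, v4si vec1)
{
  return (vec0 & mask) | (vec1 & ~mask);
}
```

Only the element whose mask is all ones (-1) picks vec0 cleanly; with vec0 = {100,...} and vec1 = {200,...} the result is {200, 204, 100, 204}, where 204 mixes bits of both operands.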

Does OpenCL somehow support you here?

> 2) Vector comparison through optab.
> As I have just adjusted expand_vector_operation in
> tree-vect-generic.c, it is called only when there is no
> suitable optab; that is checked in expand_vector_operations_1. So
> the only place where I try to find an optab for the comparison is
> expand_vec_cond_expr_piecewise, which I adjusted.
>
> As for the vector hook, it will be triggered only when we don't have
> an appropriate optab.
>
> bootstrapped and tested on x86_64-unknown-linux-gnu.
> Anything else?

I didn't yet look at the updated patch, I'll wait for another update that
eventually follows my comments to your earlier mail.

Richard.

>
> Artem.
>


* Re: Vector Comparison patch
  2011-08-17 12:58           ` Richard Guenther
@ 2011-08-17 15:27             ` Artem Shinkarov
  2011-08-17 16:14               ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-17 15:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

Hi

Several comments before the new version of the patch.
1) x != x
I am happy to adjust constant_boolean_node, but look at the code
around line 9074 in fold-const.c, you will see that x <op> x
elimination, even with adjusted constant_boolean_node, will look about
the same as my code. Because I need to check the parameters (!FLOAT_P,
 HONOR_NANS) on TREE_TYPE (arg0) not arg0, and I need to construct
constant_boolean_node (-1), not 1 in case of true.
But I will change constant_boolean_node to accept vector types.

2) comparison vs vcond
v = v1 < v2;
v = v1 < v2 ? {-1,...} : {0,...};

are not the same.
16,25c16,22
<       movdqa  .LC1(%rip), %xmm1
<       pshufd  $225, %xmm1, %xmm1
<       pshufd  $39, %xmm0, %xmm0
<       movss   %xmm2, %xmm1
<       pshufd  $225, %xmm1, %xmm1
<       pcmpgtd %xmm1, %xmm0
<       pcmpeqd %xmm1, %xmm1
<       pcmpeqd %xmm1, %xmm0
<       pand    %xmm1, %xmm0
<       movdqa  %xmm0, -24(%rsp)
---
>       pshufd  $39, %xmm0, %xmm1
>       movdqa  .LC1(%rip), %xmm0
>       pshufd  $225, %xmm0, %xmm0
>       movss   %xmm2, %xmm0
>       pshufd  $225, %xmm0, %xmm0
>       pcmpgtd %xmm0, %xmm1
>       movdqa  %xmm1, -24(%rsp)

So I would keep the hook, it could be removed at any time when the
standard expansion will start to work fine.

3) mask ? vec0 : vec1
So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
(that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).

Does OpenCL somehow support you here?

OpenCL says that the vector operation mask ? vec0 : vec1 is the same as
select (vec1, vec0, mask). The semantics of the select operation are the
following:

gentype select (gentype a, gentype b, igentype c)
For each component of a vector type,
result[i] = if MSB of c[i] is set ? b[i] : a[i].

I am not sure what they really mean by the term MSB. As far
as I know MSB is Most Significant Bit, so does it mean that in the case of a
3-bit integer 100 would trigger true but 011 would still be false...

My reading would be that if all bits are set, then take the first element,
otherwise the second.

It is also confusing when a ? vec0 : vec1 and a != 0 ? vec0 : vec1
produce different results. So I would stick to the all-bits-set-is-true
scenario.
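The candidate readings of "c[i] is true" can be compared side by side on a mask that is not a canonical -1/0 vector. A sketch under two assumptions: the helper names are hypothetical, and "MSB set" is tested as c[i] < 0, which holds for two's-complement signed elements.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Pick b[i] where m[i] is -1 and a[i] where m[i] is 0, following the
   quoted select (a, b, c) convention.  */
static v4si
blend (v4si a, v4si b, v4si m)
{
  return (b & m) | (a & ~m);
}

/* Reading 1: the MSB of c[i] is set, i.e. c[i] < 0.  */
v4si
select_msb (v4si a, v4si b, v4si c)
{
  v4si zero = { 0, 0, 0, 0 };
  return blend (a, b, c < zero);
}

/* Reading 2: all bits of c[i] are set, i.e. c[i] == -1.  */
v4si
select_all_ones (v4si a, v4si b, v4si c)
{
  v4si ones = { -1, -1, -1, -1 };
  return blend (a, b, c == ones);
}

/* Reading 3: c[i] is non-zero, as for a scalar ?: condition.  */
v4si
select_nonzero (v4si a, v4si b, v4si c)
{
  v4si zero = { 0, 0, 0, 0 };
  return blend (a, b, c != zero);
}
```

For a = {0,0,0,0}, b = {1,1,1,1} and c = {3, -2, -1, 0}, the three readings give {0,1,1,0}, {0,0,1,0} and {1,1,1,0} respectively; they agree only on canonical -1/0 masks.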

4) Backend stuff. Ok, we could always fall back to reject the cases
when cond and operands have different type, and then fix the backend.

Adjustments are coming.

Artem.


* Re: Vector Comparison patch
  2011-08-17 15:27             ` Artem Shinkarov
@ 2011-08-17 16:14               ` Richard Guenther
  2011-08-17 17:07                 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-17 16:14 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Wed, Aug 17, 2011 at 3:30 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> Several comments before the new version of the patch.
> 1) x != x
> I am happy to adjust constant_boolean_node, but look at the code
> around line 9074 in fold-const.c, you will see that x <op> x
> elimination, even with adjusted constant_boolean_node, will look about
> the same as my code. Because I need to check the parameters (!FLOAT_P,
>  HONOR_NANS) on TREE_TYPE (arg0) not arg0, and I need to construct
> constant_boolean_node (-1), not 1 in case of true.
> But I will change constant_boolean_node to accept vector types.

Hm, that should be handled transparently if you look at the defines
of FLOAT_TYPE_P and the HONOR_* macros.

>
> 2) comparison vs vcond
> v = v1 < v2;
> v = v1 < v2 ? {-1,...} : {0,...};
>
> are not the same.
> 16,25c16,22
> <       movdqa  .LC1(%rip), %xmm1
> <       pshufd  $225, %xmm1, %xmm1
> <       pshufd  $39, %xmm0, %xmm0
> <       movss   %xmm2, %xmm1
> <       pshufd  $225, %xmm1, %xmm1
> <       pcmpgtd %xmm1, %xmm0
> <       pcmpeqd %xmm1, %xmm1
> <       pcmpeqd %xmm1, %xmm0
> <       pand    %xmm1, %xmm0
> <       movdqa  %xmm0, -24(%rsp)
> ---
>>       pshufd  $39, %xmm0, %xmm1
>>       movdqa  .LC1(%rip), %xmm0
>>       pshufd  $225, %xmm0, %xmm0
>>       movss   %xmm2, %xmm0
>>       pshufd  $225, %xmm0, %xmm0
>>       pcmpgtd %xmm0, %xmm1
>>       movdqa  %xmm1, -24(%rsp)
>
> So I would keep the hook, it could be removed at any time when the
> standard expansion will start to work fine.

Which one is which?  I'd really like to make this patch simpler at first,
and removing that hook is an obvious thing that _should_ be possible,
even optimally (by fixing the targets).

> 3) mask ? vec0 : vec1
> So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
> (that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).
>
> Does OpenCL somehow support you here?
>
> OpenCL says that the vector operation mask ? vec0 : vec1 is the same as
> select (vec1, vec0, mask). The semantics of the select operation are the
> following:
>
> gentype select (gentype a, gentype b, igentype c)
> For each component of a vector type,
> result[i] = if MSB of c[i] is set ? b[i] : a[i].
>
> I am not sure what they really mean by the term MSB. As far
> as I know MSB is Most Significant Bit, so does it mean that in the case of a
> 3-bit integer 100 would trigger true but 011 would still be false...

Yes, MSB is Most Significant Bit - that's a somewhat odd definition ;)

> My reading would be that if all bits are set, then take the first element,
> otherwise the second.
>
> It is also confusing when a ? vec0 : vec1 and a != 0 ? vec0 : vec1
> produce different results. So I would stick to the all-bits-set-is-true
> scenario.

For the middle-end part definitely.  Thus I'd simply leave the mask alone.

> 4) Backend stuff. Ok, we could always fall back to reject the cases
> when cond and operands have different type, and then fix the backend.
>
> Adjustments are coming.
>
>
> Artem.
>


* Re: Vector Comparison patch
  2011-08-17 16:14               ` Richard Guenther
@ 2011-08-17 17:07                 ` Artem Shinkarov
  2011-08-17 21:18                   ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-17 17:07 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Wed, Aug 17, 2011 at 3:58 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 3:30 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi
>>
>> Several comments before the new version of the patch.
>> 1) x != x
>> I am happy to adjust constant_boolean_node, but look at the code
>> around line 9074 in fold-const.c, you will see that x <op> x
>> elimination, even with adjusted constant_boolean_node, will look about
>> the same as my code. Because I need to check the parameters (!FLOAT_P,
>>  HONOR_NANS) on TREE_TYPE (arg0) not arg0, and I need to construct
>> constant_boolean_node (-1), not 1 in case of true.
>> But I will change constant_boolean_node to accept vector types.
>
> Hm, that should be handled transparently if you look at the defines
> of FLOAT_TYPE_P and the HONOR_* macros.
>

Ok, Currently I have this, what do you think:
      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 1;
      tree arg0_type = TREE_CODE (type) == VECTOR_TYPE
		       ? TREE_TYPE (TREE_TYPE (arg0)) : TREE_TYPE (arg0);
	switch (code)
	  {
	  case EQ_EXPR:
	    if (! FLOAT_TYPE_P (arg0_type)
		|| ! HONOR_NANS (TYPE_MODE (arg0_type)))
	      return constant_boolean_node (true_val, type);
	    break;

	  case GE_EXPR:
	  case LE_EXPR:
	    if (! FLOAT_TYPE_P (arg0_type)
		|| ! HONOR_NANS (TYPE_MODE (arg0_type)))
	      return constant_boolean_node (true_val, type);
	    return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);

	  case NE_EXPR:
	    /* For NE, we can only do this simplification if integer
	       or we don't honor IEEE floating point NaNs.  */
	    if (FLOAT_TYPE_P (arg0_type)
		&& HONOR_NANS (TYPE_MODE (arg0_type)))
	      break;
	    /* ... fall through ...  */
	  case GT_EXPR:
	  case LT_EXPR:
	    return constant_boolean_node (0, type);
	  default:
	    gcc_unreachable ();
	  }

Works fine for both vector and scalar cases.
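The constants the x <op> x fold has to produce, and the reason for the HONOR_NANS check, can be observed directly. A sketch with hypothetical helper names, assuming IEEE floats and compilation without -ffast-math:

```c
#include <math.h>   /* NAN, used in the usage example.  */

typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise equality: each result element is -1 (true) or 0.  */
v4si
vec_eq (v4si a, v4si b)
{
  return a == b;
}

/* Scalar equality: the result is 1 (true) or 0.  */
int
int_eq (int a, int b)
{
  return a == b;
}

/* IEEE float equality: a NaN compares unequal even to itself.  */
int
float_eq (float a, float b)
{
  return a == b;
}
```

vec_eq (v, v) yields -1 in every element, int_eq (7, 7) yields 1, and float_eq (NAN, NAN) yields 0, so folding x == x to true is only safe for integer elements or when NaNs are not honoured.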

>>
>> 2) comparison vs vcond
>> v = v1 < v2;
>> v = v1 < v2 ? {-1,...} : {0,...};
>>
>> are not the same.
>> 16,25c16,22
>> <       movdqa  .LC1(%rip), %xmm1
>> <       pshufd  $225, %xmm1, %xmm1
>> <       pshufd  $39, %xmm0, %xmm0
>> <       movss   %xmm2, %xmm1
>> <       pshufd  $225, %xmm1, %xmm1
>> <       pcmpgtd %xmm1, %xmm0
>> <       pcmpeqd %xmm1, %xmm1
>> <       pcmpeqd %xmm1, %xmm0
>> <       pand    %xmm1, %xmm0
>> <       movdqa  %xmm0, -24(%rsp)
>> ---
>>>       pshufd  $39, %xmm0, %xmm1
>>>       movdqa  .LC1(%rip), %xmm0
>>>       pshufd  $225, %xmm0, %xmm0
>>>       movss   %xmm2, %xmm0
>>>       pshufd  $225, %xmm0, %xmm0
>>>       pcmpgtd %xmm0, %xmm1
>>>       movdqa  %xmm1, -24(%rsp)
>>
>> So I would keep the hook, it could be removed at any time when the
>> standard expansion will start to work fine.
>
> Which one is which?

You must be joking. :)
The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
fine, but it means that we need to construct it carefully.
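Reading the two listings, the longer one follows pcmpgtd with a compare against an all-ones register (pcmpeqd) and an AND with that register (pand); for a canonical -1/0 mask both extra steps are identities. A sketch of that reading in C, with hypothetical function names; this is an interpretation of the assembly above, not output from the patch.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Longer listing: pcmpgtd, then pcmpeqd with all-ones, then pand.  */
v4si
gt_long (v4si a, v4si b)
{
  v4si ones = { -1, -1, -1, -1 };
  v4si mask = a > b;             /* pcmpgtd */
  return (mask == ones) & ones;  /* pcmpeqd + pand */
}

/* Shorter listing: pcmpgtd alone.  */
v4si
gt_short (v4si a, v4si b)
{
  return a > b;
}
```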

> I'd really like to make this patch simpler at first,
> and removing that hook is an obvious thing that _should_ be possible,
> even optimally (by fixing the targets).

Ok, let's remove the hook, but then could you provide some more
information than just "we need to do it"?

Simple in this case means inefficient -- I would hope to make it
efficient as well.

>> 3) mask ? vec0 : vec1
>> So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
>> (that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).
>>
>> Does OpenCL somehow support you here?
>>
>> OpenCL says that the vector operation mask ? vec0 : vec1 is the same as
>> select (vec1, vec0, mask). The semantics of the select operation are the
>> following:
>>
>> gentype select (gentype a, gentype b, igentype c)
>> For each component of a vector type,
>> result[i] = if MSB of c[i] is set ? b[i] : a[i].
>>
>> I am not sure what they really mean by the term MSB. As far
>> as I know MSB is Most Significant Bit, so does it mean that in the case of a
>> 3-bit integer 100 would trigger true but 011 would still be false...
>
> Yes, MSB is Most Significant Bit - that's a somewhat odd definition ;)
>
>> My reading would be that if all bits are set, then take the first element,
>> otherwise the second.
>>
>> It is also confusing when a ? vec0 : vec1 and a != 0 ? vec0 : vec1
>> produce different results. So I would stick to the all-bits-set-is-true
>> scenario.
>
> For the middle-end part definitely.  Thus I'd simply leave the mask alone.
>

Well, it seems very unnatural to me. In the case of scalars, mask ?
val0 : val1 does not work the same way as (mask & val0) | (~mask &
val1), so why should we have that behaviour for the vector stuff?
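The scalar point in concrete terms: scalar ?: tests its condition for non-zero, while a bitwise blend mixes bits unless the mask is 0 or -1. A sketch with hypothetical names:

```c
/* Scalar conditional: any non-zero mask selects val0.  */
int
scalar_cond (int mask, int val0, int val1)
{
  return mask ? val0 : val1;
}

/* Bitwise blend: takes each bit from val0 or val1 according to mask.  */
int
scalar_blend (int mask, int val0, int val1)
{
  return (mask & val0) | (~mask & val1);
}
```

scalar_cond (3, 10, 20) is 10, but scalar_blend (3, 10, 20) is 22; the two agree only for masks 0 and -1.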


>> 4) Backend stuff. Ok, we could always fall back to reject the cases
>> when cond and operands have different type, and then fix the backend.
>>
>> Adjustments are coming.
>>
>>
>> Artem.
>>
>

A new issue with transforming cond into cond == {-1, ...} in
expand_vec_cond_expr. When I do this:
  icode = get_vcond_icode (vec_cond_type, mode);
  if (icode == CODE_FOR_nothing)
    return 0;

  /* If OP0 is not a comparison, adjust it by transforming to
     the expression OP0 == {-1, -1, ...}  */
  if (!COMPARISON_CLASS_P (op0))
    op0 = build2 (EQ_EXPR, TREE_TYPE (op0), op0,
		  build_vector_from_val (TREE_TYPE (op0),
		  build_int_cst (TREE_TYPE (TREE_TYPE (op0)), -1)));

I run into trouble: the constant vector that I insert cannot
be expanded, and the compiler fails with an assertion.

This happens on my machine:
Linux temanbk 2.6.38-gentoo-r4 #3 SMP Mon Aug 8 00:32:30 BST 2011
x86_64 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel
GNU/Linux

It happens when I compare vectors of 64-bit integers. They are
lowered in veclower, but if I insert them in expand_vec_cond_expr,
I get an error. However, expand_vec_cond_expr_p happily accepts them.


Thanks,
Artem.


* Re: Vector Comparison patch
  2011-08-17 17:07                 ` Artem Shinkarov
@ 2011-08-17 21:18                   ` Artem Shinkarov
  2011-08-18  1:22                     ` Joseph S. Myers
  2011-08-18 10:21                     ` Richard Guenther
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-17 21:18 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 6787 bytes --]

On Wed, Aug 17, 2011 at 4:28 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 3:58 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Aug 17, 2011 at 3:30 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi
>>>
>>> Several comments before the new version of the patch.
>>> 1) x != x
>>> I am happy to adjust constant_boolean_node, but look at the code
>>> around line 9074 in fold-const.c, you will see that x <op> x
>>> elimination, even with adjusted constant_boolean_node, will look about
>>> the same as my code. Because I need to check the parameters (!FLOAT_P,
>>>  HONOR_NANS) on TREE_TYPE (arg0) not arg0, and I need to construct
>>> constant_boolean_node (-1), not 1 in case of true.
>>> But I will change constant_boolean_node to accept vector types.
>>
>> Hm, that should be handled transparently if you look at the defines
>> of FLOAT_TYPE_P and the HONOR_* macros.
>>
>
> Ok, Currently I have this, what do you think:
>      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 1;
>      tree arg0_type = TREE_CODE (type) == VECTOR_TYPE
>                       ? TREE_TYPE (TREE_TYPE (arg0)) : TREE_TYPE (arg0);
>        switch (code)
>          {
>          case EQ_EXPR:
>            if (! FLOAT_TYPE_P (arg0_type)
>                || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>              return constant_boolean_node (true_val, type);
>            break;
>
>          case GE_EXPR:
>          case LE_EXPR:
>            if (! FLOAT_TYPE_P (arg0_type)
>                || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>              return constant_boolean_node (true_val, type);
>            return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
>
>          case NE_EXPR:
>            /* For NE, we can only do this simplification if integer
>               or we don't honor IEEE floating point NaNs.  */
>            if (FLOAT_TYPE_P (arg0_type)
>                && HONOR_NANS (TYPE_MODE (arg0_type)))
>              break;
>            /* ... fall through ...  */
>          case GT_EXPR:
>          case LT_EXPR:
>            return constant_boolean_node (0, type);
>          default:
>            gcc_unreachable ();
>          }
>
> Works fine for both vector and scalar cases.

Please ignore this comment.

>
>>>
>>> 2) comparison vs vcond
>>> v = v1 < v2;
>>> v = v1 < v2 ? {-1,...} : {0,...};
>>>
>>> are not the same.
>>> 16,25c16,22
>>> <       movdqa  .LC1(%rip), %xmm1
>>> <       pshufd  $225, %xmm1, %xmm1
>>> <       pshufd  $39, %xmm0, %xmm0
>>> <       movss   %xmm2, %xmm1
>>> <       pshufd  $225, %xmm1, %xmm1
>>> <       pcmpgtd %xmm1, %xmm0
>>> <       pcmpeqd %xmm1, %xmm1
>>> <       pcmpeqd %xmm1, %xmm0
>>> <       pand    %xmm1, %xmm0
>>> <       movdqa  %xmm0, -24(%rsp)
>>> ---
>>>>       pshufd  $39, %xmm0, %xmm1
>>>>       movdqa  .LC1(%rip), %xmm0
>>>>       pshufd  $225, %xmm0, %xmm0
>>>>       movss   %xmm2, %xmm0
>>>>       pshufd  $225, %xmm0, %xmm0
>>>>       pcmpgtd %xmm0, %xmm1
>>>>       movdqa  %xmm1, -24(%rsp)
>>>
>>> So I would keep the hook, it could be removed at any time when the
>>> standard expansion will start to work fine.
>>
>> Which one is which?
>
> You must be joking. :)
> The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
> The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
> fine, but it means that we need to construct it carefully.

This is still important.

>
>> I'd really like to make this patch simpler at first,
>> and removing that hook is an obvious thing that _should_ be possible,
>> even optimally (by fixing the targets).
>
> Ok, let's remove the hook, but then could you provide some more
> information than just "we need to do it"?
>
> Simple in this case means inefficient -- I would hope to make it
> efficient as well.

This is very important.

>>> 3) mask ? vec0 : vec1
>>> So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
>>> (that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).
>>>
>>> Does OpenCL somehow support you here?
>>>
>>> OpenCL says that the vector operation mask ? vec0 : vec1 is the same as
>>> select (vec1, vec0, mask). The semantics of the select operation are the
>>> following:
>>>
>>> gentype select (gentype a, gentype b, igentype c)
>>> For each component of a vector type,
>>> result[i] = if MSB of c[i] is set ? b[i] : a[i].
>>>
>>> I am not sure what they really mean by the term MSB. As far
>>> as I know MSB is Most Significant Bit, so does it mean that in the case of a
>>> 3-bit integer 100 would trigger true but 011 would still be false...
>>
>> Yes, MSB is Most Significant Bit - that's a somewhat odd definition ;)
>>
>>> My reading would be that if all bits are set, then take the first element,
>>> otherwise the second.
>>>
>>> It is also confusing when a ? vec0 : vec1 and a != 0 ? vec0 : vec1
>>> produce different results. So I would stick to the all-bits-set-is-true
>>> scenario.
>>
>> For the middle-end part definitely.  Thus I'd simply leave the mask alone.
>>
>
> Well, it seems very unnatural to me. In the case of scalars, mask ?
> val0 : val1 does not work the same way as (mask & val0) | (~mask &
> val1), so why should we have that behaviour for the vector stuff?
>

And that.

>>> 4) Backend stuff. Ok, we could always fall back to reject the cases
>>> when cond and operands have different type, and then fix the backend.
>>>
>>> Adjustments are coming.
>>>
>>>
>>> Artem.
>>>
>>
>
> A new issue with transforming cond into cond == {-1, ...} in
> expand_vec_cond_expr. When I do this:
>  icode = get_vcond_icode (vec_cond_type, mode);
>  if (icode == CODE_FOR_nothing)
>    return 0;
>
>  /* If OP0 is not a comparison, adjust it by transforming to
>     the expression OP0 == {-1, -1, ...}  */
>  if (!COMPARISON_CLASS_P (op0))
>    op0 = build2 (EQ_EXPR, TREE_TYPE (op0), op0,
>                  build_vector_from_val (TREE_TYPE (op0),
>                  build_int_cst (TREE_TYPE (TREE_TYPE (op0)), -1)));
>
> I run into trouble: the constant vector that I insert cannot
> be expanded, and the compiler fails with an assertion.
>
> This happens on my machine:
> Linux temanbk 2.6.38-gentoo-r4 #3 SMP Mon Aug 8 00:32:30 BST 2011
> x86_64 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel
> GNU/Linux
>
> It happens when I compare vectors of 64-bit integers. They are
> lowered in veclower, but if I insert them in expand_vec_cond_expr,
> I get an error. However, expand_vec_cond_expr_p happily accepts them.

This is solved as well.

And here is a new version of the patch. Tested and bootstrapped.


Artem.

[-- Attachment #2: vector-compare-vcond-5.diff --]
[-- Type: text/plain, Size: 54023 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators:
+@code{==, !=, <, <=, >, >=}. Both integer-type and real-type vectors
+can be compared, but only with vectors of the same type. The result of the
+comparison is a signed integer-type vector where the size of each
+element is the same as the size of the compared vectors' elements.
+Comparison is done element by element. The false value is 0; the true
+value is -1 (a constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+where the condition is a vector of signed integers. In that case the result
+of the condition is used as a mask to select elements either from the first
+operand or from the second. Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The number of elements in the condition must be the same as the number of
+elements in both operands. The same holds for the size of the type
+of the elements. The type of the vector conditional is determined by
+the types of the operands, which must be the same. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can be just a
+vector of signed integer type. In that case this vector is implicitly
+compared with a vector of zeroes. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional where the operands are vectors and the
+condition is an integer works in the standard way: it returns the first operand
+if the condition is true and the second otherwise. Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a;
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using the hardware-specific instructions and return the result tree.  The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default implementation of TARGET_VECTORIZE_BUILTIN_VEC_COMPARE:
+   no hardware comparison is available, so always return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using the hardware-specific instructions and return the \
+result tree.  The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
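The hook contract can be pictured as a hook-with-fallback scheme, the one expand_vector_comparison in tree-vect-generic.c follows: the target hook either produces the comparison result or returns nothing, in which case the generic code lowers the comparison piecewise. The sketch below is a hypothetical model in plain C; NULL plays the role of NULL_TREE, and none of its names (vec_compare_hook, default_hook, fast_gt_hook, piecewise_gt, expand_gt) are GCC APIs.

```c
#include <stddef.h>

/* Hypothetical hook type: fill OUT and return it, or return NULL.  */
typedef int *(*vec_compare_hook) (const int *v0, const int *v1,
                                  int *out, size_t n);

/* Like default_builtin_vec_compare: no hardware support.  */
static int *
default_hook (const int *v0, const int *v1, int *out, size_t n)
{
  (void) v0; (void) v1; (void) out; (void) n;
  return NULL;
}

/* A pretend target that handles "greater than" itself.  */
static int *
fast_gt_hook (const int *v0, const int *v1, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = -(v0[i] > v1[i]);   /* 1/0 negated gives -1/0.  */
  return out;
}

/* Counts how often the piecewise fallback ran.  */
static int fallback_count;

/* Piecewise fallback, mirroring do_compare: (a[i] > b[i]) ? -1 : 0.  */
static int *
piecewise_gt (const int *v0, const int *v1, int *out, size_t n)
{
  fallback_count++;
  for (size_t i = 0; i < n; i++)
    out[i] = v0[i] > v1[i] ? -1 : 0;
  return out;
}

/* Try the hook first; fall back to the element-by-element expansion.  */
static int *
expand_gt (vec_compare_hook hook, const int *v0, const int *v1,
           int *out, size_t n)
{
  int *t = hook (v0, v1, out, n);
  if (t == NULL)
    t = piecewise_gt (v0, v1, out, n);
  return t;
}
```

Both paths must agree on the -1/0 encoding, so the choice of hook only affects how the result is computed, not its value.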
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6572,6 +6572,13 @@ expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
+  /* If OP0 is not a comparison, adjust it by transforming to 
+     the expression OP0 == {-1, -1, ...}  */
+  if (!COMPARISON_CLASS_P (op0))
+    op0 = build2 (EQ_EXPR, TREE_TYPE (op0), op0,
+		  build_vector_from_val (TREE_TYPE (op0),
+		  build_int_cst (TREE_TYPE (TREE_TYPE (op0)), -1)));
+  
   comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
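The rewrite above turns a condition that is not a comparison into OP0 == {-1, -1, ...}. As a hypothetical scalar sketch (the function names are illustrative only), the two helpers below contrast that rule with the implicit comparison against zero described in the extend.texi hunk.

```c
#include <stddef.h>

/* mask[i] = (c[i] == -1) ? -1 : 0, the form the optabs.c change builds.  */
static void
mask_eq_minus_one (const int *c, int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = c[i] == -1 ? -1 : 0;
}

/* mask[i] = (c[i] != 0) ? -1 : 0, the documented implicit comparison.  */
static void
mask_ne_zero (const int *c, int *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    mask[i] = c[i] != 0 ? -1 : 0;
}
```

For a condition that is itself a comparison result, with every element -1 or 0, both produce the same mask; they differ for other values such as 1 or 3.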
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t
 }
 \f
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars, or -1 or 0 for vectors, giving {-1,-1,...} or {0,0,...}),
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 1;
+      tree arg0_type = TREE_TYPE (arg0);
+      
       switch (code)
 	{
 	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
+	    return constant_boolean_node (true_val, type);
 	  break;
 
 	case GE_EXPR:
 	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
+	    return constant_boolean_node (true_val, type);
 	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
 	case NE_EXPR:
 	  /* For NE, we can only do this simplification if integer
 	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (FLOAT_TYPE_P (arg0_type)
+	      && HONOR_NANS (TYPE_MODE (arg0_type)))
 	    break;
 	  /* ... fall through ...  */
 	case GT_EXPR:
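The fold_comparison change can be summarised by a small hypothetical model (enum cmp_code and fold_self_compare are not GCC names): for integer operands, x CODE x folds to the "true" constant for EQ, LE and GE and to 0 for NE, LT and GT, where "true" is 1 for a scalar result and -1 (all bits set) for a vector result. Floating-point operands that honour NaNs are not always folded this way and are not modelled here.

```c
/* Hypothetical comparison codes, integer operands only.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE };

/* Value that x CODE x folds to; IS_VECTOR selects the "true" encoding.  */
static int
fold_self_compare (enum cmp_code code, int is_vector)
{
  int true_val = is_vector ? -1 : 1;

  switch (code)
    {
    case CMP_EQ:
    case CMP_LE:
    case CMP_GE:
      return true_val;
    default:
      /* NE, LT and GT are always false for identical operands.  */
      return 0;
    }
}
```

Summing the six vector results gives -3, the value the new gcc.c-torture/execute/vector-compare-2.c test expects in every element; the scalar sum is 3.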
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,78 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != (" fmt1 " " #op " " fmt1 " ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%li", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%li");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){-1,-1}), d0, d1, ==, "%f", "%li");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){-1,-1}), l0, l1, ==, "%li", "%li");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){-1,-1,-1,-1}), 
+		 f0, f1, ==, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){-1,-1,-1,-1}), 
+		 i0, i1, ==, "%i", "%i");
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt ") ? -1 : 0)\n", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1 = (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1 = (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   //test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  
+  /* An unsigned condition vector must not trigger the error below.  */
+  q4 ? p4 : r4;	    /* { "vector comparison must be of signed integer vector type" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */   
+
+/* Test that C_MAYBE_CONST_EXPRs are folded correctly when
+   creating a VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4058,6 +4058,94 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      bool maybe_const = true;
+      tree sc;
+      
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+      sc = c_fully_fold (ifexp, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	ifexp = c_wrap_maybe_const (sc, true);
+      else
+	ifexp = sc;
+      
+      sc = c_fully_fold (op1, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op1 = c_wrap_maybe_const (sc, true);
+      else
+	op1 = sc;
+      
+      sc = c_fully_fold (op2, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op2 = c_wrap_maybe_const (sc, true);
+      else
+	op2 = sc;
+
+      /* Currently the expansion of VEC_COND_EXPR does not allow
+	 expressions where the type of the vectors being compared differs
+	 from the type of the vectors being selected from.  For the time
+	 being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)
+	   && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+	  || TREE_TYPE (ifexp) != type1)
+	{
+	  tree comp_type = COMPARISON_CLASS_P (ifexp)
+			   ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+			   : TREE_TYPE (ifexp);
+	  tree vcond;
+	  
+	  op1 = convert (comp_type, op1);
+	  op2 = convert (comp_type, op2);
+	  vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+	  vcond = convert (type1, vcond);
+	  return vcond;
+	}
+      else
+	return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9994,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10129,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10559,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
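The result type chosen in build_binary_op, a signed integer vector whose elements have the precision of the operand elements, can be tried out with a compiler that already accepts GCC-style vector comparisons in C (an assumption: for example GCC 4.7 or later, or Clang). The sketch performs the selection with bitwise operations rather than relying on vector ?: support in C, and its helper names (le_mask, select_by_mask) are hypothetical.

```c
/* Assumes GCC-style vector comparisons in C (e.g. GCC >= 4.7 or Clang).  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Comparing two float vectors yields a signed integer vector with the
   same element size and count: each element is -1 (true) or 0 (false).  */
static v4si
le_mask (v4sf x, v4sf y)
{
  return x <= y;
}

/* Bitwise select: all-ones elements take T, zero elements take F.  */
static v4si
select_by_mask (v4si mask, v4si t, v4si f)
{
  return (mask & t) | (~mask & f);
}
```

With the operands from the ires line of the first extend.texi example, le_mask gives {-1, 0, -1, -1} and the selection yields {1, 3, 3, 4}.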
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* Vector comparisons are valid GIMPLE expressions
+		     that can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  In a vector result, a true
+   element has all bits set and a false element is zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
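Because a true element has all bits set and a false element is zero, a mask can select between values of any same-sized type by operating on their bits; the implicit conversions in build_conditional_expr rely on this when the compared vectors and the selected vectors have different element types. A scalar sketch, assuming a 32-bit float (checked at compile time) and using the hypothetical helper select_float_bits:

```c
#include <stdint.h>
#include <string.h>

/* The sketch assumes float and uint32_t have the same size.  */
_Static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Select T when MASK is -1 (all bits set) and F when MASK is 0,
   working on the raw bit patterns of the floats.  */
static float
select_float_bits (int32_t mask, float t, float f)
{
  uint32_t tb, fb, rb, m = (uint32_t) mask;
  float r;

  memcpy (&tb, &t, sizeof tb);
  memcpy (&fb, &f, sizeof fb);
  rb = (m & tb) | (~m & fb);
  memcpy (&r, &rb, sizeof r);
  return r;
}
```

Any mask value other than -1 or 0 would mix bits from both operands, which is why the all-ones encoding of true matters.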
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,11 +30,16 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +130,21 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, inner_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +353,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand the vector comparison OP0 CODE OP1 using the
+   builtin_vec_compare target hook.  If the target does not support
+   comparison of type TYPE, expand the comparison piecewise.
+   GSI is used inside the target hook to emit the code needed
+   for the given comparison.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +413,24 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
@@ -432,6 +486,64 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+
+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)).  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+  gimple_stmt_iterator gsi_tmp;
+  tree t;
+
+  if (!COMPARISON_CLASS_P (cond))
+    cond = build2 (EQ_EXPR, TREE_TYPE (cond), cond,
+			    build_vector_from_val (TREE_TYPE (cond),
+			    build_int_cst (TREE_TYPE (TREE_TYPE (cond)), -1)));
+     
+  /* Expand vector condition inside of VEC_COND_EXPR.  */
+  op = optab_for_tree_code (TREE_CODE (cond), type, optab_default);
+  if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+    {
+      var = create_tmp_reg (TREE_TYPE (cond), "cond");
+      new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+					  TREE_OPERAND (cond, 0),
+					  TREE_OPERAND (cond, 1),
+					  TREE_CODE (cond));
+      new_stmt = gimple_build_assign (var, new_rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (gsi_stmt (*gsi));
+    }
+  else
+    var = cond;
+  
+  gsi_tmp = *gsi;
+  gsi_prev (&gsi_tmp);
+
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+
+  /* Run veclower on the expressions we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);
+  
+  return t;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -451,6 +563,23 @@ expand_vector_operations_1 (gimple_stmt_
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
 
+  /* Check if VEC_COND_EXPR is supported in hardware within the
+     given types.  */
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
+
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -32827,6 +32828,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for the comparison of
+   floating-point vectors V0 and V1.  Return a variable containing
+   the result of the comparison, or NULL_TREE if there is none.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for the comparison of
+   integer vectors V0 and V1.  Return a variable containing
+   the result of the comparison, or NULL_TREE if there is none.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* t = tmp1 ^ {-1, -1,...}  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+         if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a
+   variable with the result of the comparison.  Return NULL_TREE
+   when no target-specific sequence can be found.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* PCMPGT performs signed comparisons, so packed unsigned
+     integers can only be compared for EQ or NE.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35541,11 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

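The i386 integer path only has the PCMPEQ and PCMPGT instructions to work with (equality and signed greater-than), so vector_int_compare above derives the other comparisons from them: LT swaps the operands of GT, NE XORs an EQ result with an all-ones vector obtained as v0 == v0, and GE/LE OR together a GT and an EQ result. The following is a minimal sketch that models those lane semantics in portable C rather than calling the real instructions; v4si, pcmpeqd, pcmpgtd and the cmp_* helpers are hypothetical names, and only the four-lane 32-bit (V4SImode) case is modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for a V4SImode value: four signed 32-bit lanes.  */
typedef struct { int32_t v[4]; } v4si;

/* Model of PCMPEQD: a lane becomes -1 (all bits set) when equal, else 0.  */
static v4si pcmpeqd (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.v[i] = a.v[i] == b.v[i] ? -1 : 0;
  return r;
}

/* Model of PCMPGTD: a lane-wise *signed* greater-than.  */
static v4si pcmpgtd (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.v[i] = a.v[i] > b.v[i] ? -1 : 0;
  return r;
}

static v4si vec_xor (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.v[i] = a.v[i] ^ b.v[i];
  return r;
}

static v4si vec_or (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.v[i] = a.v[i] | b.v[i];
  return r;
}

/* NE: EQ XORed with all ones; for integers a == a is always all ones.  */
static v4si cmp_ne (v4si a, v4si b)
{
  return vec_xor (pcmpeqd (a, b), pcmpeqd (a, a));
}

/* LT: GT with the operands swapped.  */
static v4si cmp_lt (v4si a, v4si b)
{
  return pcmpgtd (b, a);
}

/* GE: GT OR EQ.  */
static v4si cmp_ge (v4si a, v4si b)
{
  return vec_or (pcmpgtd (a, b), pcmpeqd (a, b));
}

/* LE: swapped GT OR EQ.  */
static v4si cmp_le (v4si a, v4si b)
{
  return vec_or (pcmpgtd (b, a), pcmpeqd (a, b));
}
```

Because pcmpgtd treats each lane as signed, a lane holding the bit pattern 0xFFFFFFFF reads as -1 and compares below 1, even though it is the largest value as an unsigned integer. That is why ix86_vectorize_builtin_vec_compare returns NULL_TREE for ordered comparisons of unsigned element types and leaves them to the piecewise fallback.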

* Re: Vector Comparison patch
  2011-08-17 21:18                   ` Artem Shinkarov
@ 2011-08-18  1:22                     ` Joseph S. Myers
  2011-08-18 11:37                       ` Artem Shinkarov
  2011-08-18 10:21                     ` Richard Guenther
  1 sibling, 1 reply; 91+ messages in thread
From: Joseph S. Myers @ 2011-08-18  1:22 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, gcc-patches, Richard Henderson

On Wed, 17 Aug 2011, Artem Shinkarov wrote:

> +For convenience, the condition in the vector conditional can be just a
> +vector of signed integer type.  In that case the vector is implicitly
> +compared with a vector of zeroes.  Consider an example:

Where is this bit tested in the testcases added?

> +      if (TREE_CODE (type1) != VECTOR_TYPE
> +	  || TREE_CODE (type2) != VECTOR_TYPE)
> +        {
> +          error_at (colon_loc, "vector comparisom arguments must be of "
> +                               "type vector");

"comparison"

> +      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
> +      sc = c_fully_fold (ifexp, false, &maybe_const);
> +      sc = save_expr (sc);
> +      if (!maybe_const)
> +	ifexp = c_wrap_maybe_const (sc, true);
> +      else
> +	ifexp = sc;

This looks like it's duplicating c_save_expr; that is, like "ifexp = 
c_save_expr (ifexp);" would suffice.

But, it's not clear that it actually achieves the effect described in the 
comment; have you actually tried with function calls, assignments etc. in 
the operands?  The code in build_binary_op uses save_expr rather than 
c_save_expr because it does some intermediate operations before calling 
c_wrap_maybe_const, and if you really want to avoid C_MAYBE_CONST in 
VEC_COND_EXPR then you'll need to continue calling save_expr, as here, but 
delay the call to c_wrap_maybe_const so that the whole VEC_COND_EXPR is 
wrapped if required.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Vector Comparison patch
  2011-08-17 21:18                   ` Artem Shinkarov
  2011-08-18  1:22                     ` Joseph S. Myers
@ 2011-08-18 10:21                     ` Richard Guenther
  2011-08-18 11:24                       ` Artem Shinkarov
                                         ` (2 more replies)
  1 sibling, 3 replies; 91+ messages in thread
From: Richard Guenther @ 2011-08-18 10:21 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

On Wed, Aug 17, 2011 at 8:51 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Wed, Aug 17, 2011 at 4:28 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Wed, Aug 17, 2011 at 3:58 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Aug 17, 2011 at 3:30 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Hi
>>>>
>>>> Several comments before the new version of the patch.
>>>> 1) x != x
>>>> I am happy to adjust constant_boolean_node, but look at the code
>>>> around line 9074 in fold-const.c, you will see that x <op> x
>>>> elimination, even with adjusted constant_boolean_node, will look about
>>>> the same as my code. Because I need to check the parameters (!FLOAT_P,
>>>>  HONOR_NANS) on TREE_TYPE (arg0) not arg0, and I need to construct
>>>> constant_boolean_node (-1), not 1 in case of true.
>>>> But I will change constant_boolean_node to accept vector types.
>>>
>>> Hm, that should be handled transparently if you look at the defines
>>> of FLOAT_TYPE_P and the HONOR_* macros.
>>>
>>
>> OK, currently I have this; what do you think?
>>      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
>>      tree arg0_type = TREE_CODE (type) == VECTOR_TYPE
>>                       ? TREE_TYPE (TREE_TYPE (arg0)) : TREE_TYPE (arg0);
>>        switch (code)
>>          {
>>          case EQ_EXPR:
>>            if (! FLOAT_TYPE_P (arg0_type)
>>                || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>              return constant_boolean_node (true_val, type);
>>            break;
>>
>>          case GE_EXPR:
>>          case LE_EXPR:
>>            if (! FLOAT_TYPE_P (arg0_type)
>>                || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>              return constant_boolean_node (true_val, type);
>>            return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
>>
>>          case NE_EXPR:
>>            /* For NE, we can only do this simplification if integer
>>               or we don't honor IEEE floating point NaNs.  */
>>            if (FLOAT_TYPE_P (arg0_type)
>>                && HONOR_NANS (TYPE_MODE (arg0_type)))
>>              break;
>>            /* ... fall through ...  */
>>          case GT_EXPR:
>>          case LT_EXPR:
>>            return constant_boolean_node (0, type);
>>          default:
>>            gcc_unreachable ();
>>          }
>>
>> Works fine for both vector and scalar cases.
>
> Please ignore this comment.
>
>>
>>>>
>>>> 2) comparison vs vcond
>>>> v = v1 < v2;
>>>> v = v1 < v2 ? {-1,...} : {0,...};
>>>>
>>>> are not the same.
>>>> 16,25c16,22
>>>> <       movdqa  .LC1(%rip), %xmm1
>>>> <       pshufd  $225, %xmm1, %xmm1
>>>> <       pshufd  $39, %xmm0, %xmm0
>>>> <       movss   %xmm2, %xmm1
>>>> <       pshufd  $225, %xmm1, %xmm1
>>>> <       pcmpgtd %xmm1, %xmm0
>>>> <       pcmpeqd %xmm1, %xmm1
>>>> <       pcmpeqd %xmm1, %xmm0
>>>> <       pand    %xmm1, %xmm0
>>>> <       movdqa  %xmm0, -24(%rsp)
>>>> ---
>>>>>       pshufd  $39, %xmm0, %xmm1
>>>>>       movdqa  .LC1(%rip), %xmm0
>>>>>       pshufd  $225, %xmm0, %xmm0
>>>>>       movss   %xmm2, %xmm0
>>>>>       pshufd  $225, %xmm0, %xmm0
>>>>>       pcmpgtd %xmm0, %xmm1
>>>>>       movdqa  %xmm1, -24(%rsp)
>>>>
>>>> So I would keep the hook; it could be removed at any time once the
>>>> standard expansion starts to work well.
>>>
>>> Which one is which?
>>
>> You must be joking. :)

:)

>> The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
>> The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
>> fine, but it means that we need to construct it carefully.
>
> This is still important.

Yes.  I think the backends need to handle optimizing this case,
esp. considering targets that do not have instructions to produce
a {-1,...}/{0,...} bitmask from a comparison but produce a vector
of condition codes.  With using vec0 > vec1 ? {-1...} : {0,...} for
mask = vec0 > vec1; we avoid exposing the result kind of
vector comparisons.

It should be easily possible for x86 for example to recognize
the -1 : 0 case.
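
A scalar model makes the equivalence concrete. Assuming comparisons yield canonical -1/0 lanes (the convention in the tree.def comment of this patch), mask = vec0 > vec1 and vec0 > vec1 ? {-1,...} : {0,...} compute identical lanes, so a backend that recognises the constant {-1,...}/{0,...} arms can emit a bare compare. The helper names below are hypothetical, not GCC APIs; the per-lane "? -1 : 0" mirrors the patch's do_compare fallback.

```c
#include <stdint.h>

#define N 4

/* mask = a > b: each lane is -1 (all bits set) when the comparison
   holds and 0 otherwise, like do_compare's "? -1 : 0".  */
static void cmp_gt_mask (const int32_t *a, const int32_t *b, int32_t *res)
{
  for (int i = 0; i < N; i++)
    res[i] = a[i] > b[i] ? -1 : 0;
}

/* VEC_COND_EXPR <a > b, t, f>: per lane, T[i] where the comparison
   holds and F[i] elsewhere.  */
static void vcond_gt (const int32_t *a, const int32_t *b,
                      const int32_t *t, const int32_t *f, int32_t *res)
{
  for (int i = 0; i < N; i++)
    res[i] = a[i] > b[i] ? t[i] : f[i];
}
```

Called with t = {-1,-1,-1,-1} and f = {0,0,0,0}, vcond_gt produces exactly the lanes of cmp_gt_mask, which is the pattern a target would need to match.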

>>> I'd really like to make this patch simpler at first,
>>> and removing that hook is an obvious thing that _should_ be possible,
>>> even optimally (by fixing the targets).
>>
>> OK, let's remove the hook. But could you provide some more
>> information than just "we need to do it"?
>>
>> Simple in this case means inefficient -- I would hope to make it
>> efficient as well.
>
> This is very important.

Yes, and I think the fix is in the backends.  I still think we have to
sort out the best building blocks we want the targets to expose.
Currently we only have the vectorizer vcond patterns which should
be enough to get the C language support implemented.  After that
we should concentrate on generating efficient code for all variants.

>>>> 3) mask ? vec0 : vec1
>>>> So no, I don't think we need to convert {3, 4, -1, 5} to {0,0,-1,0}
>>>> (that would surprise me anyway, I'd have expected {-1,-1,-1,-1} ;)).
>>>>
>>>> Does OpenCL somehow support you here?
>>>>
>>>> OpenCL says that vector operation mask ? vec0 : vec1 is the same as
>>>> select (vec0, vec1, mask). The semantics of select operation is the
>>>> following:
>>>>
>>>> gentype select (gentype a, gentype b, igentype c)
>>>> For each component of a vector type,
>>>> result[i] = if MSB of c[i] is set ? b[i] : a[i].
>>>>
>>>> I am not sure what they really mean by the term MSB. As far
>>>> as I know MSB is the Most Significant Bit, so does it mean that for a
>>>> 3-bit integer 100 would trigger true but 011 would still be false?
>>>
>>> Yes, MSB is Most Significant Bit - that's a somewhat odd definition ;)
>>>
>>>> My reading would be that if all bits are set, take the first element,
>>>> otherwise the second.
>>>>
>>>> It is also confusing when a ? vec0 : vec1 and a != 0 ? vec0 : vec1
>>>> produce different results. So I would stick to all bits set being true
>>>> scenario.
>>>
>>> For the middle-end part definitely.  Thus I'd simply leave the mask alone.
>>>
>>
>> Well, it seems very unnatural to me. In the case of scalars, mask ?
>> val0 : val1 would not work the same way as (mask & val0) | (~mask &
>> val1), so why should we have that behaviour for the vector stuff?
>
> And that.

Yeah, well.  That's really a question for language lawyers ;)  I agree
that it would be nice to have mask ? val0 : val1 behave "the same"
for scalars and vectors.  The question is whether for vectors you
define it on the bit-level (which makes it equal to (mask & val0) |
(~mask & val1))
or on the vector component level.  The vector component level
is probably what people would expect.

Which means we have to treat mask ? val0 : val1 as
mask != {0,...} ? val0 : val1.
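
The difference between the two definitions appears as soon as a mask lane is neither 0 nor -1. The sketch below (hypothetical helper names, 32-bit lanes assumed) places the bit-level blend (mask & val0) | (~mask & val1), which expand_vec_cond_expr_piecewise in this patch emits, next to the component-level mask != {0,...} ? val0 : val1.

```c
#include <stdint.h>

#define N 4

/* Bit-level select: each result bit comes from VAL0 where the
   corresponding MASK bit is set and from VAL1 where it is clear.  */
static void select_bitwise (const int32_t *mask, const int32_t *val0,
                            const int32_t *val1, int32_t *res)
{
  for (int i = 0; i < N; i++)
    res[i] = (mask[i] & val0[i]) | (~mask[i] & val1[i]);
}

/* Component-level select: a whole lane of VAL0 wherever the mask
   lane is non-zero, otherwise the lane of VAL1.  */
static void select_component (const int32_t *mask, const int32_t *val0,
                              const int32_t *val1, int32_t *res)
{
  for (int i = 0; i < N; i++)
    res[i] = mask[i] != 0 ? val0[i] : val1[i];
}
```

With mask {3, 4, -1, 0}, val0 all 16 and val1 all 1, the bit-level blend yields {0, 1, 16, 1} while the component-level select yields {16, 16, 16, 1}; the two agree only on the canonical lanes -1 and 0.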

>>>> 4) Backend stuff. OK, we could always fall back to rejecting the cases
>>>> when cond and the operands have different types, and then fix the backend.
>>>>
>>>> Adjustments are coming.
>>>>
>>>>
>>>> Artem.
>>>>
>>>
>>
>> A new issue with transforming cond to cond == {-1, ...} in
>> expand_vec_cond_expr. When I do this:
>>  icode = get_vcond_icode (vec_cond_type, mode);
>>  if (icode == CODE_FOR_nothing)
>>    return 0;
>>
>>  /* If OP0 is not a comparison, adjust it by transforming to
>>     the expression OP0 == {-1, -1, ...}  */
>>  if (!COMPARISON_CLASS_P (op0))
>>    op0 = build2 (EQ_EXPR, TREE_TYPE (op0), op0,
>>                  build_vector_from_val (TREE_TYPE (op0),
>>                  build_int_cst (TREE_TYPE (TREE_TYPE (op0)), -1)));
>>
>> I run into trouble: the constant vector that I insert cannot
>> be expanded, and the compiler fails with an assertion.

I'd use != {0,0,...} as eventually a zero vector is cheaper to construct
and it supports the scalar ?: semantics - whenever the mask element
is non-zero it's true.
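
The three readings of a non-comparison mask raised in this thread can be contrasted with a small scalar model: OpenCL's "MSB set", the "all bits set" reading that comparing against {-1,...} gives, and "non-zero", i.e. != {0,...}. This is a sketch only: the function names are hypothetical, 32-bit lanes are assumed, and each function normalizes an arbitrary integer vector into a canonical -1/0 mask.

```c
#include <stdint.h>

#define N 4

/* OpenCL select() reading: only the most significant bit matters,
   so a lane is true exactly when it is negative.  */
static void mask_msb (const int32_t *c, int32_t *m)
{
  for (int i = 0; i < N; i++)
    m[i] = c[i] < 0 ? -1 : 0;
}

/* c == {-1,...}: only lanes with all bits set are true.  */
static void mask_all_ones (const int32_t *c, int32_t *m)
{
  for (int i = 0; i < N; i++)
    m[i] = c[i] == -1 ? -1 : 0;
}

/* c != {0,...}: any non-zero lane is true, matching scalar ?:.  */
static void mask_nonzero (const int32_t *c, int32_t *m)
{
  for (int i = 0; i < N; i++)
    m[i] = c[i] != 0 ? -1 : 0;
}
```

For the {3, 4, -1, 5} example discussed earlier, the first two readings both give {0, 0, -1, 0}, and only != {0,...} gives {-1, -1, -1, -1}. On canonical masks (every lane already -1 or 0) all three readings agree, so the choice only becomes visible when a plain integer vector is used as the condition.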

>> This happens on my machine:
>> Linux temanbk 2.6.38-gentoo-r4 #3 SMP Mon Aug 8 00:32:30 BST 2011
>> x86_64 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel
>> GNU/Linux
>>
>> It happens when I run a comparison of vectors of 64-bit integers. They are
>> lowered in veclower, but if I insert them in expand_vec_cond_expr,
>> I receive an error. However, expand_vec_cond_expr_p happily accepts it.
>
> This is solved as well.
>
> And here is a new version of the patch. Tested and bootstrapped.

I'll look at it later.

Thanks,
Richard.

>
> Artem.
>


* Re: Vector Comparison patch
  2011-08-18 10:21                     ` Richard Guenther
@ 2011-08-18 11:24                       ` Artem Shinkarov
  2011-08-18 15:05                         ` Artem Shinkarov
  2011-08-18 15:19                       ` Richard Henderson
  2011-08-29 12:54                       ` Paolo Bonzini
  2 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-18 11:24 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

> Yes.  I think the backends need to handle optimizing this case,
> esp. considering targets that do not have instructions to produce
> a {-1,...}/{0,...} bitmask from a comparison but produce a vector
> of condition codes.  With using vec0 > vec1 ? {-1...} : {0,...} for
> mask = vec0 > vec1; we avoid exposing the result kind of
> vector comparisons.
>
> It should be easily possible for x86 for example to recognize
> the -1 : 0 case.

OK, I am fine with this approach. How could we check whether a vector
comparison returns {-1,...}/{0,...} or something else? If I can check
that, I could adjust expand_vec_cond_expr and get rid of the hook.

> Yes, and I think the fix is in the backends.  I still think we have to
> sort out the best building blocks we want the targets to expose.
> Currently we only have the vectorizer vcond patterns which should
> be enough to get the C language support implemented.  After that
> we should concentrate on generating efficient code for all variants.

Ok, see my above comment.

> Yeah, well.  That's really a question for language lawyers ;)  I agree
> that it would be nice to have mask ? val0 : val1 behave "the same"
> for scalars and vectors.  The question is whether for vectors you
> define it on the bit-level (which makes it equal to (mask & val0) |
> (~mask & val1))
> or on the vector component level.  The vector component level
> is probably what people would expect.
>
> Which means we have to treat mask ? val0 : val1 as
> mask != {0,...} ? val0 : val1.

> I'd use != {0,0,...} as eventually a zero vector is cheaper to construct
> and it supports the scalar ?: semantics - whenever the mask element
> is non-zero it's true.

Ok, I am fine with x != {0,...}, I can adjust it in both cases.


Artem.


* Re: Vector Comparison patch
  2011-08-18  1:22                     ` Joseph S. Myers
@ 2011-08-18 11:37                       ` Artem Shinkarov
  2011-08-18 14:20                         ` Joseph S. Myers
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-18 11:37 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, gcc-patches, Richard Henderson

On Wed, Aug 17, 2011 at 10:52 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 17 Aug 2011, Artem Shinkarov wrote:
>
>> +For the convenience condition in the vector conditional can be just a
>> +vector of signed integer type. In that case this vector is implicitly
>> +compared with vectors of zeroes. Consider an example:
>
> Where is this bit tested in the testcases added?

In the gcc.c-torture/execute/vector-vcond-2.c at the end of test-case:

  /* Condition expressed with a single variable.  */
  dres = l0 ? d0 : d1;
  check_compare (2, dres, l0, ((vector (2, long)){-1,-1}), d0, d1, ==,
		 "%f", "%i");

  lres = l1 ? l0 : l1;
  check_compare (2, lres, l1, ((vector (2, long)){-1,-1}), l0, l1, ==,
		 "%i", "%i");

  fres = i0 ? f0 : f1;
  check_compare (4, fres, i0, ((vector (4, int)){-1,-1,-1,-1}),
		 f0, f1, ==, "%f", "%i");

  ires = i1 ? i0 : i1;
  check_compare (4, ires, i1, ((vector (4, int)){-1,-1,-1,-1}),
		 i0, i1, ==, "%i", "%i");

>
>> +      if (TREE_CODE (type1) != VECTOR_TYPE
>> +       || TREE_CODE (type2) != VECTOR_TYPE)
>> +        {
>> +          error_at (colon_loc, "vector comparisom arguments must be of "
>> +                               "type vector");
>
> "comparison"

Thanks, adjusted.

>> +      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>> +      sc = c_fully_fold (ifexp, false, &maybe_const);
>> +      sc = save_expr (sc);
>> +      if (!maybe_const)
>> +     ifexp = c_wrap_maybe_const (sc, true);
>> +      else
>> +     ifexp = sc;
>
> This looks like it's duplicating c_save_expr; that is, like "ifexp =
> c_save_expr (ifexp);" would suffice.
>
> But, it's not clear that it actually achieves the effect described in the
> comment; have you actually tried with function calls, assignments etc. in
> the operands?

I tested it with gcc.dg/vector-compare-2.c:
typedef int vec __attribute__((vector_size(16)));

vec i,j;
extern vec a, b, c;

vec
foo (int x)
{
  return (x ? i : j) ? a : b;
}

vec
bar (int x)
{
  return a ? (x ? i : j) : b;
}

vec
baz (int x)
{
  return a ? b : (x ? i : j);
}

Is it good enough?

> The code in build_binary_op uses save_expr rather than
> c_save_expr because it does some intermediate operations before calling
> c_wrap_maybe_const, and if you really want to avoid C_MAYBE_CONST in
> VEC_COND_EXPR then you'll need to continue calling save_expr, as here, but
> delay the call to c_wrap_maybe_const so that the whole VEC_COND_EXPR is
> wrapped if required.

Ok, but I need to wrap it at some point; where do you think it would
be appropriate to do that?


Thanks,
Artem.


* Re: Vector Comparison patch
  2011-08-18 11:37                       ` Artem Shinkarov
@ 2011-08-18 14:20                         ` Joseph S. Myers
  0 siblings, 0 replies; 91+ messages in thread
From: Joseph S. Myers @ 2011-08-18 14:20 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Guenther, gcc-patches, Richard Henderson

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1897 bytes --]

On Thu, 18 Aug 2011, Artem Shinkarov wrote:

> >> +      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
> >> +      sc = c_fully_fold (ifexp, false, &maybe_const);
> >> +      sc = save_expr (sc);
> >> +      if (!maybe_const)
> >> +     ifexp = c_wrap_maybe_const (sc, true);
> >> +      else
> >> +     ifexp = sc;
> >
> > This looks like it's duplicating c_save_expr; that is, like "ifexp =
> > c_save_expr (ifexp);" would suffice.
> >
> > But, it's not clear that it actually achieves the effect described in the
> > comment; have you actually tried with function calls, assignments etc. in
> > the operands?
> 
> I tested it with gcc.dg/vector-compare-2.c:
> typedef int vec __attribute__((vector_size(16)));
> 
> vec i,j;
> extern vec a, b, c;
> 
> vec
> foo (int x)
> {
>   return (x ? i : j) ? a : b;
> }
> 
> vec
> bar (int x)
> {
>   return a ? (x ? i : j) : b;
> }
> 
> vec
> baz (int x)
> {
>   return a ? b : (x ? i : j);
> }
> 
> Is it good enough?

No, because none of the operands there involve assignment, increment, 
decrement, function call or comma operators (which are the main cases that 
would trigger the creation of C_MAYBE_CONST_EXPR).

> > The code in build_binary_op uses save_expr rather than
> > c_save_expr because it does some intermediate operations before calling
> > c_wrap_maybe_const, and if you really want to avoid C_MAYBE_CONST in
> > VEC_COND_EXPR then you'll need to continue calling save_expr, as here, but
> > delay the call to c_wrap_maybe_const so that the whole VEC_COND_EXPR is
> > wrapped if required.
> 
> Ok, but I need to wrap it at some point; where do you think it would
> be appropriate to do that?

After the creation of the VEC_COND_EXPR.  I.e. don't just return the 
results of build3 or convert, wrap them as needed before returning them.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Vector Comparison patch
  2011-08-18 11:24                       ` Artem Shinkarov
@ 2011-08-18 15:05                         ` Artem Shinkarov
  0 siblings, 0 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-18 15:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Joseph S. Myers, Richard Henderson

Richard, I am trying to make sure that when vcond has {-1} and {0} it
does not trigger masking. Currently I am doing this:

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 177665)
+++ config/i386/i386.c  (working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18434,7 +18435,30 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
+  rtx mask_true;
+
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, -1);
+
+  mask_true = gen_rtx_raw_CONST_VECTOR (mode, v);
+
+  fprintf (stderr, "I am here\n");
+  debug_rtx (mask_true);
+  debug_rtx (op_true);
+  if (rtx_equal_p (op_true, mask_true))
+    {
+      fprintf (stderr, "Yes it is\n");
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+      return;
+    }
+  else
   if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);


It handles the case when the mask is -1 very well; however, in the code
generated by the expansion I still see excessive operations:

ires = i0 < i1 ? (vector (4, int)){-1,-1,-1,-1} : (vector (4, int)){0,0,0,0};

expands to:

pcmpgtd %xmm1, %xmm0
pcmpeqd %xmm1, %xmm1
pcmpeqd %xmm1, %xmm0
movdqa  %xmm0, -24(%rsp)

Whereas the code
ires = i0 < i1;

using my hook expands to:
pcmpgtd %xmm1, %xmm0
movdqa  %xmm0, -24(%rsp)


So someone is inserting two extra instructions there, and I cannot
really figure out who is doing that. Does anyone know how I could fix
this?
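The redundancy of the two extra instructions can be illustrated with a minimal lane-by-lane C model (plain int arrays stand in for V4SI registers; the function names are hypothetical). Presumably `pcmpeqd %xmm1, %xmm1` materializes an all-ones register and `pcmpeqd %xmm1, %xmm0` then compares the `pcmpgtd` result against it. That compare returns the mask unchanged whenever every lane is already -1 or 0, but it is not an identity for an arbitrary mask.

```c
#include <stdbool.h>
#include <stddef.h>

#define N 4 /* assumed lane count (V4SI) */

/* Models pcmpeqd against an all-ones register: -1 where lane == -1, else 0.  */
static void
cmpeq_all_ones (const int *mask, int *res)
{
  for (size_t i = 0; i < N; i++)
    res[i] = mask[i] == -1 ? -1 : 0;
}

/* True if every lane is -1 or 0, as produced by a vector comparison.  */
static bool
is_canonical_mask (const int *mask)
{
  for (size_t i = 0; i < N; i++)
    if (mask[i] != -1 && mask[i] != 0)
      return false;
  return true;
}
```

For a canonical mask such as {-1, 0, 0, -1}, cmpeq_all_ones reproduces its input; for {3, 5, 1, 3} it produces all zeros.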


Artem.


* Re: Vector Comparison patch
  2011-08-18 10:21                     ` Richard Guenther
  2011-08-18 11:24                       ` Artem Shinkarov
@ 2011-08-18 15:19                       ` Richard Henderson
  2011-08-19  8:17                         ` Artem Shinkarov
  2011-08-29 12:54                       ` Paolo Bonzini
  2 siblings, 1 reply; 91+ messages in thread
From: Richard Henderson @ 2011-08-18 15:19 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Artem Shinkarov, gcc-patches, Joseph S. Myers

On 08/18/2011 02:23 AM, Richard Guenther wrote:
>>> >> The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
>>> >> The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
>>> >> fine, but it means that we need to construct it carefully.
>> >
>> > This is still important.
> Yes.  I think the backends need to handle optimizing this case,
> esp. considering targets that do not have instructions to produce
> a {-1,...}/{0,...} bitmask from a comparison but produce a vector
> of condition codes.  With using vec0 > vec1 ? {-1...} : {0,...} for
> mask = vec0 > vec1; we avoid exposing the result kind of
> vector comparisons.
> 
> It should be easily possible for x86 for example to recognize
> the -1 : 0 case.
> 

I think you've been glossing over the hard part with "..." up there.
I challenge you to actually fill that in with something meaningful
in rtl.

I suspect that you simply have to add another named pattern that
will Do What You Want on mips and suchlike that produce a CCmode.



r~


* Re: Vector Comparison patch
  2011-08-18 15:19                       ` Richard Henderson
@ 2011-08-19  8:17                         ` Artem Shinkarov
  2011-08-19 15:38                           ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-19  8:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Richard Guenther, gcc-patches, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 2910 bytes --]

Hi, I had a problem passing information about a single variable
from the expand_vec_cond_expr optab into ix86_expand_*_vcond.

I looked into this problem for quite a while and found a solution.
Now the question is whether it could be done better.

First of all the problem:

If we represent any vector comparison with VEC_COND_EXPR < v0 <OP> v1
? {-1,...} : {0,...} >, then in the assembler we do not want to see
this useless comparison with {-1...}.

Now it is easy to fix the problem of excessive masking. The real
challenge starts when the comparison inside vcond is expressed as a
variable. In that case, in order to construct a correct vector expression,
we need to adjust cond in cond ? v0 : v1 to cond == {-1...} or, as we
agreed recently, cond != {0,..}. But we need to do that only to make
vec_cond_expr happy. At the assembler level we don't want this
condition.

Now, if I just construct the tree, then rtx_equal_p in the x86 backend
does not know that this is a constant vector full of -1, because the
comparison operands are not immediate. So I need to mark this fact
somehow in optabs, and then check it in the x86 backend.

At the moment I do something like this:

optabs:

if (!COMPARISON_CLASS_P (op0))
  ops[3] = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX);

This expression is preserved while checking and verifying.

ix86:
if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX
      && XEXP (comp, 1) == NULL_RTX)

See the attached patch for more details. The patch is just to give you
an idea of the way I am doing it, and it seems to work. Please don't
criticise the patch itself; rather, help me understand whether there is a
better way to pass the information from optabs to ix86.


Thanks,
Artem.

On Thu, Aug 18, 2011 at 3:31 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/18/2011 02:23 AM, Richard Guenther wrote:
>>>> >> The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
>>>> >> The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
>>>> >> fine, but it means that we need to construct it carefully.
>>> >
>>> > This is still important.
>> Yes.  I think the backends need to handle optimizing this case,
>> esp. considering targets that do not have instructions to produce
>> a {-1,...}/{0,...} bitmask from a comparison but produce a vector
>> of condition codes.  With using vec0 > vec1 ? {-1...} : {0,...} for
>> mask = vec0 > vec1; we avoid exposing the result kind of
>> vector comparisons.
>>
>> It should be easily possible for x86 for example to recognize
>> the -1 : 0 case.
>>
>
> I think you've been glossing over the hard part with "..." up there.
> I challenge you to actually fill that in with something meaningful
> in rtl.
>
> I suspect that you simply have to add another named pattern that
> will Do What You Want on mips and suchlike that produce a CCmode.
>
>
>
> r~
>

[-- Attachment #2: variable-in-vcond.diff --]
[-- Type: text/plain, Size: 5607 bytes --]

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6557,6 +6557,8 @@ expand_vec_cond_expr_p (tree type, enum
 
 /* Generate insns for a VEC_COND_EXPR, given its TYPE and its
    three operands.  */
+rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT);
+rtx gen_const_vector1 (enum machine_mode, int);
 
 rtx
 expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
@@ -6572,16 +6574,39 @@ expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
+  
+  if (COMPARISON_CLASS_P (op0))
+    {
+      comparison = vector_compare_rtx (op0, unsignedp, icode);
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_fixed_operand (&ops[3], comparison);
+      create_fixed_operand (&ops[4], XEXP (comparison, 0));
+      create_fixed_operand (&ops[5], XEXP (comparison, 1));
+
+    }
+  else
+    {
+      enum rtx_code rcode;
+      rtx rtx_op0;
+      rtx vec; 
+    
+      rtx_op0 = expand_normal (op0);
+      rcode = get_rtx_code (EQ_EXPR, unsignedp);
+      comparison = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX); 
+      vec = rtx_build_vector_from_val (mode, -1);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);
+    }
 
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
   return ops[0].value;
 }
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18402,6 +18403,23 @@ ix86_expand_sse_fp_minmax (rtx dest, enu
   return true;
 }
 
+/* Returns a vector of mode MODE where all the elements are ARG.  */
+rtx
+rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg)
+{
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+  
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg);
+  
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
 /* Expand an sse vector comparison.  Return the register with the result.  */
 
 static rtx
@@ -18411,18 +18429,28 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_
   enum machine_mode mode = GET_MODE (dest);
   rtx x;
 
-  cmp_op0 = force_reg (mode, cmp_op0);
-  if (!nonimmediate_operand (cmp_op1, mode))
-    cmp_op1 = force_reg (mode, cmp_op1);
+  /* Avoid useless comparison.  */
+  if (code == EQ 
+      && rtx_equal_p (cmp_op1, rtx_build_vector_from_val (mode, -1)))
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      x = cmp_op0;
+    }
+  else
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      if (!nonimmediate_operand (cmp_op1, mode))
+	cmp_op1 = force_reg (mode, cmp_op1);
+
+      x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
+    }
 
   if (optimize
       || reg_overlap_mentioned_p (dest, op_true)
       || reg_overlap_mentioned_p (dest, op_false))
     dest = gen_reg_rtx (mode);
 
-  x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
-
   return dest;
 }
 
@@ -18434,8 +18462,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  rtx mask_true;
+  
+  if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
@@ -18569,7 +18603,9 @@ ix86_expand_int_vcond (rtx operands[])
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
+  rtx comp;
 
+  comp = operands[3];
   cop0 = operands[4];
   cop1 = operands[5];
 
@@ -18681,8 +18717,18 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			   operands[1+negate], operands[2-negate]);
+  if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX 
+      && XEXP (comp, 1) == NULL_RTX)
+    {
+      rtx vec = rtx_build_vector_from_val (mode, -1);
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, vec,
+			       operands[1+negate], operands[2-negate]);
+    }
+  else
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1+negate], operands[2-negate]);
+    }
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);


* Re: Vector Comparison patch
  2011-08-19  8:17                         ` Artem Shinkarov
@ 2011-08-19 15:38                           ` Richard Guenther
  2011-08-19 16:28                             ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-19 15:38 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Aug 19, 2011 at 2:29 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi, I had a problem passing information about a single variable
> from the expand_vec_cond_expr optab into ix86_expand_*_vcond.
>
> I looked into this problem for quite a while and found a solution.
> Now the question is whether it could be done better.
>
> First of all the problem:
>
> If we represent any vector comparison with VEC_COND_EXPR < v0 <OP> v1
> ? {-1,...} : {0,...} >, then in the assembler we do not want to see
> this useless comparison with {-1...}.
>
> Now it is easy to fix the problem of excessive masking. The real
> challenge starts when the comparison inside vcond is expressed as a
> variable. In that case, in order to construct a correct vector expression,
> we need to adjust cond in cond ? v0 : v1 to cond == {-1...} or, as we
> agreed recently, cond != {0,..}. But we need to do that only to make
> vec_cond_expr happy. At the assembler level we don't want this
> condition.
>
> Now, if I just construct the tree, then rtx_equal_p in the x86 backend
> does not know that this is a constant vector full of -1, because the
> comparison operands are not immediate. So I need to mark this fact
> somehow in optabs, and then check it in the x86 backend.

Well, this is why I was suggesting the bitwise semantic for a mask
operand.  What we should do on the tree level (and that should happen
already), is forward the comparison into the COND_EXPR.  Thus,

mask = v1 < v2;
v3 = mask ? v4 : v5;

should get changed to

v3 = v1 < v2 ? v4 : v5;

by tree-ssa-forwprop.c.  If that is not happening we have to fix that there.

Because we _don't_ know the mask is all -1 or 0 ;)  The user might
put in {3, 5, 1, 3} and expect it to be treated like {-1,...}, but it
already isn't.

> At the moment I do something like this:
>
> optabs:
>
> if (!COMPARISON_CLASS_P (op0))
>  ops[3] = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX);
>
> This expression is preserved while checking and verifying.
>
> ix86:
> if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX
>      && XEXP (comp, 1) == NULL_RTX)
>
> See the attached patch for more details. The patch is just to give you
> an idea of the way I am doing it, and it seems to work. Please don't
> criticise the patch itself; rather, help me understand whether there is a
> better way to pass the information from optabs to ix86.

Hm, I'm not sure expand_vec_cond_expr will work that way;
I'd have to play with it myself (but I am now leaving for the weekend).

Is the special-casing of a < b ? {-1,-1,-1,-1} : {0,0,0,0} in the backend
working for you?  I think there are probably some rtl all-ones and all-zeros
predicates you can re-use.

Richard.

>
> Thanks,
> Artem.
>
> On Thu, Aug 18, 2011 at 3:31 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 08/18/2011 02:23 AM, Richard Guenther wrote:
>>>>> >> The first one (inefficient) is vec0 > vec1 ? {-1,...} : {0,...}
>>>>> >> The second is vec0 > vec1. expand_vec_cond_expr is stupid, which is
>>>>> >> fine, but it means that we need to construct it carefully.
>>>> >
>>>> > This is still important.
>>> Yes.  I think the backends need to handle optimizing this case,
>>> esp. considering targets that do not have instructions to produce
>>> a {-1,...}/{0,...} bitmask from a comparison but produce a vector
>>> of condition codes.  With using vec0 > vec1 ? {-1...} : {0,...} for
>>> mask = vec0 > vec1; we avoid exposing the result kind of
>>> vector comparisons.
>>>
>>> It should be easily possible for x86 for example to recognize
>>> the -1 : 0 case.
>>>
>>
>> I think you've been glossing over the hard part with "..." up there.
>> I challenge you to actually fill that in with something meaningful
>> in rtl.
>>
>> I suspect that you simply have to add another named pattern that
>> will Do What You Want on mips and suchlike that produce a CCmode.
>>
>>
>>
>> r~
>>
>


* Re: Vector Comparison patch
  2011-08-19 15:38                           ` Richard Guenther
@ 2011-08-19 16:28                             ` Artem Shinkarov
  2011-08-20 10:14                               ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-19 16:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Aug 19, 2011 at 3:54 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 2:29 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Hi, I had a problem passing information about a single variable
>> from the expand_vec_cond_expr optab into ix86_expand_*_vcond.
>>
>> I looked into this problem for quite a while and found a solution.
>> Now the question is whether it could be done better.
>>
>> First of all the problem:
>>
>> If we represent any vector comparison with VEC_COND_EXPR < v0 <OP> v1
>> ? {-1,...} : {0,...} >, then in the assembler we do not want to see
>> this useless comparison with {-1...}.
>>
>> Now it is easy to fix the problem of excessive masking. The real
>> challenge starts when the comparison inside vcond is expressed as a
>> variable. In that case, in order to construct a correct vector expression,
>> we need to adjust cond in cond ? v0 : v1 to cond == {-1...} or, as we
>> agreed recently, cond != {0,..}. But we need to do that only to make
>> vec_cond_expr happy. At the assembler level we don't want this
>> condition.
>>
>> Now, if I just construct the tree, then rtx_equal_p in the x86 backend
>> does not know that this is a constant vector full of -1, because the
>> comparison operands are not immediate. So I need to mark this fact
>> somehow in optabs, and then check it in the x86 backend.
>
> Well, this is why I was suggesting the bitwise semantic for a mask
> operand.  What we should do on the tree level (and that should happen
> already), is forward the comparison into the COND_EXPR.  Thus,
>
> mask = v1 < v2;
> v3 = mask ? v4 : v5;
>
> should get changed to
>
> v3 = v1 < v2 ? v4 : v5;
>
> by tree-ssa-forwprop.c.  If that is not happening we have to fix that there.

Yeah, that is something I am working on.

> Because we _don't_ know the mask is all -1 or 0 ;)  The user might
> put in {3, 5, 1, 3} and expect it to be treated like {-1,...}, but it
> already isn't.
>
>> At the moment I do something like this:
>>
>> optabs:
>>
>> if (!COMPARISON_CLASS_P (op0))
>>  ops[3] = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX);
>>
>> This expression is preserved while checking and verifying.
>>
>> ix86:
>> if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX
>>      && XEXP (comp, 1) == NULL_RTX)
>>
>> See the attached patch for more details. The patch is just to give you
>> an idea of the way I am doing it, and it seems to work. Please don't
>> criticise the patch itself; rather, help me understand whether there is a
>> better way to pass the information from optabs to ix86.
>
> Hm, I'm not sure expand_vec_cond_expr will work that way;
> I'd have to play with it myself (but I am now leaving for the weekend).
>
> Is the special-casing of a < b ? {-1,-1,-1,-1} : {0,0,0,0} in the backend
> working for you?  I think there are probably some rtl all-ones and all-zeros
> predicates you can re-use.
>
> Richard.

It works fine. All-ones and all-zeroes masks are predefined, all -1
is not, but I am switching to all zeroes. The real question is that
this special case of a comparison with two empty operands is a little
bit hackish. On the other hand, there should be no problem with that,
because operand 3 is used only to get the comparison code; no one is
looking inside the arguments, so we could use this fact. The question
is whether there is a better way.

Thanks,
Artem.


* Re: Vector Comparison patch
  2011-08-19 16:28                             ` Artem Shinkarov
@ 2011-08-20 10:14                               ` Richard Guenther
  2011-08-22  7:32                                 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-20 10:14 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Aug 19, 2011 at 5:22 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 3:54 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Fri, Aug 19, 2011 at 2:29 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Hi, I had a problem passing information about a single variable
>>> from the expand_vec_cond_expr optab into ix86_expand_*_vcond.
>>>
>>> I looked into this problem for quite a while and found a solution.
>>> Now the question is whether it could be done better.
>>>
>>> First of all the problem:
>>>
>>> If we represent any vector comparison with VEC_COND_EXPR < v0 <OP> v1
>>> ? {-1,...} : {0,...} >, then in the assembler we do not want to see
>>> this useless comparison with {-1...}.
>>>
>>> Now it is easy to fix the problem of excessive masking. The real
>>> challenge starts when the comparison inside vcond is expressed as a
>>> variable. In that case, in order to construct a correct vector expression,
>>> we need to adjust cond in cond ? v0 : v1 to cond == {-1...} or, as we
>>> agreed recently, cond != {0,..}. But we need to do that only to make
>>> vec_cond_expr happy. At the assembler level we don't want this
>>> condition.
>>>
>>> Now, if I just construct the tree, then rtx_equal_p in the x86 backend
>>> does not know that this is a constant vector full of -1, because the
>>> comparison operands are not immediate. So I need to mark this fact
>>> somehow in optabs, and then check it in the x86 backend.
>>
>> Well, this is why I was suggesting the bitwise semantic for a mask
>> operand.  What we should do on the tree level (and that should happen
>> already), is forward the comparison into the COND_EXPR.  Thus,
>>
>> mask = v1 < v2;
>> v3 = mask ? v4 : v5;
>>
>> should get changed to
>>
>> v3 = v1 < v2 ? v4 : v5;
>>
>> by tree-ssa-forwprop.c.  If that is not happening we have to fix that there.
>
> Yeah, that is something I am working on.
>
>> Because we _don't_ know the mask is all -1 or 0 ;)  The user might
>> put in {3, 5, 1, 3} and expect it to be treated like {-1,...}, but it
>> already isn't.
>>
>>> At the moment I do something like this:
>>>
>>> optabs:
>>>
>>> if (!COMPARISON_CLASS_P (op0))
>>>  ops[3] = gen_rtx_EQ (mode, NULL_RTX, NULL_RTX);
>>>
>>> This expression is preserved while checking and verifying.
>>>
>>> ix86:
>>> if (GET_CODE (comp) == EQ && XEXP (comp, 0) == NULL_RTX
>>>      && XEXP (comp, 1) == NULL_RTX)
>>>
>>> See the attached patch for more details. The patch is just to give you
>>> an idea of the way I am doing it, and it seems to work. Please don't
>>> criticise the patch itself; rather, help me understand whether there is a
>>> better way to pass the information from optabs to ix86.
>>
>> Hm, I'm not sure expand_vec_cond_expr will work that way;
>> I'd have to play with it myself (but I am now leaving for the weekend).
>>
>> Is the special-casing of a < b ? {-1,-1,-1,-1} : {0,0,0,0} in the backend
>> working for you?  I think there are probably some rtl all-ones and all-zeros
>> predicates you can re-use.
>>
>> Richard.
>
> It works fine. All-ones and all-zeroes masks are predefined, all -1
> is not, but I am switching to all zeroes. The real question is that

All -1 is the same as all ones.

> this special case of a comparison with two empty operands is a little
> bit hackish. On the other hand, there should be no problem with that,

I didn't mean this special case, which I believe is incorrect anyway
due to the above comment, but the special case resulting from
expanding v1 < v2 as v1 < v2 ? {-1,-1...} : {0,0,...}.

> because operand 3 is used only to get the comparison code; no one is
> looking inside the arguments, so we could use this fact. The question
> is whether there is a better way.

As I said above, we can't rely on the mask being either {-1,...} or {0,...}.
If we can, then we should have propagated a comparison; otherwise
we need a real != compare with {0,...}.
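Richard's point can be checked with the same kind of scalar model: normalizing an arbitrary mask with a real != {0,...} compare turns it into a canonical -1/0 mask, after which a cheap bit-level blend gives exactly the component-level ?: result. This is a sketch only; the names are hypothetical and N is an assumed lane count.

```c
#include <stddef.h>

#define N 4 /* assumed lane count */

/* mask != {0,...}: -1 in every lane that is non-zero, 0 elsewhere.  */
static void
normalize_mask (const int *mask, int *res)
{
  for (size_t i = 0; i < N; i++)
    res[i] = mask[i] != 0 ? -1 : 0;
}

/* Select lanes from val0/val1 using mask != {0,...} followed by a
   bit-level blend, which matches scalar ?: lane by lane.  */
static void
vcond_from_mask (const int *mask, const int *val0, const int *val1, int *res)
{
  int m[N];

  normalize_mask (mask, m);
  for (size_t i = 0; i < N; i++)
    res[i] = (m[i] & val0[i]) | (~m[i] & val1[i]);
}
```

For example, mask = {3, 0, -7, 1} with val0 = {10, 20, 30, 40} and val1 = {5, 6, 7, 8} selects {10, 6, 30, 40}, the same as applying ?: per element.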

> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-08-17 12:49         ` Richard Guenther
@ 2011-08-20 11:22           ` Uros Bizjak
  0 siblings, 0 replies; 91+ messages in thread
From: Uros Bizjak @ 2011-08-20 11:22 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Artem Shinkarov, gcc-patches, Joseph S. Myers, Richard Henderson

On Wed, Aug 17, 2011 at 11:49 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:

>>> Hm, ok ... let's hope we can sort-out the backend issues before this
>>> patch goes in so we can remove this converting stuff.
>>
>> Hm, I would hope that we could commit this patch even with this issue,
>> because my feeling is that this case would produce errors on all the
>> other architectures as well, as VEC_COND_EXPR is the feature heavily
>> used in auto-vectorizer. So it means that all the backends must be
>> fixed. And another argument, that this conversion is harmless.
>
> It shouldn't be hard to fix all the backends.  And if we don't do it now
> it will never happen.  I would expect that the codegen part of the
> backends doesn't need any adjustments, just the patterns that
> match what is supported.
>
> Uros, can you convert x86 as an example?  Thus, for
>
> (define_expand "vcond<mode>"
>  [(set (match_operand:VF 0 "register_operand" "")
>        (if_then_else:VF
>          (match_operator 3 ""
>            [(match_operand:VF 4 "nonimmediate_operand" "")
>             (match_operand:VF 5 "nonimmediate_operand" "")])
>          (match_operand:VF 1 "general_operand" "")
>          (match_operand:VF 2 "general_operand" "")))]
>  "TARGET_SSE"
> {
>  bool ok = ix86_expand_fp_vcond (operands);
>  gcc_assert (ok);

> allow any vector mode of the same size (and same number of elements?)
> for the vcond mode and operand 1 and 2?  Thus, only restrict the
> embedded comparison to VF?

I am a bit late to this discussion, but I see no problem with relaxing
this restriction in the backend. I will look into it.
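Richard's proposal lets the mask produced by a float comparison select between vectors of another type. The only requirement is that the other type has the same number of lanes and the same lane width. Below is a plain-C sketch of that idea, assuming 4 lanes of 32-bit float and int32_t. The helper names are hypothetical, and this is not the i386 expander.

```c
#include <stdint.h>

#define LANES 4 /* hypothetical: 4 lanes, 32 bits each */

/* Lane-wise a < b on floats; each result lane is a 32-bit integer
   mask (-1 for true, 0 for false), the same width as a float lane.  */
static void cmp_lt_f32 (int32_t mask[LANES], const float a[LANES],
                        const float b[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = a[i] < b[i] ? -1 : 0;
}

/* Use that mask to select between integer vectors: only the lane
   count and lane width have to match the comparison operands.  */
static void select_i32 (int32_t res[LANES], const int32_t mask[LANES],
                        const int32_t c[LANES], const int32_t d[LANES])
{
  for (int i = 0; i < LANES; i++)
    res[i] = (c[i] & mask[i]) | (d[i] & ~mask[i]);
}
```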

Uros.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Vector Comparison patch
  2011-08-20 10:14                               ` Richard Guenther
@ 2011-08-22  7:32                                 ` Artem Shinkarov
  2011-08-22 12:06                                   ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22  7:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]

Richard

I have formalized the approach a little; it now works without target
hooks, but some polishing is still required. I would like your comments
on several important approaches that I use in the patch.

So here is how it works:
1) All vector comparisons at the level of the type checker are
introduced using VEC_COND_EXPR with constant selection operands {-1}
and {0}. For example, v0 > v1 is transformed into
VEC_COND_EXPR<v0 > v1, {-1}, {0}>.

2) When optabs expand VEC_COND_EXPR, two cases are considered:
2.a) the first operand of VEC_COND_EXPR is a comparison; in that case nothing changes.
2.b) the first operand is something else; in that case we mark this
case specially, recognize it in the backend, and do not create a
comparison, but use the mask as if it were the result of a comparison.

3) To make sure that the mask in VEC_COND_EXPR<mask, v0, v1> is the
result of a vector comparison, we use the is_vector_comparison function;
if it returns false, we replace mask with mask != {0}.

So we end up with the following functionality:
VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a
comparison of two vectors, we leave it as it is; otherwise we replace it
with mask != {0}.
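The decision in step 3 can be sketched as a recursive check over a toy expression representation. The struct, enum and is_mask below are hypothetical plain-C simplifications, not GCC's tree API. The rules are:
- A comparison yields a {-1}/{0} mask.
- Bitwise AND, IOR or XOR of two such masks is again such a mask.
- A variable is followed to its defining expression.

Anything else is treated as unknown and would get the mask != {0} wrapper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression kinds; GCC's real trees are richer.  */
enum expr_kind { EXPR_CMP, EXPR_AND, EXPR_IOR, EXPR_XOR, EXPR_VAR,
                 EXPR_OTHER };

struct expr {
  enum expr_kind kind;
  const struct expr *op0, *op1; /* operands of AND/IOR/XOR */
  const struct expr *def;       /* defining expression of a VAR, or NULL */
};

/* True if E is known to produce only {-1}/{0} lanes.  */
static bool is_mask (const struct expr *e)
{
  switch (e->kind)
    {
    case EXPR_CMP:
      return true;
    case EXPR_AND:
    case EXPR_IOR:
    case EXPR_XOR:
      /* -1/0 lanes stay -1/0 under these bitwise operations.  */
      return is_mask (e->op0) && is_mask (e->op1);
    case EXPR_VAR:
      return e->def != NULL && is_mask (e->def);
    default:
      return false; /* unknown: caller compares against {0} */
    }
}
```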

A plain vector comparison a <op> b is represented with VEC_COND_EXPR,
which expands correctly without creating useless masking.
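Here is a sketch of why the plain comparison needs no extra masking, in plain C with four 32-bit lanes. The lane count and helper names are assumptions for illustration. The comparison already produces a {-1}/{0} mask, as in step 1. Blending the constants {-1} and {0} under that mask with (x & m) | (y & ~m) returns the mask unchanged.

```c
#include <stdint.h>

#define LANES 4 /* hypothetical vector width: four 32-bit lanes */

/* Lane-wise a > b yielding the {-1}/{0} mask of step 1,
   i.e. the value of VEC_COND_EXPR<a > b, {-1}, {0}>.  */
static void vec_gt_mask (int32_t m[LANES], const int32_t a[LANES],
                         const int32_t b[LANES])
{
  for (int i = 0; i < LANES; i++)
    m[i] = a[i] > b[i] ? -1 : 0;
}

/* Bitwise lowering of VEC_COND_EXPR<m, x, y>: (x & m) | (y & ~m).  */
static void vec_blend (int32_t r[LANES], const int32_t m[LANES],
                       const int32_t x[LANES], const int32_t y[LANES])
{
  for (int i = 0; i < LANES; i++)
    r[i] = (x[i] & m[i]) | (y[i] & ~m[i]);
}
```

For a = {1, 5, 3, 7} and b = {2, 4, 3, 6}, the mask is {0, -1, 0, -1}. Blending {-1} and {0} under it gives the same vector back.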


Basically, I have two questions:
1) Can we pass this information through optabs in a nicer way?
2) How could is_vector_comparison be improved? I have several ideas,
such as checking whether a constant vector consists only of 0 and -1
elements, and so on. But first: is it conceptually fine?
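The constant-vector idea in question 2 could look roughly like this in plain C. The function name and the int32_t element type are assumptions for illustration. A real implementation would inspect a VECTOR_CST rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a constant mask can be used directly if every
   element is 0 or -1 (all bits set); otherwise it still needs the
   "mask != {0}" compare.  */
static bool constant_is_canonical_mask (const int32_t *elts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (elts[i] != 0 && elts[i] != -1)
      return false;
  return true;
}
```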

P.S. I tried to put the functionality of is_vector_comparison in
tree-ssa-forwprop, but that pass runs only with -On, which I find
inappropriate, and the functionality gets more complicated there.


Thanks,
Artem.

[-- Attachment #2: vec-cond-no-hooks.diff --]
[-- Type: text/plain, Size: 43180 bytes --]

Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using the hardware-specific instructions and return the result tree. The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default for TARGET_VECTORIZE_BUILTIN_VEC_COMPARE: no target-specific
+   expansion of the vector comparison is available, so return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using the hardware-specific instructions and return the result \
+tree. The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6572,16 +6572,37 @@ expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
+  
+  if (COMPARISON_CLASS_P (op0))
+    {
+      comparison = vector_compare_rtx (op0, unsignedp, icode);
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_fixed_operand (&ops[3], comparison);
+      create_fixed_operand (&ops[4], XEXP (comparison, 0));
+      create_fixed_operand (&ops[5], XEXP (comparison, 1));
+
+    }
+  else
+    {
+      rtx rtx_op0;
+      rtx vec; 
+    
+      rtx_op0 = expand_normal (op0);
+      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); 
+      vec = CONST0_RTX (mode);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);
+    }
 
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
   return ops[0].value;
 }
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t
 }
 \f
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars, or {-1,-1,...} or {0,0,...} for vectors),
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 1;
+      tree arg0_type = TREE_TYPE (arg0);
+      
       switch (code)
 	{
 	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
+	    return constant_boolean_node (true_val, type);
 	  break;
 
 	case GE_EXPR:
 	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
-	    return constant_boolean_node (1, type);
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
+	    return constant_boolean_node (true_val, type);
 	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
 	case NE_EXPR:
 	  /* For NE, we can only do this simplification if integer
 	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (FLOAT_TYPE_P (arg0_type)
+	      && HONOR_NANS (TYPE_MODE (arg0_type)))
 	    break;
 	  /* ... fall through ...  */
 	case GT_EXPR:
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4058,6 +4058,94 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      bool maybe_const = true;
+      tree sc;
+      
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+      sc = c_fully_fold (ifexp, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	ifexp = c_wrap_maybe_const (sc, true);
+      else
+	ifexp = sc;
+      
+      sc = c_fully_fold (op1, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op1 = c_wrap_maybe_const (sc, true);
+      else
+	op1 = sc;
+      
+      sc = c_fully_fold (op2, false, &maybe_const);
+      sc = save_expr (sc);
+      if (!maybe_const)
+	op2 = c_wrap_maybe_const (sc, true);
+      else
+	op2 = sc;
+
+      /* Currently the expansion of VEC_COND_EXPR does not allow
+	 expressions where the type of the vectors being compared differs
+	 from the type of the vectors being selected from.  For the time
+	 being we insert implicit conversions.  */
+      if ((COMPARISON_CLASS_P (ifexp)
+	   && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1)
+	  || TREE_TYPE (ifexp) != type1)
+	{
+	  tree comp_type = COMPARISON_CLASS_P (ifexp)
+			   ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+			   : TREE_TYPE (ifexp);
+	  tree vcond;
+	  
+	  op1 = convert (comp_type, op1);
+	  op2 = convert (comp_type, op2);
+	  vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+	  vcond = convert (type1, vcond);
+	  return vcond;
+	}
+      else
+	return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9995,37 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /*break;  */
+
+	  ret = build3 (VEC_COND_EXPR, result_type, 
+			build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10138,37 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /* break; */
+	  ret = build3 (VEC_COND_EXPR, result_type, 
+			build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10576,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* Vector comparisons are valid GIMPLE expressions
+		     which can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  In a vector of comparison
+   results, true elements have all bits set and false elements are zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 177665)
+++ gcc/emit-rtl.c	(working copy)
@@ -5474,6 +5474,11 @@ gen_const_vector (enum machine_mode mode
   return tem;
 }
 
+rtx
+gen_const_vector1 (enum machine_mode mode, int constant)
+{
+  return gen_const_vector (mode, constant);
+}
 /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when
    all elements are zero, and the one vector when all elements are one.  */
 rtx
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,11 +30,16 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +130,21 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, inner_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +353,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand vector comparison expression OP0 CODE OP1 using
+   the builtin_vec_compare target hook; if the target does not
+   support comparison of type TYPE, expand the comparison piecewise.
+   GSI is used inside the target hook to create the code needed
+   for the given comparison.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +413,24 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
@@ -432,6 +486,122 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+
+/* Expand vector condition EXP which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1> into the following
+   vector:
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   for i from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)) - 1.  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+  gimple_stmt_iterator gsi_tmp;
+  tree t;
+
+  if (!COMPARISON_CLASS_P (cond))
+    cond = build2 (EQ_EXPR, TREE_TYPE (cond), cond,
+			    build_vector_from_val (TREE_TYPE (cond),
+			    build_int_cst (TREE_TYPE (TREE_TYPE (cond)), -1)));
+     
+  /* Expand vector condition inside of VEC_COND_EXPR.  */
+  op = optab_for_tree_code (TREE_CODE (cond), type, optab_default);
+  if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+    {
+      var = create_tmp_reg (TREE_TYPE (cond), "cond");
+      new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond),
+					  TREE_OPERAND (cond, 0),
+					  TREE_OPERAND (cond, 1),
+					  TREE_CODE (cond));
+      new_stmt = gimple_build_assign (var, new_rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (gsi_stmt (*gsi));
+    }
+  else
+    var = cond;
+  
+  gsi_tmp = *gsi;
+  gsi_prev (&gsi_tmp);
+
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+
+  /* Run veclower on the expressions we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);
+  
+  return t;
+}
+
+#define pp(x) fprintf (stderr, "-- %s\n", x)
+
+static bool
+is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
+{
+  tree type = TREE_TYPE (expr);
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    return true;
+    
+  if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE)
+    return true;
+
+  if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR
+      || TREE_CODE (expr) == BIT_XOR_EXPR)
+    return is_vector_comparison (gsi, TREE_OPERAND (expr, 0))
+	   && is_vector_comparison (gsi, TREE_OPERAND (expr, 1));
+
+  if (TREE_CODE (expr) == VAR_DECL)
+    { 
+      gimple_stmt_iterator gsi_tmp;
+      tree name = DECL_NAME (expr);
+
+      gsi_tmp = *gsi;
+
+      for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp))
+	{
+	  gimple stmt = gsi_stmt (gsi_tmp);
+
+	  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	    continue;
+
+	  if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	      && DECL_NAME (gimple_assign_lhs (stmt)) == name)
+	    return is_vector_comparison (&gsi_tmp, 
+					 gimple_assign_rhs_to_tree (stmt));
+	}
+    } 
+  
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      enum tree_code code;
+      gimple exprdef = SSA_NAME_DEF_STMT (expr);
+
+      if (gimple_code (exprdef) != GIMPLE_ASSIGN)
+	return false;
+
+      if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE)
+	return false;
+
+      
+      return is_vector_comparison (gsi, 
+				   gimple_assign_rhs_to_tree (exprdef));
+    }
+
+  return false;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -450,11 +620,34 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
+
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      if (!is_vector_comparison (gsi, cond))
+	TREE_OPERAND (exp, 0) = 
+		    build2 (NE_EXPR, TREE_TYPE (cond), cond,
+			    build_vector_from_val (TREE_TYPE (cond),
+			    build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
+      
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18402,27 +18403,55 @@ ix86_expand_sse_fp_minmax (rtx dest, enu
   return true;
 }
 
+rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT);
+
+/* Returns a vector of mode MODE where all the elements are ARG.  */
+rtx
+rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg)
+{
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+  
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg);
+  
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
 /* Expand an sse vector comparison.  Return the register with the result.  */
 
 static rtx
 ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
-		     rtx op_true, rtx op_false)
+		     rtx op_true, rtx op_false, bool no_comparison)
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx x;
 
-  cmp_op0 = force_reg (mode, cmp_op0);
-  if (!nonimmediate_operand (cmp_op1, mode))
-    cmp_op1 = force_reg (mode, cmp_op1);
+  /* Avoid useless comparison.  */
+  if (no_comparison)
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      x = cmp_op0;
+    }
+  else
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      if (!nonimmediate_operand (cmp_op1, mode))
+	cmp_op1 = force_reg (mode, cmp_op1);
+
+      x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
+    }
 
   if (optimize
       || reg_overlap_mentioned_p (dest, op_true)
       || reg_overlap_mentioned_p (dest, op_false))
     dest = gen_reg_rtx (mode);
 
-  x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
-
   return dest;
 }
 
@@ -18434,8 +18463,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  rtx mask_true;
+  
+  if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
@@ -18512,7 +18547,7 @@ ix86_expand_fp_movcc (rtx operands[])
 	return true;
 
       tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1,
-				 operands[2], operands[3]);
+				 operands[2], operands[3], false);
       ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]);
       return true;
     }
@@ -18555,7 +18590,7 @@ ix86_expand_fp_vcond (rtx operands[])
     return true;
 
   cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
+			     operands[1], operands[2], false);
   ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
@@ -18569,7 +18604,9 @@ ix86_expand_int_vcond (rtx operands[])
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
+  rtx comp;
 
+  comp = operands[3];
   cop0 = operands[4];
   cop1 = operands[5];
 
@@ -18681,8 +18718,18 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			   operands[1+negate], operands[2-negate]);
+  if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX 
+      && XEXP (comp, 1) == NULL_RTX)
+    {
+      rtx vec =  CONST0_RTX (mode);
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, vec,
+			       operands[1+negate], operands[2-negate], true);
+    }
+  else
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1+negate], operands[2-negate], false);
+    }
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
@@ -18774,7 +18821,7 @@ ix86_expand_sse_unpack (rtx operands[2],
 	tmp = force_reg (imode, CONST0_RTX (imode));
       else
 	tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
-				   operands[1], pc_rtx, pc_rtx);
+				   operands[1], pc_rtx, pc_rtx, false);
 
       emit_insn (unpack (dest, operands[1], tmp));
     }
@@ -32827,6 +32874,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for the vector comparison of
+   real-type vectors V0 and V1.  Return a variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for the vector comparison of
+   integer-type vectors V0 and V1.  Return a variable containing the
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* t = tmp1 ^ {-1, -1,...}  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of comparison. Returns NULL_TREE
+   when it is impossible to find a target specific sequence.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* Packed unsigned integers can only be compared for equality
+     (EQ_EXPR or NE_EXPR); PCMPGT is a signed comparison.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35587,11 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 


* Re: Vector Comparison patch
  2011-08-22  7:32                                 ` Artem Shinkarov
@ 2011-08-22 12:06                                   ` Richard Guenther
  2011-08-22 13:56                                     ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 12:06 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Richard
>
> I formalized an approach a little-bit, now it works without target
> hooks, but some polishing is still required. I want you to comment on
> the several important approaches that I use in the patch.
>
> So how does it work.
> 1) All the vector comparisons at the level of type-checker are
> introduced using VEC_COND_EXPR with constant selection operands being
> {-1} and {0}. For example v0 > v1 is transformed into
> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>
> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
> 2.b) first operand is something else, in that case, we specially mark
> this case, recognize it in the backend, and do not create a
> comparison, but use the mask as it was a result of some comparison.
>
> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
> vector comparison we use is_vector_comparison function, if it returns
> false, then we replace mask with mask != {0}.
>
> So we end-up with the following functionality:
> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
> comparison of two vectors, we leave it as it is, otherwise change with
> mask != {0}.
>
> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
> which correctly expands, without creating useless masking.
>
>
> Basically for me there are two questions:
> 1) Can we perform information passing in optabs in a nicer way?
> 2) How is_vector_comparison could be improved? I have several ideas,
> like checking if constant vector all consists of 0 and -1, and so on.
> But first is it conceptually fine.
>
> P.S. I tried to put the functionality of is_vector_comparison in
> tree-ssa-forwprop, but the thing is that it is called only with -On,
> which I find inappropriate, and the functionality gets more
> complicated.

Why is it inappropriate to not optimize it at -O0?  If the user
separates comparison and ?: expression it's his own fault.

Btw, the new hook is still in the patch.

I would simply always create != 0 if it isn't and let optimizers
(tree-ssa-forwprop.c) optimize this.  You still have to deal with
non-comparison operands during expansion though, but if
you always forced a != 0 from the start you can then simply
interpret it as a proper comparison result (in which case I'd
modify the backends to have an alternate pattern or directly
expand to masking operations - using the fake comparison
RTX is too much of a hack).

 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value);
+      return build_vector_from_val (type, tval);

as value is either 0 or 1 that won't work.  Oh, I see you pass -1
for true in the callers.  But I think we should simply decide that true (1)
means -1 for a vector boolean node (and the value parameter should
be a bool instead).  Thus,

+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);

instead.
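Under this proposal a vector "true" is -1 in every lane. What matters is that -1 has every bit set whatever the element width, so the constant can be used directly as a bit mask. A minimal sketch, assuming a byte-level model of a vector constant; `vector_boolean_constant` is a hypothetical helper, not the fold-const.c function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>   /* Fixed-width lane types for callers.  */
#include <string.h>

/* Hypothetical byte-level model of a vector boolean constant: fill
   NUNITS lanes of ELT_SIZE bytes each.  True is -1 in every lane, i.e.
   every bit set whatever the element width; false is 0.  */
static void
vector_boolean_constant (bool value, void *lanes, size_t nunits,
                         size_t elt_size)
{
  memset (lanes, value ? 0xFF : 0x00, nunits * elt_size);
}
```

Passing the raw `value` of 1 instead would set only the lowest bit of each lane, which is why `value ? -1 : 0` is needed.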

@@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
+      tree arg0_type = TREE_TYPE (arg0);
+

as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS
macros work perfectly fine on vector types.
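The remark can be illustrated lane by lane: folding `x == x` to all-true is only valid when no lane can hold a NaN. That is the reason for the FLOAT_TYPE_P / HONOR_NANS guard, and the same reasoning applies to a vector of floats. A small sketch with hypothetical helper names, assuming IEEE doubles and compilation without -ffast-math:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Lane-wise x == x with -1 / 0 results.  */
static void
vec_self_eq (const double *x, int64_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = x[i] == x[i] ? -1 : 0;
}

/* Would folding x == x to {-1, -1, ...} have been correct for X?
   It fails as soon as one lane is a NaN.  */
static bool
self_eq_folds_to_true (const double *x)
{
  int64_t r[LANES];

  vec_self_eq (x, r);
  for (int i = 0; i < LANES; i++)
    if (r[i] != -1)
      return false;
  return true;
}
```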

Richard.

>
> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-08-22 12:06                                   ` Richard Guenther
@ 2011-08-22 13:56                                     ` Artem Shinkarov
  2011-08-22 15:43                                       ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 13:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Richard
>>
>> I formalized an approach a little-bit, now it works without target
>> hooks, but some polishing is still required. I want you to comment on
>> the several important approaches that I use in the patch.
>>
>> So how does it work.
>> 1) All the vector comparisons at the level of type-checker are
>> introduced using VEC_COND_EXPR with constant selection operands being
>> {-1} and {0}. For example v0 > v1 is transformed into
>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>
>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>> 2.b) first operand is something else, in that case, we specially mark
>> this case, recognize it in the backend, and do not create a
>> comparison, but use the mask as it was a result of some comparison.
>>
>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>> vector comparison we use is_vector_comparison function, if it returns
>> false, then we replace mask with mask != {0}.
>>
>> So we end-up with the following functionality:
>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>> comparison of two vectors, we leave it as it is, otherwise change with
>> mask != {0}.
>>
>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>> which correctly expands, without creating useless masking.
>>
>>
>> Basically for me there are two questions:
>> 1) Can we perform information passing in optabs in a nicer way?
>> 2) How is_vector_comparison could be improved? I have several ideas,
>> like checking if constant vector all consists of 0 and -1, and so on.
>> But first is it conceptually fine.
>>
>> P.S. I tried to put the functionality of is_vector_comparison in
>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>> which I find inappropriate, and the functionality gets more
>> complicated.
>
> Why is it inappropriate to not optimize it at -O0?  If the user
> separates comparison and ?: expression it's his own fault.

Well, because all the information is there, and I can easily envision
the case where the user writes the comparison separately just to avoid
code duplication.

Like:
mask = a > b;
res1 = mask ? v0 : v1;
res2 = mask ? v2 : v3;

Which in this case would be different from
res1 = a > b ? v0 : v1;
res2 = a > b ? v2 : v3;

> Btw, the new hook is still in the patch.
>
> I would simply always create != 0 if it isn't and let optimizers
> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
> non-comparison operands during expansion though, but if
> you always forced a != 0 from the start you can then simply
> interpret it as a proper comparison result (in which case I'd
> modify the backends to have an alternate pattern or directly
> expand to masking operations - using the fake comparison
> RTX is too much of a hack).

Richard, I think you didn't get the problem.
I really need a way to pass the information that the expression
in the first operand of vcond is an appropriate mask. I thought
about this for quite a while and this hack is the only answer I found;
is there a better way to do it? I could, for example, introduce another
tree node, but that would be overkill as well.

Now, why do I need it so much:
I want to implement the comparison in such a way that {1, 5, 0, -1} is
actually treated as {-1, -1, 0, -1}. So whenever I am not sure that the
mask of a VEC_COND_EXPR is a real comparison, I transform it to
mask != {0} (not always). To check this, I use is_vector_comparison in
tree-vect-generic.

So there really is a difference between mask ? x : y and
mask != {0} ? x : y; otherwise I could treat mask != {0} in the
backend as just mask.
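The {1, 5, 0, -1} example shows why `mask ? x : y` and `mask != {0} ? x : y` differ once the selection is done bitwise, like the and/or masking ix86_expand_sse_movcc emits: only a mask whose lanes are all 0 or -1 selects whole lanes. A scalar sketch, assuming four 32-bit lanes and hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Bitwise selection: each result bit comes from X where MASK has a 1
   bit and from Y elsewhere.  This is a lane-wise select only when
   every mask lane is 0 or -1.  */
static void
bitwise_select (const int32_t *mask, const int32_t *x, const int32_t *y,
                int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = (mask[i] & x[i]) | (~mask[i] & y[i]);
}

/* mask != {0}: map every nonzero lane to -1 and zero lanes to 0.  */
static void
normalize_mask (const int32_t *mask, int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = mask[i] != 0 ? -1 : 0;
}
```

With the raw mask {1, 5, 0, -1}, x = {100, 200, 300, 400} and y = {-1, -2, -3, -4}, lanes 0 and 1 come out as -2 and -6, values taken from neither operand; after normalization every lane comes from exactly one of x or y.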

If this link between optabs and the backend breaks, the patch falls
apart, because every time the comparison is taken out of the
VEC_COND_EXPR I will have to add != {0}. Keep in mind that I cannot
always put the comparison inside the VEC_COND_EXPR: what if it is
computed by a function, or in some other way?

So what would be an appropriate way to connect optabs and the backend?


Thanks,
Artem.

All the rest would be adjusted later.

>  tree
>  constant_boolean_node (int value, tree type)
>  {
> -  if (type == integer_type_node)
> +  if (TREE_CODE (type) == VECTOR_TYPE)
> +    {
> +      tree tval;
> +
> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
> +      tval = build_int_cst (TREE_TYPE (type), value);
> +      return build_vector_from_val (type, tval);
>
> as value is either 0 or 1 that won't work.  Oh, I see you pass -1
> for true in the callers.  But I think we should simply decide that true (1)
> means -1 for a vector boolean node (and the value parameter should
> be a bool instead).  Thus,
>
> +  if (TREE_CODE (type) == VECTOR_TYPE)
> +    {
> +      tree tval;
> +
> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
> +      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
> +      return build_vector_from_val (type, tval);
>
> instead.
>
> @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
>      floating-point, we can only do some of these simplifications.)  */
>   if (operand_equal_p (arg0, arg1, 0))
>     {
> +      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
> +      tree arg0_type = TREE_TYPE (arg0);
> +
>
> as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS
> macros work perfectly fine on vector types.
>
> Richard.
>
>>
>> Thanks,
>> Artem.
>>
>


* Re: Vector Comparison patch
  2011-08-22 13:56                                     ` Artem Shinkarov
@ 2011-08-22 15:43                                       ` Richard Guenther
  2011-08-22 15:54                                         ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 15:43 UTC (permalink / raw)
  To: Artem Shinkarov; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Richard
>>>
>>> I formalized an approach a little-bit, now it works without target
>>> hooks, but some polishing is still required. I want you to comment on
>>> the several important approaches that I use in the patch.
>>>
>>> So how does it work.
>>> 1) All the vector comparisons at the level of type-checker are
>>> introduced using VEC_COND_EXPR with constant selection operands being
>>> {-1} and {0}. For example v0 > v1 is transformed into
>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>
>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>> 2.b) first operand is something else, in that case, we specially mark
>>> this case, recognize it in the backend, and do not create a
>>> comparison, but use the mask as it was a result of some comparison.
>>>
>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>> vector comparison we use is_vector_comparison function, if it returns
>>> false, then we replace mask with mask != {0}.
>>>
>>> So we end-up with the following functionality:
>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>> comparison of two vectors, we leave it as it is, otherwise change with
>>> mask != {0}.
>>>
>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>> which correctly expands, without creating useless masking.
>>>
>>>
>>> Basically for me there are two questions:
>>> 1) Can we perform information passing in optabs in a nicer way?
>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>> like checking if constant vector all consists of 0 and -1, and so on.
>>> But first is it conceptually fine.
>>>
>>> P.S. I tried to put the functionality of is_vector_comparison in
>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>> which I find inappropriate, and the functionality gets more
>>> complicated.
>>
>> Why is it inappropriate to not optimize it at -O0?  If the user
>> separates comparison and ?: expression it's his own fault.
>
> Well, because all the information is there, and I perfectly envision
> the case when user expressed comparison separately, just to avoid code
> duplication.
>
> Like:
> mask = a > b;
> res1 = mask ? v0 : v1;
> res2 = mask ? v2 : v3;
>
> Which in this case would be different from
> res1 = a > b ? v0 : v1;
> res2 = a > b ? v2 : v3;
>
>> Btw, the new hook is still in the patch.
>>
>> I would simply always create != 0 if it isn't and let optimizers
>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>> non-comparison operands during expansion though, but if
>> you always forced a != 0 from the start you can then simply
>> interpret it as a proper comparison result (in which case I'd
>> modify the backends to have an alternate pattern or directly
>> expand to masking operations - using the fake comparison
>> RTX is too much of a hack).
>
> Richard, I think you didn't get the problem.
> I really need the way, to pass the information, that the expression
> that is in the first operand of vcond is an appropriate mask. I thought
> for quite a while and this hack is the only answer I found, is there a
> better way to do that. I could for example introduce another
> tree-node, but it would be overkill as well.
>
> Now why do I need it so much:
> I want to implement the comparison in a way that {1, 5, 0, -1} is
> actually {-1,-1,0,-1}. So whenever I am not sure that mask of
> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
> always). To check the stuff, I use is_vector_comparison in
> tree-vect-generic.
>
> So I really have the difference between mask ? x : y and mask != {0} ?
> x : y, otherwise I could treat mask != {0} in the backend as just
> mask.
>
> If this link between optabs and backend breaks, then the patch falls
> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
> will have to put != {0}. Keep in mind, that I cannot always put the
> comparison inside the VEC_COND_EXPR, what if it is defined in a
> function-comparison, or somehow else?
>
> So what would be an appropriate way to connect optabs and the backend?

Well, there is no problem with requiring that the only valid mask operand
for VEC_COND_EXPRs be either a comparison or a {-1,...} / {0,...} mask.
The C parser just has to transform mask ? vec1 : vec2 into
mask != 0 ? vec1 : vec2.  This comparison can then be eliminated by
optimization passes that either replace it with the real comparison
computing the mask, or propagate the information that this mask is
already {-1,...} / {0,...} by simply dropping the comparison against zero.

For the backends I'd have vcond patterns for both an embedded comparison
and a mask.  (Now we can rewind the discussion a bit and allow
arbitrary masks and define a vcond with a mask operand to do bitwise
selection - what matters is the C frontend semantics which we need to
translate to what the middle-end thinks of a VEC_COND_EXPR; they
do not have to agree).
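This scheme rests on two properties. On a mask whose lanes are already 0 or -1, the inserted `!= 0` is the identity, so an optimization pass can drop it. And a vcond with an embedded comparison computes the same result as a mask-operand vcond fed with that comparison's result. A scalar sketch that checks both, with hypothetical names and four 32-bit lanes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* True if every lane is 0 or -1, i.e. the value could be the result
   of a vector comparison.  */
static bool
is_canonical_mask (const int32_t *m)
{
  for (int i = 0; i < LANES; i++)
    if (m[i] != 0 && m[i] != -1)
      return false;
  return true;
}

/* The "mask != 0" the C front end would insert.  */
static void
ne_zero (const int32_t *m, int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = m[i] != 0 ? -1 : 0;
}

/* vcond with a mask operand: bitwise selection.  */
static void
vcond_mask (const int32_t *m, const int32_t *x, const int32_t *y,
            int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = (m[i] & x[i]) | (~m[i] & y[i]);
}

/* vcond with an embedded comparison, here A > B.  */
static void
vcond_gt (const int32_t *a, const int32_t *b, const int32_t *x,
          const int32_t *y, int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = a[i] > b[i] ? x[i] : y[i];
}
```

`is_canonical_mask` is only a run-time stand-in for what `is_vector_comparison` (or a forwprop pattern) would have to establish statically before dropping the `!= 0`.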

If the mask is computed by a function you are of course out of luck,
but I don't see how you'd manage to infer knowledge from nowhere either.

Richard.

>
> Thanks,
> Artem.
>
> All the rest would be adjusted later.
>
>>  tree
>>  constant_boolean_node (int value, tree type)
>>  {
>> -  if (type == integer_type_node)
>> +  if (TREE_CODE (type) == VECTOR_TYPE)
>> +    {
>> +      tree tval;
>> +
>> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
>> +      tval = build_int_cst (TREE_TYPE (type), value);
>> +      return build_vector_from_val (type, tval);
>>
>> as value is either 0 or 1 that won't work.  Oh, I see you pass -1
>> for true in the callers.  But I think we should simply decide that true (1)
>> means -1 for a vector boolean node (and the value parameter should
>> be a bool instead).  Thus,
>>
>> +  if (TREE_CODE (type) == VECTOR_TYPE)
>> +    {
>> +      tree tval;
>> +
>> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
>> +      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
>> +      return build_vector_from_val (type, tval);
>>
>> instead.
>>
>> @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
>>      floating-point, we can only do some of these simplifications.)  */
>>   if (operand_equal_p (arg0, arg1, 0))
>>     {
>> +      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
>> +      tree arg0_type = TREE_TYPE (arg0);
>> +
>>
>> as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS
>> macros work perfectly fine on vector types.
>>
>> Richard.
>>
>>>
>>> Thanks,
>>> Artem.
>>>
>>
>


* Re: Vector Comparison patch
  2011-08-22 15:43                                       ` Richard Guenther
@ 2011-08-22 15:54                                         ` Artem Shinkarov
  2011-08-22 15:57                                           ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 15:54 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Richard
>>>>
>>>> I formalized an approach a little bit, and now it works without target
>>>> hooks, but some polishing is still required. I want you to comment on
>>>> the several important approaches that I use in the patch.
>>>>
>>>> So how does it work.
>>>> 1) All the vector comparisons at the level of the type-checker are
>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>
>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>> 2.b) first operand is something else, in that case, we specially mark
>>>> this case, recognize it in the backend, and do not create a
>>>> comparison, but use the mask as it was a result of some comparison.
>>>>
>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>> vector comparison we use is_vector_comparison function, if it returns
>>>> false, then we replace mask with mask != {0}.
>>>>
>>>> So we end-up with the following functionality:
>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>> mask != {0}.
>>>>
>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>> which correctly expands, without creating useless masking.
>>>>
>>>>
>>>> Basically for me there are two questions:
>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>> But first is it conceptually fine.
>>>>
>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>> which I find inappropriate, and the functionality gets more
>>>> complicated.
>>>
>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>> separates comparison and ?: expression it's his own fault.
>>
>> Well, because all the information is there, and I perfectly envision
>> the case when user expressed comparison separately, just to avoid code
>> duplication.
>>
>> Like:
>> mask = a > b;
>> res1 = mask ? v0 : v1;
>> res2 = mask ? v2 : v3;
>>
>> Which in this case would be different from
>> res1 = a > b ? v0 : v1;
>> res2 = a > b ? v2 : v3;
>>
>>> Btw, the new hook is still in the patch.
>>>
>>> I would simply always create != 0 if it isn't and let optimizers
>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>> non-comparison operands during expansion though, but if
>>> you always forced a != 0 from the start you can then simply
>>> interpret it as a proper comparison result (in which case I'd
>>> modify the backends to have an alternate pattern or directly
>>> expand to masking operations - using the fake comparison
>>> RTX is too much of a hack).
>>
>> Richard, I think you didn't get the problem.
>> I really need the way, to pass the information, that the expression
>> that is in the first operand of vcond is an appropriate mask. I thought
>> for quite a while and this hack is the only answer I found, is there a
>> better way to do that. I could for example introduce another
>> tree-node, but it would be overkill as well.
>>
>> Now why do I need it so much:
>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>> actually {-1,-1,0,-1}. So whenever I am not sure that mask of
>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>> always). To check the stuff, I use is_vector_comparison in
>> tree-vect-generic.
>>
>> So I really have the difference between mask ? x : y and mask != {0} ?
>> x : y, otherwise I could treat mask != {0} in the backend as just
>> mask.
>>
>> If this link between optabs and backend breaks, then the patch falls
>> apart. Because every time the comparison is taken out of VEC_COND_EXPR, I
>> will have to put != {0}. Keep in mind, that I cannot always put the
>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>> function-comparison, or somehow else?
>>
>> So what would be an appropriate way to connect optabs and the backend?
>
> Well, there is no problem in having the only valid mask operand for
> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
> vec1 : vec2.

This happens already in the new version of patch (not submitted yet).

> This comparison can be eliminated by optimization passes
> that
> either replace it by the real comparison computing the mask or just
> propagating the information this mask is already {-1,...} / {0,....} by simply
> dropping the comparison against zero.

This is not a problem, because the backend recognizes these patterns,
so no optimization is needed in this part.

> For the backends I'd have vcond patterns for both an embedded comparison
> and for a mask.  (Now we can rewind the discussion a bit and allow
> arbitrary masks and define a vcond with a mask operand to do bitwise
> selection - what matters is the C frontend semantics which we need to
> translate to what the middle-end thinks of a VEC_COND_EXPR, they
> do not have to agree).

But it seems like another combinatorial explosion here. Considering
what Richard said in his e-mail, in order to support "generic" vcond,
we just need to enumerate all the possible cases. Or I didn't
understand right?

I mean, I don't mind of course, but it seems to me that it would be
cleaner to have one generic enough pattern.

Is there seriously no way to pass something from optab into the backend??

> If the mask is computed by a function you are of course out of luck,
> but I don't see how you'd manage to infer knowledge from nowhere either.

Well, take a simpler example

a = {0};
for ( ; *p; p += 16)
  a &= pattern > (vec)*p;

res = a ? v0 : v1;

In this case it is simple to analyse that a is a comparison, but you
cannot embed the operations of a into VEC_COND_EXPR.
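Here is a compilable variant of the loop above, as a sketch under a few assumptions. Lanes are four `int`s, so each step covers 16 bytes. Each chunk is loaded with `memcpy`, since `(vec)*p` is shorthand rather than valid C. The accumulator starts at all-ones, because starting from {0} the `&=` would keep every lane at zero. `accumulate_mask` and `is_canonical_mask` are hypothetical names. Every lane of the result is still 0 or -1. So `a` is a valid mask, even though no single comparison could be folded into the VEC_COND_EXPR.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* AND together "pattern > chunk" over 16-byte chunks, stopping at a
   chunk whose first element is zero (the "*p" loop condition). */
static v4si accumulate_mask(v4si pattern, const int *p)
{
  v4si a = {-1, -1, -1, -1};        /* every lane starts out true */
  for (; *p; p += 4)                /* 4 ints = one 16-byte chunk */
    {
      v4si chunk;
      memcpy(&chunk, p, sizeof chunk);  /* alignment-safe load */
      a &= pattern > chunk;             /* comparison lanes are -1 or 0 */
    }
  return a;
}

/* Lane-wise check that every lane of V is 0 or -1. */
static int is_canonical_mask(v4si v)
{
  for (int i = 0; i < 4; i++)
    if (v[i] != 0 && v[i] != -1)
      return 0;
  return 1;
}
```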


Ok, I am testing the patch that removes hooks. Could you push a little
bit the backend-patterns business?


Thanks,
Artem.

> Richard.
>
>>
>> Thanks,
>> Artem.
>>
>> All the rest would be adjusted later.
>>
>>>  tree
>>>  constant_boolean_node (int value, tree type)
>>>  {
>>> -  if (type == integer_type_node)
>>> +  if (TREE_CODE (type) == VECTOR_TYPE)
>>> +    {
>>> +      tree tval;
>>> +
>>> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
>>> +      tval = build_int_cst (TREE_TYPE (type), value);
>>> +      return build_vector_from_val (type, tval);
>>>
>>> as value is either 0 or 1 that won't work.  Oh, I see you pass -1
>>> for true in the callers.  But I think we should simply decide that true (1)
>>> means -1 for a vector boolean node (and the value parameter should
>>> be a bool instead).  Thus,
>>>
>>> +  if (TREE_CODE (type) == VECTOR_TYPE)
>>> +    {
>>> +      tree tval;
>>> +
>>> +      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
>>> +      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
>>> +      return build_vector_from_val (type, tval);
>>>
>>> instead.
>>>
>>> @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr
>>>      floating-point, we can only do some of these simplifications.)  */
>>>   if (operand_equal_p (arg0, arg1, 0))
>>>     {
>>> +      int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0;
>>> +      tree arg0_type = TREE_TYPE (arg0);
>>> +
>>>
>>> as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS
>>> macros work perfectly fine on vector types.
>>>
>>> Richard.
>>>
>>>>
>>>> Thanks,
>>>> Artem.
>>>>
>>>
>>
>

* Re: Vector Comparison patch
  2011-08-22 15:54                                         ` Artem Shinkarov
@ 2011-08-22 15:57                                           ` Richard Guenther
  2011-08-22 16:02                                             ` Artem Shinkarov
  2011-08-22 20:46                                             ` Uros Bizjak
  0 siblings, 2 replies; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 15:57 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> Richard
>>>>>
>>>>> I formalized an approach a little bit, and now it works without target
>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>> the several important approaches that I use in the patch.
>>>>>
>>>>> So how does it work.
>>>>> 1) All the vector comparisons at the level of the type-checker are
>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>
>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>> this case, recognize it in the backend, and do not create a
>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>
>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>> false, then we replace mask with mask != {0}.
>>>>>
>>>>> So we end-up with the following functionality:
>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>> mask != {0}.
>>>>>
>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>> which correctly expands, without creating useless masking.
>>>>>
>>>>>
>>>>> Basically for me there are two questions:
>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>> But first is it conceptually fine.
>>>>>
>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>> which I find inappropriate, and the functionality gets more
>>>>> complicated.
>>>>
>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>> separates comparison and ?: expression it's his own fault.
>>>
>>> Well, because all the information is there, and I perfectly envision
>>> the case when user expressed comparison separately, just to avoid code
>>> duplication.
>>>
>>> Like:
>>> mask = a > b;
>>> res1 = mask ? v0 : v1;
>>> res2 = mask ? v2 : v3;
>>>
>>> Which in this case would be different from
>>> res1 = a > b ? v0 : v1;
>>> res2 = a > b ? v2 : v3;
>>>
>>>> Btw, the new hook is still in the patch.
>>>>
>>>> I would simply always create != 0 if it isn't and let optimizers
>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>> non-comparison operands during expansion though, but if
>>>> you always forced a != 0 from the start you can then simply
>>>> interpret it as a proper comparison result (in which case I'd
>>>> modify the backends to have an alternate pattern or directly
>>>> expand to masking operations - using the fake comparison
>>>> RTX is too much of a hack).
>>>
>>> Richard, I think you didn't get the problem.
>>> I really need the way, to pass the information, that the expression
>>> that is in the first operand of vcond is an appropriate mask. I thought
>>> for quite a while and this hack is the only answer I found, is there a
>>> better way to do that. I could for example introduce another
>>> tree-node, but it would be overkill as well.
>>>
>>> Now why do I need it so much:
>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>> actually {-1,-1,0,-1}. So whenever I am not sure that mask of
>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>> always). To check the stuff, I use is_vector_comparison in
>>> tree-vect-generic.
>>>
>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>> mask.
>>>
>>> If this link between optabs and backend breaks, then the patch falls
>>> apart. Because every time the comparison is taken out of VEC_COND_EXPR, I
>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>> function-comparison, or somehow else?
>>>
>>> So what would be an appropriate way to connect optabs and the backend?
>>
>> Well, there is no problem in having the only valid mask operand for
>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>> vec1 : vec2.
>
> This happens already in the new version of patch (not submitted yet).
>
>> This comparison can be eliminated by optimization passes
>> that
>> either replace it by the real comparison computing the mask or just
>> propagating the information this mask is already {-1,...} / {0,....} by simply
>> dropping the comparison against zero.
>
> This is not a problem, because the backend recognizes these patterns,
> so no optimization is needed in this part.

I mean for

  mask = v1 < v2 ? {-1,...} : {0,...};
  val = VCOND_EXPR <mask != 0, v3, v4>;

optimizers can see how mask is defined and drop the != 0 test or replace
it by v1 < v2.
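A small sketch of why that rewrite is safe. It assumes GCC's vector extension, where `v1 < v2` already yields {-1,...} / {0,...} lanes. That is the same value as `v1 < v2 ? {-1,...} : {0,...}`. For such a mask, `mask != 0` is the mask itself, so a pass may drop the test or substitute the comparison. The helper names are hypothetical.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* 1 if all four lanes of A and B are equal. */
static int same(v4si a, v4si b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* mask = v1 < v2 ? {-1,...} : {0,...}; in GCC C this is just v1 < v2. */
static v4si make_mask(v4si v1, v4si v2)
{
  return v1 < v2;
}

/* For a comparison-produced mask, "mask != 0" equals both the mask and
   the original comparison, so either replacement preserves the result. */
static int ne_zero_is_redundant(v4si v1, v4si v2)
{
  const v4si zero = {0, 0, 0, 0};
  v4si mask = make_mask(v1, v2);
  return same(mask != zero, mask) && same(mask != zero, v1 < v2);
}
```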

>> For the backends I'd have vcond patterns for both an embedded comparison
>> and for a mask.  (Now we can rewind the discussion a bit and allow
>> arbitrary masks and define a vcond with a mask operand to do bitwise
>> selection - what matters is the C frontend semantics which we need to
>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>> do not have to agree).
>
> But it seems like another combinatorial explosion here. Considering
> what Richard said in his e-mail, in order to support "generic" vcond,
> we just need to enumerate all the possible cases. Or I didn't
> understand right?

Well, the question is still what VCOND_EXPR and thus the vcond pattern
semantically does for a non-comparison operand.  I'd argue that using
the bitwise selection semantic gives us maximum flexibility and a native
instruction with AMD XOP.  This non-comparison VCOND_EXPR is
also easy to implement in the middle-end expansion code if there is
no native instruction for it - by simply emitting the bitwise operations.
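The fallback expansion described above can be sketched as follows. It assumes four `int` lanes via GCC's vector extension, and `bitwise_select` is a hypothetical name. For a canonical mask it matches the lane-wise select. For a mask like {1, 5, 0, -1}, however, it mixes bits of both inputs. That is why the C frontend would have to insert `!= 0` whenever it cannot prove the mask canonical.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Bitwise selection: each result bit comes from X where the matching
   MASK bit is 1, and from Y where it is 0. */
static v4si bitwise_select(v4si mask, v4si x, v4si y)
{
  return (mask & x) | (~mask & y);
}
```

Take x = {10, 20, 30, 40} and y = {1, 2, 3, 4}. The mask {-1, 0, -1, 0} selects {10, 2, 30, 4}. The mask {1, 5, 0, -1} gives {0, 6, 3, 40} rather than the C-level {10, 20, 3, 40}.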

But I have the feeling we are talking past each other ...?

> I mean, I don't mind of course, but it seems to me that it would be
> cleaner to have one generic enough pattern.
>
> Is there seriously no way to pass something from optab into the backend??

You can pass operands.  And information is implicitly encoded in the name.
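One way to read "encoded in the name" is sketched below. The names are hypothetical, and neither function corresponds to an actual GCC pattern. A comparison-taking expander receives the comparison code and operands and builds the mask itself. A separately named mask-taking expander, simply by being chosen, promises that its operand already has 0 / -1 lanes. The caller picks the variant by name, so no side-channel flag is passed.

```c
#include <assert.h>

typedef int v4si __attribute__((vector_size(16)));

/* Illustrative comparison codes standing in for RTL codes such as GT. */
enum cmp_code { CMP_GT, CMP_LT, CMP_EQ };

/* Comparison-taking variant: the comparison arrives as operands
   (code, a, b) and the expander computes the mask itself. */
static v4si expand_vcond(enum cmp_code code, v4si a, v4si b, v4si x, v4si y)
{
  v4si mask;
  switch (code)
    {
    case CMP_GT: mask = a > b; break;
    case CMP_LT: mask = a < b; break;
    default:     mask = a == b; break;
    }
  return (mask & x) | (~mask & y);
}

/* Mask-taking variant (hypothetical name): choosing it already says
   MASK has 0 / -1 lanes, so no extra "!= 0" is generated. */
static v4si expand_vcond_with_mask(v4si mask, v4si x, v4si y)
{
  return (mask & x) | (~mask & y);
}
```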

>> If the mask is computed by a function you are of course out of luck,
>> but I don't see how you'd manage to infer knowledge from nowhere either.
>
> Well, take a simpler example
>
> a = {0};
> for ( ; *p; p += 16)
>  a &= pattern > (vec)*p;
>
> res = a ? v0 : v1;
>
> In this case it is simple to analyse that a is a comparison, but you
> cannot embed the operations of a into VEC_COND_EXPR.

Sure, but if the above is C source the frontend would generate
res = a != 0 ? v0 : v1; initially.  An optimization pass could still
track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
with VEC_COND_EXPR <a, v0, v1> (no existing one would track
vector contents though).
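The kind of tracking meant here could look roughly like the conservative check below. It follows the `is_vector_comparison` idea from earlier in the thread: comparisons, and constants whose lanes are all 0 or -1, are masks, and masks stay masks under `&`, `|` and `~`. This is a sketch over a toy expression type; the node kinds are hypothetical, not GCC's tree codes. A real pass would also have to follow SSA definitions and PHI nodes (needed for the loop example), which this toy omits.

```c
#include <assert.h>
#include <stddef.h>

/* A toy expression IR; these node kinds are illustrative only. */
enum kind { K_VAR, K_CONST, K_CMP, K_AND, K_IOR, K_NOT };

struct expr {
  enum kind kind;
  const struct expr *op0, *op1;  /* operands of K_AND / K_IOR / K_NOT */
  int lanes[4];                  /* lane values for K_CONST */
};

/* Conservatively decide whether every lane of E is known to be 0 or -1. */
static int is_known_mask(const struct expr *e)
{
  switch (e->kind)
    {
    case K_CMP:                  /* comparisons yield 0 / -1 lanes */
      return 1;
    case K_CONST:
      for (int i = 0; i < 4; i++)
        if (e->lanes[i] != 0 && e->lanes[i] != -1)
          return 0;
      return 1;
    case K_AND:
    case K_IOR:                  /* 0 / -1 lanes are closed under & and | */
      return is_known_mask(e->op0) && is_known_mask(e->op1);
    case K_NOT:                  /* ~0 == -1 and ~-1 == 0 */
      return is_known_mask(e->op0);
    default:                     /* K_VAR: nothing is known */
      return 0;
    }
}
```

When the check succeeds, `VEC_COND_EXPR <a != 0, v0, v1>` could be rewritten to `VEC_COND_EXPR <a, v0, v1>`. When it fails, the `!= 0` stays.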

> Ok, I am testing the patch that removes hooks. Could you push a little
> bit the backend-patterns business?

Well, I suppose we're waiting for Uros here.  I hadn't much luck with
fiddling with the mode-iterator stuff myself.

Richard.

* Re: Vector Comparison patch
  2011-08-22 15:57                                           ` Richard Guenther
@ 2011-08-22 16:02                                             ` Artem Shinkarov
  2011-08-22 16:25                                               ` Richard Guenther
  2011-08-22 20:46                                             ` Uros Bizjak
  1 sibling, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 16:02 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> Richard
>>>>>>
>>>>>> I formalized an approach a little bit, and now it works without target
>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>> the several important approaches that I use in the patch.
>>>>>>
>>>>>> So how does it work.
>>>>>> 1) All the vector comparisons at the level of the type-checker are
>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>
>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>> this case, recognize it in the backend, and do not create a
>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>
>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>> false, then we replace mask with mask != {0}.
>>>>>>
>>>>>> So we end-up with the following functionality:
>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>> mask != {0}.
>>>>>>
>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>> which correctly expands, without creating useless masking.
>>>>>>
>>>>>>
>>>>>> Basically for me there are two questions:
>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>> But first is it conceptually fine.
>>>>>>
>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>> which I find inappropriate, and the functionality gets more
>>>>>> complicated.
>>>>>
>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>> separates comparison and ?: expression it's his own fault.
>>>>
>>>> Well, because all the information is there, and I perfectly envision
>>>> the case when user expressed comparison separately, just to avoid code
>>>> duplication.
>>>>
>>>> Like:
>>>> mask = a > b;
>>>> res1 = mask ? v0 : v1;
>>>> res2 = mask ? v2 : v3;
>>>>
>>>> Which in this case would be different from
>>>> res1 = a > b ? v0 : v1;
>>>> res2 = a > b ? v2 : v3;
>>>>
>>>>> Btw, the new hook is still in the patch.
>>>>>
>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>> non-comparison operands during expansion though, but if
>>>>> you always forced a != 0 from the start you can then simply
>>>>> interpret it as a proper comparison result (in which case I'd
>>>>> modify the backends to have an alternate pattern or directly
>>>>> expand to masking operations - using the fake comparison
>>>>> RTX is too much of a hack).
>>>>
>>>> Richard, I think you didn't get the problem.
>>>> I really need the way, to pass the information, that the expression
>>>> that is in the first operand of vcond is an appropriate mask. I though
>>>> for quite a while and this hack is the only answer I found, is there a
>>>> better way to do that. I could for example introduce another
>>>> tree-node, but it would be overkill as well.
>>>>
>>>> Now why do I need it so much:
>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>> actually {-1,-1,0,-1}. So whenever I am not sure that mask of
>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>> always). To check the stuff, I use is_vector_comparison in
>>>> tree-vect-generic.
>>>>
>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>> mask.
>>>>
>>>> If this link between optabs and backend breaks, then the patch falls
>>>> apart. Because every time the comparison is taken out of VEC_COND_EXPR, I
>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>> function-comparison, or somehow else?
>>>>
>>>> So what would be an appropriate way to connect optabs and the backend?
>>>
>>> Well, there is no problem in having the only valid mask operand for
>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>> vec1 : vec2.
>>
>> This happens already in the new version of patch (not submitted yet).
>>
>>> This comparison can be eliminated by optimization passes
>>> that
>>> either replace it by the real comparison computing the mask or just
>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>> dropping the comparison against zero.
>>
>> This is not a problem, because the backend recognizes these patterns,
>> so no optimization is needed in this part.
>
> I mean for
>
>  mask = v1 < v2 ? {-1,...} : {0,...};
>  val = VCOND_EXPR <mask != 0, v3, v4>;
>
> optimizers can see how mask is defined and drop the != 0 test or replace
> it by v1 < v2.

Yes, sure.

>>> For the backends I'd have vcond patterns for both an embedded comparison
>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>> selection - what matters is the C frontend semantics which we need to
>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>> do not have to agree).
>>
>> But it seems like another combinatorial explosion here. Considering
>> what Richard said in his e-mail, in order to support "generic" vcond,
>> we just need to enumerate all the possible cases. Or I didn't
>> understand right?
>
> Well, the question is still what VCOND_EXPR and thus the vcond pattern
> semantically does for a non-comparison operand.  I'd argue that using
> the bitwise selection semantic gives us maximum flexibility and a native
> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
> also easy to implement in the middle-end expansion code if there is
> no native instruction for it - by simply emitting the bitwise operations.
>
> But I have the feeling we are talking past each other ...?

I am all for the bitwise behaviour in the backend pattern; that is
something that I rely on at the moment. What I don't want to have is
the same behaviour in the frontend. So if we can guarantee that we
add != 0 whenever we don't know the "nature" of the mask, then I am
perfectly fine with the back-end having bitwise-selection behaviour.

>> I mean, I don't mind of course, but it seems to me that it would be
>> cleaner to have one generic enough pattern.
>>
>> Is there seriously no way to pass something from optab into the backend??
>
> You can pass operands.  And information is implicitly encoded in the name.

I didn't quite get that; could you give an example?

>>> If the mask is computed by a function you are of course out of luck,
>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>
>> Well, take a simpler example
>>
>> a = {0};
>> for ( ; *p; p += 16)
>>  a &= pattern > (vec)*p;
>>
>> res = a ? v0 : v1;
>>
>> In this case it is simple to analyse that a is a comparison, but you
>> cannot embed the operations of a into VEC_COND_EXPR.
>
> Sure, but if the above is C source the frontend would generate
> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
> vector contents though).

Yeah, sure. My point is that we must be able to pass this information
to the backend: that we checked everything and are sure that a is a
correct mask, so please don't add any != 0 to it.

>> Ok, I am testing the patch that removes hooks. Could you push a little
>> bit the backend-patterns business?
>
> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
> fiddling with the mode-iterator stuff myself.
>
> Richard.
>

Ok, fine. The patch is coming soon.


Artem.

* Re: Vector Comparison patch
  2011-08-22 16:02                                             ` Artem Shinkarov
@ 2011-08-22 16:25                                               ` Richard Guenther
  2011-08-22 17:16                                                 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 16:25 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> Richard
>>>>>>>
>>>>>>> I formalized an approach a little bit, and now it works without target
>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>> the several important approaches that I use in the patch.
>>>>>>>
>>>>>>> So how does it work.
>>>>>>> 1) All the vector comparisons at the level of the type-checker are
>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>
>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>
>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>
>>>>>>> So we end-up with the following functionality:
>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>> mask != {0}.
>>>>>>>
>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>
>>>>>>>
>>>>>>> Basically for me there are two questions:
>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>> But first is it conceptually fine.
>>>>>>>
>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>> complicated.
>>>>>>
>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>
>>>>> Well, because all the information is there, and I perfectly envision
>>>>> the case when user expressed comparison separately, just to avoid code
>>>>> duplication.
>>>>>
>>>>> Like:
>>>>> mask = a > b;
>>>>> res1 = mask ? v0 : v1;
>>>>> res2 = mask ? v2 : v3;
>>>>>
>>>>> Which in this case would be different from
>>>>> res1 = a > b ? v0 : v1;
>>>>> res2 = a > b ? v2 : v3;
>>>>>
>>>>>> Btw, the new hook is still in the patch.
>>>>>>
>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>> non-comparison operands during expansion though, but if
>>>>>> you always forced a != 0 from the start you can then simply
>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>> modify the backends to have an alternate pattern or directly
>>>>>> expand to masking operations - using the fake comparison
>>>>>> RTX is too much of a hack).
>>>>>
>>>>> Richard, I think you didn't get the problem.
>>>>> I really need the way, to pass the information, that the expression
>>>>> that is in the first operand of vcond is an appropriate mask. I though
>>>>> for quite a while and this hack is the only answer I found, is there a
>>>>> better way to do that. I could for example introduce another
>>>>> tree-node, but it would be overkill as well.
>>>>>
>>>>> Now why do I need it so much:
>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>> actually {-1,-1,0,-1}. So whenever I am not sure that mask of
>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>> tree-vect-generic.
>>>>>
>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>> mask.
>>>>>
>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>> apart. Because every time the comparison is taken out of VEC_COND_EXPR, I
>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>> function-comparison, or somehow else?
>>>>>
>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>
>>>> Well, there is no problem in having the only valid mask operand for
>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>> vec1 : vec2.
>>>
>>> This happens already in the new version of patch (not submitted yet).
>>>
>>>> This comparison can be eliminated by optimization passes
>>>> that
>>>> either replace it by the real comparison computing the mask or just
>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>> dropping the comparison against zero.
>>>
>>> This is not a problem, because the backend recognizes these patterns,
>>> so no optimization is needed in this part.
>>
>> I mean for
>>
>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>
>> optimizers can see how mask is defined and drop the != 0 test or replace
>> it by v1 < v2.
>
> Yes, sure.
>
>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>> selection - what matters is the C frontend semantics which we need to
>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>> do not have to agree).
>>>
>>> But it seems like another combinatorial explosion here. Considering
>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>> we just need to enumerate all the possible cases. Or I didn't
>>> understand right?
>>
>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>> semantically does for a non-comparison operand.  I'd argue that using
>> the bitwise selection semantic gives us maximum flexibility and a native
>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>> also easy to implement in the middle-end expansion code if there is
>> no native instruction for it - by simply emitting the bitwise operations.
>>
>> But I have the feeling we are talking past each other ...?
>
> I am all for the bitwise behaviour in the backend pattern, that is
> something that I rely on at the moment. What I don't want to have is
> the same behaviour in the frontend. So If we can guarantee, that we
> add != 0, when we don't know the "nature" of the mask, then I am
> perfectly fine with the back-end having bitwise-selection behaviour.

Well, the C frontend would simply always add that != 0 (because it
doesn't know).

>>> I mean, I don't mind of course, but it seems to me that it would be
>>> cleaner to have one generic enough pattern.
>>>
>>> Is there seriously no way to pass something from optab into the backend??
>>
>> You can pass operands.  And information is implicitly encoded in the name.
>
> I didn't quite get that, could you give an example?

It was a larger variant of "no, apart from what is obvious".

>>>> If the mask is computed by a function you are of course out of luck,
>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>
>>> Well, take simpler example
>>>
>>> a = {0};
>>> for ( ; *p; p += 16)
>>>  a &= pattern > (vec)*p;
>>>
>>> res = a ? v0 : v1;
>>>
>>> In this case it is simple to analyse that a is a comparison, but you
>>> cannot embed the operations of a into VEC_COND_EXPR.
>>
>> Sure, but if the above is C source the frontend would generate
>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>> vector contents though).
>
> Yeah, sure. My point is, that we must be able to pass this information
> in the backend, that we checked everything, and we are sure that a is
>> a correct mask, please don't add any != 0 to it.

But all masks are correct as soon as they appear in a VEC_COND_EXPR.
That's the whole point of the bitwise semantics.  It's only the C frontend
that needs to be careful to impose its stricter semantics.

Richard.
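To make the two semantics in this exchange concrete, here is a minimal sketch in plain C. It emulates 4-lane int32 vectors with arrays (an assumption chosen for portability; it uses neither GCC's vector extensions nor any GCC internals). `select_bitwise` models the bitwise-selection meaning proposed for a VEC_COND_EXPR with a non-comparison mask, and `normalize_mask` models the `!= 0` that the C frontend would add. Both function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4

/* Bitwise selection: each result bit comes from v1 where the mask bit is 1
   and from v2 where it is 0. Under this rule any mask value is "valid". */
static void select_bitwise(const int32_t mask[LANES], const int32_t v1[LANES],
                           const int32_t v2[LANES], int32_t res[LANES])
{
    for (int i = 0; i < LANES; i++)
        res[i] = (mask[i] & v1[i]) | (~mask[i] & v2[i]);
}

/* C frontend view: a lane is true when it is non-zero, so `mask ? v1 : v2`
   is treated as `(mask != 0) ? v1 : v2`. The result is a -1/0 mask. */
static void normalize_mask(const int32_t mask[LANES], int32_t out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] != 0 ? -1 : 0;
}
```

With the non-canonical mask {1, 5, 0, -1} and operands {10, 20, 30, 40} / {-10, -20, -30, -40}, plain bitwise selection returns -10 and -20 in the first two lanes instead of whole elements from the first operand, while selecting through the normalized mask gives {10, 20, -30, 40}, which is the stricter behaviour the C frontend has to guarantee.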


* Re: Vector Comparison patch
  2011-08-22 16:25                                               ` Richard Guenther
@ 2011-08-22 17:16                                                 ` Artem Shinkarov
  2011-08-22 21:07                                                   ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 17:16 UTC
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> Richard
>>>>>>>>
>>>>>>>> I formalized an approach a little bit, now it works without target
>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>
>>>>>>>> So how does it work.
>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>
>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>
>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>
>>>>>>>> So we end-up with the following functionality:
>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>> mask != {0}.
>>>>>>>>
>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>
>>>>>>>>
>>>>>>>> Basically for me there are two questions:
>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>> But first is it conceptually fine.
>>>>>>>>
>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>> complicated.
>>>>>>>
>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>
>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>> duplication.
>>>>>>
>>>>>> Like:
>>>>>> mask = a > b;
>>>>>> res1 = mask ? v0 : v1;
>>>>>> res2 = mask ? v2 : v3;
>>>>>>
>>>>>> Which in this case would be different from
>>>>>> res1 = a > b ? v0 : v1;
>>>>>> res2 = a > b ? v2 : v3;
>>>>>>
>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>
>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>> non-comparison operands during expansion though, but if
>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>> expand to masking operations - using the fake comparison
>>>>>>> RTX is too much of a hack).
>>>>>>
>>>>>> Richard, I think you didn't get the problem.
>>>>>> I really need the way, to pass the information, that the expression
>>>>>> that is in the first operand of vcond is an appropriate mask. I thought
>>>>>> for quite a while and this hack is the only answer I found, is there a
>>>>>> better way to do that. I could for example introduce another
>>>>>> tree-node, but it would be overkill as well.
>>>>>>
>>>>>> Now why do I need it so much:
>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>> tree-vect-generic.
>>>>>>
>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>> mask.
>>>>>>
>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>> function-comparison, or somehow else?
>>>>>>
>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>
>>>>> Well, there is no problem in having the only valid mask operand for
>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>> vec1 : vec2.
>>>>
>>>> This happens already in the new version of patch (not submitted yet).
>>>>
>>>>> This comparison can be eliminated by optimization passes
>>>>> that
>>>>> either replace it by the real comparison computing the mask or just
>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>> dropping the comparison against zero.
>>>>
>>>> This is not a problem, because the backend recognizes these patterns,
>>>> so no optimization is needed in this part.
>>>
>>> I mean for
>>>
>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>
>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>> it by v1 < v2.
>>
>> Yes, sure.
>>
>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>> selection - what matters is the C frontend semantics which we need to
>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>> do not have to agree).
>>>>
>>>> But it seems like another combinatorial explosion here. Considering
>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>> we just need to enumerate all the possible cases. Or I didn't
>>>> understand right?
>>>
>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>> semantically does for a non-comparison operand.  I'd argue that using
>>> the bitwise selection semantic gives us maximum flexibility and a native
>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>> also easy to implement in the middle-end expansion code if there is
>>> no native instruction for it - by simply emitting the bitwise operations.
>>>
>>> But I have the feeling we are talking past each other ...?
>>
>> I am all for the bitwise behaviour in the backend pattern, that is
>> something that I rely on at the moment. What I don't want to have is
>> the same behaviour in the frontend. So If we can guarantee, that we
>> add != 0, when we don't know the "nature" of the mask, then I am
>> perfectly fine with the back-end having bitwise-selection behaviour.
>
> Well, the C frontend would simply always add that != 0 (because it
> doesn't know).
>
>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>> cleaner to have one generic enough pattern.
>>>>
>>>> Is there seriously no way to pass something from optab into the backend??
>>>
>>> You can pass operands.  And information is implicitly encoded in the name.
>>
>> I didn't quite get that, could you give an example?
>
> It was a larger variant of "no, apart from what is obvious".

Ha, joking again. :)

>>>>> If the mask is computed by a function you are of course out of luck,
>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>
>>>> Well, take simpler example
>>>>
>>>> a = {0};
>>>> for ( ; *p; p += 16)
>>>>  a &= pattern > (vec)*p;
>>>>
>>>> res = a ? v0 : v1;
>>>>
>>>> In this case it is simple to analyse that a is a comparison, but you
>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>
>>> Sure, but if the above is C source the frontend would generate
>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>> vector contents though).
>>
>> Yeah, sure. My point is, that we must be able to pass this information
>> in the backend, that we checked everything, and we are sure that a is
>> a correct mask, please don't add any != 0 to it.
>
> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
> That's the whole point of the bitwise semantics.  It's only the C frontend
> that needs to be careful to impose its stricter semantics.
>
> Richard.
>

Ok, I see the last difference in the approaches we envision.
I am assuming that the frontend does not add != 0, but that the later
optimisations (veclower in my case) check every mask in VEC_COND_EXPR
and do the same thing you describe. So the philosophical question is:
why is it better to first add and then remove, rather than to
just add it when needed?

In all the rest I think we agreed.


Artem.
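The "add first, remove later" approach relies on one property: for a mask whose lanes are already all 0 or -1, the `!= 0` step is the identity, so an optimizer that can prove the mask has that shape may drop it. A minimal sketch in plain C with hypothetical helper names; `is_canonical_mask` is only loosely analogous to the patch's `is_vector_comparison` and is not its implementation. `mask_and` shows why the `a &= pattern > (vec)*p` loop quoted above keeps `a` canonical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every lane is all-ones (-1) or all-zeros, i.e. the mask already
   has the shape a vector comparison produces. */
static bool is_canonical_mask(const int32_t *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (m[i] != 0 && m[i] != -1)
            return false;
    return true;
}

/* The conservative lowering of `mask ? x : y`: compute mask != 0 per lane.
   For a canonical mask this leaves every lane unchanged. */
static void mask_ne_zero(const int32_t *m, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = m[i] != 0 ? -1 : 0;
}

/* Lane-wise AND as in `a &= pattern > x`: ANDing two canonical masks
   yields another canonical mask. */
static void mask_and(int32_t *acc, const int32_t *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] &= m[i];
}
```

Starting from the conservatively correct form and proving `is_canonical_mask` before removing the test means a failed analysis only costs an extra comparison, never a wrong result.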


* Re: Vector Comparison patch
  2011-08-22 15:57                                           ` Richard Guenther
  2011-08-22 16:02                                             ` Artem Shinkarov
@ 2011-08-22 20:46                                             ` Uros Bizjak
  2011-08-22 20:58                                               ` Richard Guenther
                                                                 ` (2 more replies)
  1 sibling, 3 replies; 91+ messages in thread
From: Uros Bizjak @ 2011-08-22 20:46 UTC
  To: Richard Guenther
  Cc: Artem Shinkarov, Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:

>> In this case it is simple to analyse that a is a comparison, but you
>> cannot embed the operations of a into VEC_COND_EXPR.
>
> Sure, but if the above is C source the frontend would generate
> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
> vector contents though).
>
>> Ok, I am testing the patch that removes hooks. Could you push a little
>> bit the backend-patterns business?
>
> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
> fiddling with the mode-iterator stuff myself.

It is not _that_ trivial a change, since we have ix86_expand_fp_vcond
and ix86_expand_int_vcond to merge. ATM, the FP version deals with FP
operands and the integer version with integer operands. We have to
merge them somehow and split out the comparison part so that it
handles FP as well as integer operands.

I also don't know why vcond is not allowed to FAIL... probably the
middle-end should be enhanced with a fallback for when some comparison
isn't supported by the optab.

Uros.
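For reference, what a middle-end fallback for an unsupported vcond would have to compute can be sketched lane by lane in plain C: a piecewise comparison producing a -1/0 mask, followed by bitwise selection. The sketch assumes float comparison operands and int32_t data operands (both 32-bit lanes) to show a mixed FP/integer case; the function name is hypothetical and this is not GCC's expansion code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallback for res = (a > b) ? x : y when the target has no
   suitable vcond pattern. Comparison operands are float, data operands
   int32_t; both use 32-bit lanes. */
static void vcond_gt_fallback(const float *a, const float *b,
                              const int32_t *x, const int32_t *y,
                              int32_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Piecewise comparison: -1 for true, 0 for false (NaN gives 0). */
        int32_t m = a[i] > b[i] ? -1 : 0;
        /* Bitwise selection with the canonical mask lane. */
        res[i] = (m & x[i]) | (~m & y[i]);
    }
}
```

Splitting the work this way keeps the comparison (FP or integer) separate from the selection, which only depends on the lane width.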


* Re: Vector Comparison patch
  2011-08-22 20:46                                             ` Uros Bizjak
@ 2011-08-22 20:58                                               ` Richard Guenther
  2011-08-22 21:12                                               ` Artem Shinkarov
  2011-08-29 12:54                                               ` Richard Guenther
  2 siblings, 0 replies; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 20:58 UTC
  To: Uros Bizjak
  Cc: Artem Shinkarov, Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 9:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>>> In this case it is simple to analyse that a is a comparison, but you
>>> cannot embed the operations of a into VEC_COND_EXPR.
>>
>> Sure, but if the above is C source the frontend would generate
>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>> vector contents though).
>>
>>> Ok, I am testing the patch that removes hooks. Could you push a little
>>> bit the backend-patterns business?
>>
>> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
>> fiddling with the mode-iterator stuff myself.
>
> It is not _that_ trivial change, since we have ix86_expand_fp_vcond
> and ix86_expand_int_vcond to merge. ATM, FP version deals with FP
> operands and vice versa. We have to merge them somehow and split out
> comparison part that handles FP as well as integer operands.

Yeah, I tried to keep it split in fp and int compare variants but allow
the result mode and the 2nd and 3rd operand modes to vary differently
but failed to build a mode iterator that would do this...  OTOH I'm not
sure how merging fp and int compare modes would simplify things here.

> I also don't know why vcond is not allowed to FAIL... probably
> middle-end should be enhanced for a fallback if some comparison isn't
> supported by optab.

The vectorizer currently uses the optab presence to verify if the
target supports a given vec-cond-expr.  I'm not sure if we can
make its test more fine-grained easily.

Richard.

>
> Uros.
>


* Re: Vector Comparison patch
  2011-08-22 17:16                                                 ` Artem Shinkarov
@ 2011-08-22 21:07                                                   ` Richard Guenther
  2011-08-22 21:53                                                     ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 21:07 UTC
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>> Richard
>>>>>>>>>
>>>>>>>>> I formalized an approach a little bit, now it works without target
>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>
>>>>>>>>> So how does it work.
>>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>
>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>>
>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>
>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>> mask != {0}.
>>>>>>>>>
>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Basically for me there are two questions:
>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>> But first is it conceptually fine.
>>>>>>>>>
>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>> complicated.
>>>>>>>>
>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>
>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>> duplication.
>>>>>>>
>>>>>>> Like:
>>>>>>> mask = a > b;
>>>>>>> res1 = mask ? v0 : v1;
>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>
>>>>>>> Which in this case would be different from
>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>
>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>
>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>> RTX is too much of a hack).
>>>>>>>
>>>>>>> Richard, I think you didn't get the problem.
>>>>>>> I really need the way, to pass the information, that the expression
>>>>>>> that is in the first operand of vcond is an appropriate mask. I thought
>>>>>>> for quite a while and this hack is the only answer I found, is there a
>>>>>>> better way to do that. I could for example introduce another
>>>>>>> tree-node, but it would be overkill as well.
>>>>>>>
>>>>>>> Now why do I need it so much:
>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>> tree-vect-generic.
>>>>>>>
>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>> mask.
>>>>>>>
>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>> function-comparison, or somehow else?
>>>>>>>
>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>
>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>> vec1 : vec2.
>>>>>
>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>
>>>>>> This comparison can be eliminated by optimization passes
>>>>>> that
>>>>>> either replace it by the real comparison computing the mask or just
>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>>> dropping the comparison against zero.
>>>>>
>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>> so no optimization is needed in this part.
>>>>
>>>> I mean for
>>>>
>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>
>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>> it by v1 < v2.
>>>
>>> Yes, sure.
>>>
>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>> do not have to agree).
>>>>>
>>>>> But it seems like another combinatorial explosion here. Considering
>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>> understand right?
>>>>
>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>> also easy to implement in the middle-end expansion code if there is
>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>
>>>> But I have the feeling we are talking past each other ...?
>>>
>>> I am all for the bitwise behaviour in the backend pattern, that is
>>> something that I rely on at the moment. What I don't want to have is
>>> the same behaviour in the frontend. So If we can guarantee, that we
>>> add != 0, when we don't know the "nature" of the mask, then I am
>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>
>> Well, the C frontend would simply always add that != 0 (because it
>> doesn't know).
>>
>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>> cleaner to have one generic enough pattern.
>>>>>
>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>
>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>
>>> I didn't quite get that, could you give an example?
>>
>> It was a larger variant of "no, apart from what is obvious".
>
> Ha, joking again. :)
>
>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>
>>>>> Well, take simpler example
>>>>>
>>>>> a = {0};
>>>>> for ( ; *p; p += 16)
>>>>>  a &= pattern > (vec)*p;
>>>>>
>>>>> res = a ? v0 : v1;
>>>>>
>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>
>>>> Sure, but if the above is C source the frontend would generate
>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>> vector contents though).
>>>
>>> Yeah, sure. My point is, that we must be able to pass this information
>>> in the backend, that we checked everything, and we are sure that a is
>>> a correct mask, please don't add any != 0 to it.
>>
>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>> That's the whole point of the bitwise semantics.  It's only the C frontend
>> that needs to be careful to impose its stricter semantics.
>>
>> Richard.
>>
>
> Ok, I see the last difference in the approaches we envision.
> I am assuming that the frontend does not add != 0, but that the later
> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
> and do the same thing you describe. So the philosophical question is:
> why is it better to first add and then remove, rather than to
> just add it when needed?

Well, it's "better be right than sorry".  Thus, default to the
conservatively correct way and let optimizers "optimize" it.

> In all the rest I think we agreed.

Fine.

Thanks,
Richard.

>
> Artem.
>


* Re: Vector Comparison patch
  2011-08-22 20:46                                             ` Uros Bizjak
  2011-08-22 20:58                                               ` Richard Guenther
@ 2011-08-22 21:12                                               ` Artem Shinkarov
  2011-08-29 12:54                                               ` Richard Guenther
  2 siblings, 0 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 21:12 UTC
  To: Uros Bizjak
  Cc: Richard Guenther, Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 8:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>>> In this case it is simple to analyse that a is a comparison, but you
>>> cannot embed the operations of a into VEC_COND_EXPR.
>>
>> Sure, but if the above is C source the frontend would generate
>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>> vector contents though).
>>
>>> Ok, I am testing the patch that removes hooks. Could you push a little
>>> bit the backend-patterns business?
>>
>> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
>> fiddling with the mode-iterator stuff myself.
>
> It is not _that_ trivial change, since we have ix86_expand_fp_vcond
> and ix86_expand_int_vcond to merge. ATM, FP version deals with FP
> operands and vice versa. We have to merge them somehow and split out
> comparison part that handles FP as well as integer operands.
>
> I also don't know why vcond is not allowed to FAIL... probably
> middle-end should be enhanced for a fallback if some comparison isn't
> supported by optab.
>
> Uros.
>

Uros

My biggest problem in the backend is writing a correct description in
*.md that accepts the generic case. I can imagine adding all the
cases, but as I mentioned already, the number of patterns explodes: we
would have to do it for SSE, then the same number of cases again for
AVX, and so on. I would assume there is a chance to persuade md that
the only thing that matters is the element size of the mask and of the
operands.

If you can help me with the pattern, I am happy to merge the x86 code.
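To illustrate why only the element size should matter once the mask is applied bitwise, here is a minimal C sketch on plain arrays rather than real vector types. The helper `select_bits32` is hypothetical (it is not a GCC or i386 backend function) and the sketch assumes `float` is 32 bits wide. It computes `(mask & a) | (~mask & b)` on raw 32-bit lanes, so one routine serves both int and float operands:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* This sketch assumes float is a 32-bit type, like int32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float assumed");

/* Hypothetical helper: bitwise selection over n lanes of 32 bits.
   Every result bit is taken from A where the mask bit is 1 and from B
   where it is 0.  Lanes are copied as raw bits, so the same code works
   for int32_t and float operands; only the lane width matters.  */
static void select_bits32(void *res, const uint32_t *mask,
                          const void *a, const void *b, size_t n)
{
  const size_t w = sizeof(uint32_t);
  for (size_t i = 0; i < n; i++) {
    uint32_t x, y, r;
    memcpy(&x, (const unsigned char *)a + i * w, w);
    memcpy(&y, (const unsigned char *)b + i * w, w);
    r = (mask[i] & x) | (~mask[i] & y);
    memcpy((unsigned char *)res + i * w, &r, w);
  }
}
```

With the mask {~0u, 0, ~0u, 0} the result takes lanes 0 and 2 from a and lanes 1 and 3 from b, whether the lanes hold ints or floats.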


Thanks,
Artem.


* Re: Vector Comparison patch
  2011-08-22 21:07                                                   ` Richard Guenther
@ 2011-08-22 21:53                                                     ` Artem Shinkarov
  2011-08-22 22:39                                                       ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 21:53 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>> Richard
>>>>>>>>>>
>>>>>>>>>> I formalized an approach a little-bit, now it works without target
>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>
>>>>>>>>>> So how does it work.
>>>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>
>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>>>
>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>>
>>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>>> mask != {0}.
>>>>>>>>>>
>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>>> But first is it conceptually fine.
>>>>>>>>>>
>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>> complicated.
>>>>>>>>>
>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>
>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>> duplication.
>>>>>>>>
>>>>>>>> Like:
>>>>>>>> mask = a > b;
>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>
>>>>>>>> Which in this case would be different from
>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>
>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>
>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>> RTX is too much of a hack).
>>>>>>>>
>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>> I really need the way, to pass the information, that the expression
>>>>>>>> that is in the first operand of vcond is an appropriate mask. I thought
>>>>>>>> for quite a while and this hack is the only answer I found, is there a
>>>>>>>> better way to do that. I could for example introduce another
>>>>>>>> tree-node, but it would be overkill as well.
>>>>>>>>
>>>>>>>> Now why do I need it so much:
>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>> tree-vect-generic.
>>>>>>>>
>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>> mask.
>>>>>>>>
>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>> function-comparison, or somehow else?
>>>>>>>>
>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>
>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>> vec1 : vec2.
>>>>>>
>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>
>>>>>>> This comparison can be eliminated by optimization passes
>>>>>>> that
>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>>>> dropping the comparison against zero.
>>>>>>
>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>> so no optimization is needed in this part.
>>>>>
>>>>> I mean for
>>>>>
>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>
>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>> it by v1 < v2.
>>>>
>>>> Yes, sure.
>>>>
>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>> do not have to agree).
>>>>>>
>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>>> understand right?
>>>>>
>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>> also easy to implement in the middle-end expansion code if there is
>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>
>>>>> But I have the feeling we are talking past each other ...?
>>>>
>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>> something that I rely on at the moment. What I don't want to have is
>>>> the same behaviour in the frontend. So If we can guarantee, that we
>>>> add != 0, when we don't know the "nature" of the mask, then I am
>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>
>>> Well, the C frontend would simply always add that != 0 (because it
>>> doesn't know).
>>>
>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>> cleaner to have one generic enough pattern.
>>>>>>
>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>
>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>
>>>> I didn't quite get that, could you give an example?
>>>
>>> It was a larger variant of "no, apart from what is obvious".
>>
>> Ha, joking again. :)
>>
>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>
>>>>>> Well, take simpler example
>>>>>>
>>>>>> a = {0};
>>>>>> for ( ; *p; p += 16)
>>>>>>  a &= pattern > (vec)*p;
>>>>>>
>>>>>> res = a ? v0 : v1;
>>>>>>
>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>
>>>>> Sure, but if the above is C source the frontend would generate
>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>> vector contents though).
>>>>
>>>> Yeah, sure. My point is, that we must be able to pass this information
>>>> in the backend, that we checked everything, and we are sure that a is
>>>> a correct mask, please don't add any != 0 to it.
>>>
>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>> that needs to be careful to impose its stricter semantics.
>>>
>>> Richard.
>>>
>>
>> Ok, I see the last difference in the approaches we envision.
>> I am assuming, that frontend does not put != 0, but the later
>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>> and does the same functionality as you describe. So the philosophical
>> question why it is better to first add and then remove, rather than
>> just add if needed?
>
> Well, it's "better be right than sorry".  Thus, default to the
> conservatively correct
> way and let optimizers "optimize" it.

How could we end up sorry? It is impossible to skip the vcond during
optimisation. But whatever, it is not really that important when to
add the != 0. Currently I have a bigger problem, see below.
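A minimal C model of the two readings of mask ? x : y discussed in this thread, assuming 4 lanes of int32_t held in plain arrays (all function names are hypothetical, not GCC code). The C front-end reading first applies mask != {0}, so every non-zero lane becomes -1 and a zero lane stays 0; for example {1, 5, 0, -1} becomes {-1, -1, 0, -1}. The bitwise reading uses the mask bits directly. The two agree only when every lane is already 0 or -1, which is exactly the case in which the added != 0 could be dropped again:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* mask != {0}, lane by lane, with OpenCL-style results:
   -1 (all bits set) for true, 0 for false.  */
static void normalize_mask(int32_t out[LANES], const int32_t mask[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = (mask[i] != 0) ? -1 : 0;
}

/* True if every lane is already 0 or -1; for such a mask the
   != 0 above is the identity and could be removed.  */
static bool is_canonical_mask(const int32_t mask[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    if (mask[i] != 0 && mask[i] != -1)
      return false;
  return true;
}

/* Bitwise selection: each bit comes from x where the mask bit is set
   and from y where it is clear (int32_t is two's complement).  */
static void bitwise_select(int32_t res[LANES], const int32_t mask[LANES],
                           const int32_t x[LANES], const int32_t y[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    res[i] = (mask[i] & x[i]) | (~mask[i] & y[i]);
}

/* C front-end reading of mask ? x : y: select on mask != {0}.  */
static void frontend_select(int32_t res[LANES], const int32_t mask[LANES],
                            const int32_t x[LANES], const int32_t y[LANES])
{
  int32_t m[LANES];
  normalize_mask(m, mask);
  bitwise_select(res, m, x, y);
}
```

With the mask {1, 5, 0, -1}, frontend_select picks x in lanes 0, 1 and 3 and y in lane 2, while bitwise_select mixes bits of x and y in lanes 0 and 1.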
>
>> In all the rest I think we agreed.
>
> Fine.
>
> Thanks,
> Richard.
>
>>
>> Artem.
>>
>

I found out that I cannot correctly gimplify the vcond <a > b, c, d>
expression when a > b is itself represented as vcond <a > b, -1, 0>.
The problem is that the gimplifier always pulls a > b out as a
separate expression during gimplification, and I don't think that we
can avoid it. So what happens is:

vcond <a > b, c, d>
transformed to
x = a > b;
x1 = vcond <x, -1, 0>
vcond <x1, c, d>

and so on, infinitely long.

In order to fix the problem we need to either introduce new codes
like VEC_COMP_LT, VEC_COMP_GT and so on, or a builtin function which
we would lower, or go back to the idea of a hook.

Anyway, representing a > b using vcond does not work.
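The non-termination described above can be reproduced with a toy model. This is only a sketch under stated assumptions: statements are reduced to three hypothetical kinds, the lowering step rewrites the first bare comparison into a vcond with the comparison embedded, and the toy gimplifier either hoists embedded comparisons into a fresh temporary (the behaviour observed here) or leaves them in place. None of these names correspond to actual GCC functions or tree codes:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement kinds (hypothetical, not GCC tree codes).  */
enum stmt_kind {
  BARE_CMP,      /* x = a > b                          */
  VCOND_INLINE,  /* x = vcond <a > b, -1, 0>           */
  VCOND_SSA      /* x = vcond <t, -1, 0>, t a temporary */
};

#define MAX_STMTS 64

struct block {
  enum stmt_kind s[MAX_STMTS];
  size_t n;
};

/* Toy gimplifier.  With HOIST set, the embedded comparison of every
   VCOND_INLINE is pulled out into a new BARE_CMP statement in front
   of it: x = vcond <a > b, -1, 0> becomes t = a > b; x = vcond <t, -1, 0>.  */
static void gimplify(struct block *b, bool hoist)
{
  if (!hoist)
    return;
  for (size_t i = 0; i < b->n && b->n < MAX_STMTS; i++) {
    if (b->s[i] != VCOND_INLINE)
      continue;
    for (size_t j = b->n; j > i; j--)   /* make room at position i */
      b->s[j] = b->s[j - 1];
    b->s[i] = BARE_CMP;
    b->s[i + 1] = VCOND_SSA;
    b->n++;
    i++;                                /* skip the rewritten vcond */
  }
}

/* Toy lowering: rewrite the first bare comparison as a vcond, then
   re-gimplify; repeat until none is left.  Returns the number of
   rounds needed, or max_rounds if no fixed point was reached.  */
static int lower(struct block *b, bool hoist, int max_rounds)
{
  for (int round = 0; round < max_rounds; round++) {
    size_t i = 0;
    while (i < b->n && b->s[i] != BARE_CMP)
      i++;
    if (i == b->n)
      return round;                     /* nothing left to lower */
    b->s[i] = VCOND_INLINE;
    gimplify(b, hoist);
  }
  return max_rounds;
}
```

With the hoisting gimplifier the block grows by one statement per round and always contains a fresh bare comparison, so lower() only stops at the round limit; if the comparison could stay embedded, one round suffices.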


What would be your thinking here?


Thanks,
Artem.


* Re: Vector Comparison patch
  2011-08-22 21:53                                                     ` Artem Shinkarov
@ 2011-08-22 22:39                                                       ` Richard Guenther
  2011-08-22 23:13                                                         ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-22 22:39 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>> Richard
>>>>>>>>>>>
>>>>>>>>>>> I formalized an approach a little-bit, now it works without target
>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>
>>>>>>>>>>> So how does it work.
>>>>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>
>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>>>>
>>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>>>
>>>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>>>> mask != {0}.
>>>>>>>>>>>
>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>>>> But first is it conceptually fine.
>>>>>>>>>>>
>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>> complicated.
>>>>>>>>>>
>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>
>>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>>> duplication.
>>>>>>>>>
>>>>>>>>> Like:
>>>>>>>>> mask = a > b;
>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>
>>>>>>>>> Which in this case would be different from
>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>
>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>
>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>
>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>> I really need the way, to pass the information, that the expression
>>>>>>>>> that is in the first operand of vcond is an appropriate mask. I thought
>>>>>>>>> for quite a while and this hack is the only answer I found, is there a
>>>>>>>>> better way to do that. I could for example introduce another
>>>>>>>>> tree-node, but it would be overkill as well.
>>>>>>>>>
>>>>>>>>> Now why do I need it so much:
>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>>> tree-vect-generic.
>>>>>>>>>
>>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>>> mask.
>>>>>>>>>
>>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I
>>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the
>>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>
>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>
>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>> vec1 : vec2.
>>>>>>>
>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>
>>>>>>>> This comparison can be eliminated by optimization passes
>>>>>>>> that
>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>>>>> dropping the comparison against zero.
>>>>>>>
>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>> so no optimization is needed in this part.
>>>>>>
>>>>>> I mean for
>>>>>>
>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>
>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>> it by v1 < v2.
>>>>>
>>>>> Yes, sure.
>>>>>
>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>> do not have to agree).
>>>>>>>
>>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>>>> understand right?
>>>>>>
>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>
>>>>>> But I have the feeling we are talking past each other ...?
>>>>>
>>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>>> something that I rely on at the moment. What I don't want to have is
>>>>> the same behaviour in the frontend. So If we can guarantee, that we
>>>>> add != 0, when we don't know the "nature" of the mask, then I am
>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>
>>>> Well, the C frontend would simply always add that != 0 (because it
>>>> doesn't know).
>>>>
>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>
>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>
>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>
>>>>> I didn't quite get that, could you give an example?
>>>>
>>>> It was a larger variant of "no, apart from what is obvious".
>>>
>>> Ha, joking again. :)
>>>
>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>
>>>>>>> Well, take simpler example
>>>>>>>
>>>>>>> a = {0};
>>>>>>> for ( ; *p; p += 16)
>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>
>>>>>>> res = a ? v0 : v1;
>>>>>>>
>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>
>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>> vector contents though).
>>>>>
>>>>> Yeah, sure. My point is, that we must be able to pass this information
>>>>> in the backend, that we checked everything, and we are sure that a is
>>>>> a correct mask, please don't add any != 0 to it.
>>>>
>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>> that needs to be careful to impose its stricter semantics.
>>>>
>>>> Richard.
>>>>
>>>
>>> Ok, I see the last difference in the approaches we envision.
>>> I am assuming, that frontend does not put != 0, but the later
>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>>> and does the same functionality as you describe. So the philosophical
>>> question why it is better to first add and then remove, rather than
>>> just add if needed?
>>
>> Well, it's "better be right than sorry".  Thus, default to the
>> conservatively correct
>> way and let optimizers "optimize" it.
>
> How can we get sorry, it is impossible to skip the vcond during the
> optimisation, but whatever, it is not really so important when to add.
> Currently I have a bigger problem, see below.
>>
>>> In all the rest I think we agreed.
>>
>> Fine.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Artem.
>>>
>>
>
> I found out that I cannot really gimplify correctly the vcond<a >b ,
> c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is
> that gimplifier pulls a > b always as a separate expression during the
> gimplification, and I don't think that we can avoid it. So what
> happens is:
>
> vcond <a > b , c , d>
> transformed to
> x = b > c;
> x1 = vcond <x , -1, 0>
> vcond <x1, c, d>
>
> and so on, infinitely long.

Sounds like a bug that is possible to fix.

> In order to fix the problem we need whether to introduce a new code
> like VEC_COMP_LT, VEC_COMP_GT, and so on
> whether a builtin function which we would lower
> whether stick back to the idea of hook.
>
> Anyway, representing a >b using vcond does not work.

Well, sure it will work, it just needs some work apparently.

> What would be your thinking here?

Do you have a patch that exposes this problem?  I can have a look
tomorrow.

Richard.

>
> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-08-22 22:39                                                       ` Richard Guenther
@ 2011-08-22 23:13                                                         ` Artem Shinkarov
  2011-08-23  9:53                                                           ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-22 23:13 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 14021 bytes --]

I'll just send you my current version. I'll be a little bit more specific.

The problem starts when you try to lower the following expression:

x = a > b;
x1 = vcond <x != 0, -1, 0>
vcond <x1, c, d>

Now, you walk from the beginning to the end of the block, and you
cannot leave a > b as it is, because only vconds are valid expressions
to expand.

You meet a > b first. You transform it into vcond <a > b, -1, 0>,
build this expression, then gimplify it, and you end up with something
like:

x' = a > b;
x = vcond <x', -1, 0>
x1 = vcond <x != 0, -1, 0>
vcond <x1, c, d>

and your gsi now stands at x1, so gimplification has created a
comparison that the optab would not understand. And I am not really
sure that you would be able to solve this problem easily.

It would help if you could create vcond <x, op, y, op0, op1>, but you
can't: x op y is a single tree that must be gimplified, and I am not
sure that you can persuade the gimplifier to leave this expression
untouched.
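A minimal sketch of the 5-operand form vcond <x, op, y, op0, op1> mentioned above, modelled as a C struct in which the comparison code is a plain enum field rather than a separate comparison tree, so there is nothing for a gimplifier to split out. The struct, enum and function names are hypothetical, lanes are assumed to be 4 x int32_t, and evaluation is done piecewise, lane by lane, which is roughly what a fallback without a matching vcond pattern would have to do:

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4

enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Hypothetical vcond <x, op, y, op0, op1>: the comparison is stored
   as two operands plus a code, not as a nested expression.  */
struct vcond5 {
  const int32_t *x, *y;      /* compared operands */
  enum cmp_code op;
  const int32_t *op0, *op1;  /* chosen where the comparison is true / false */
};

static int lane_compare(int32_t a, enum cmp_code op, int32_t b)
{
  switch (op) {
  case CMP_LT: return a < b;
  case CMP_LE: return a <= b;
  case CMP_GT: return a > b;
  case CMP_GE: return a >= b;
  case CMP_EQ: return a == b;
  case CMP_NE: return a != b;
  }
  return 0;
}

/* Piecewise evaluation: res[i] = x[i] op y[i] ? op0[i] : op1[i].  */
static void eval_vcond5(int32_t res[LANES], const struct vcond5 *v)
{
  for (size_t i = 0; i < LANES; i++)
    res[i] = lane_compare(v->x[i], v->op, v->y[i]) ? v->op0[i] : v->op1[i];
}
```

In this form a plain comparison a > b is simply { a, b, CMP_GT, {-1, ...}, {0, ...} }, with no inner comparison expression left for the gimplifier to hoist.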

The current version of the patch is in the attachment.


Thanks,
Artem.


On Mon, Aug 22, 2011 at 9:58 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>> Richard
>>>>>>>>>>>>
>>>>>>>>>>>> I formalized an approach a little-bit, now it works without target
>>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> So how does it work.
>>>>>>>>>>>> 1) All the vector comparisons at the level of  type-checker are
>>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into
>>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>>
>>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes.
>>>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark
>>>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>>>> comparison, but use the mask as it was a result of some comparison.
>>>>>>>>>>>>
>>>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns
>>>>>>>>>>>> false, then we replace mask with mask != {0}.
>>>>>>>>>>>>
>>>>>>>>>>>> So we end-up with the following functionality:
>>>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of
>>>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with
>>>>>>>>>>>> mask != {0}.
>>>>>>>>>>>>
>>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas,
>>>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on.
>>>>>>>>>>>> But first is it conceptually fine.
>>>>>>>>>>>>
>>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>>> complicated.
>>>>>>>>>>>
>>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>>
>>>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>>>> duplication.
>>>>>>>>>>
>>>>>>>>>> Like:
>>>>>>>>>> mask = a > b;
>>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>>
>>>>>>>>>> Which in this case would be different from
>>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>>
>>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>>
>>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>>
>>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>>> I really need a way to pass the information that the expression
>>>>>>>>>> in the first operand of vcond is an appropriate mask. I thought
>>>>>>>>>> about it for quite a while and this hack is the only answer I found; is
>>>>>>>>>> there a better way to do that? I could, for example, introduce another
>>>>>>>>>> tree node, but that would be overkill as well.
>>>>>>>>>>
>>>>>>>>>> Now why do I need it so much:
>>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of
>>>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>>>> tree-vect-generic.
>>>>>>>>>>
>>>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>>>> mask.
>>>>>>>>>>
>>>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>>>> apart, because every time the comparison is taken out of VEC_COND_EXPR, I
>>>>>>>>>> will have to put != {0}. Keep in mind that I cannot always put the
>>>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>>
>>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>>
>>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>>> vec1 : vec2.
>>>>>>>>
>>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>>
>>>>>>>>> This comparison can be eliminated by optimization passes that
>>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>>> propagate the information that this mask is already {-1,...} / {0,....} by simply
>>>>>>>>> dropping the comparison against zero.
>>>>>>>>
>>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>>> so no optimization is needed in this part.
>>>>>>>
>>>>>>> I mean for
>>>>>>>
>>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>>
>>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>>> it by v1 < v2.
>>>>>>
>>>>>> Yes, sure.
>>>>>>
>>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>>> do not have to agree).
>>>>>>>>
>>>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>>>> we just need to enumerate all the possible cases. Or did I not
>>>>>>>> understand correctly?
>>>>>>>
>>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>>
>>>>>>> But I have the feeling we are talking past each other ...?
>>>>>>
>>>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>>>> something that I rely on at the moment. What I don't want to have is
>>>>>> the same behaviour in the frontend. So if we can guarantee that we
>>>>>> add != 0 when we don't know the "nature" of the mask, then I am
>>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>>
>>>>> Well, the C frontend would simply always add that != 0 (because it
>>>>> doesn't know).
>>>>>
>>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>>
>>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>>
>>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>>
>>>>>> I didn't quite get that, could you give an example?
>>>>>
>>>>> It was a larger variant of "no, apart from what is obvious".
>>>>
>>>> Ha, joking again. :)
>>>>
>>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>>
>>>>>>>> Well, take simpler example
>>>>>>>>
>>>>>>>> a = {0};
>>>>>>>> for ( ; *p; p += 16)
>>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>>
>>>>>>>> res = a ? v0 : v1;
>>>>>>>>
>>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>>
>>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>>> vector contents though).
>>>>>>
>>>>>> Yeah, sure. My point is that we must be able to pass this information
>>>>>> to the backend: that we checked everything, we are sure that a is
>>>>>> a correct mask, and it should not add any != 0 to it.
>>>>>
>>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>>> that needs to be careful to impose its stricter semantics.
>>>>>
>>>>> Richard.
>>>>>
>>>>
>>>> Ok, I see the last difference in the approaches we envision.
>>>> I am assuming that the frontend does not put != 0, but the later
>>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>>>> and do what you describe. So the philosophical
>>>> question is: why is it better to first add and then remove, rather than
>>>> just add if needed?
>>>
>>> Well, it's "better be right than sorry".  Thus, default to the
>>> conservatively correct way and let optimizers "optimize" it.
>>
>> How could we be sorry? It is impossible to skip the vcond during the
>> optimisation, but whatever, it is not really so important when to add.
>> Currently I have a bigger problem, see below.
>>>
>>>> In all the rest I think we agreed.
>>>
>>> Fine.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Artem.
>>>>
>>>
>>
>> I found out that I cannot correctly gimplify the vcond<a > b, c, d>
>> expression when a > b is represented as vcond<a > b, -1, 0>. The problem is
>> that the gimplifier always pulls a > b out as a separate expression during
>> gimplification, and I don't think that we can avoid it. So what
>> happens is:
>>
>> vcond <a > b , c , d>
>> transformed to
>> x = a > b;
>> x1 = vcond <x , -1, 0>
>> vcond <x1, c, d>
>>
>> and so on, infinitely long.
>
> Sounds like a bug that is possible to fix.
>
>> In order to fix the problem we need either to introduce new codes
>> like VEC_COMP_LT, VEC_COMP_GT, and so on,
>> or a builtin function which we would lower,
>> or to stick with the idea of a hook.
>>
>> Anyway, representing a > b using vcond does not work.
>
> Well, sure it will work, it just needs some work apparently.
>
>> What would be your thinking here?
>
> Do you have a patch that exposes this problem?  I can have a look
> tomorrow.
>
> Richard.
>
>>
>> Thanks,
>> Artem.
>>
>
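The thread above turns on two readings of a non-comparison mask in a vector conditional: bitwise selection (proposed for VEC_COND_EXPR and the vcond pattern) and the C-level rule that any nonzero lane selects the first value, enforced by rewriting `mask ? x : y` as `mask != 0 ? x : y`. The plain-C model below is only a sketch, not GCC code: it represents a four-lane int vector as an array, and the names `vcond_bitwise`, `vcond_c`, `normalize_mask` and `is_proper_mask` are hypothetical. It shows that the two readings disagree for a mask such as {1, 5, 0, -1} and agree once every lane is 0 or -1, which is the property a check like is_vector_comparison would try to prove before dropping the `!= 0`.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Bitwise selection: each result bit comes from A where the mask bit
   is set and from B where it is clear.  */
static void
vcond_bitwise (const int32_t mask[LANES], const int32_t a[LANES],
               const int32_t b[LANES], int32_t res[LANES])
{
  for (int i = 0; i < LANES; i++)
    res[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}

/* C-level semantics: "mask ? a : b" behaves like "mask != 0 ? a : b",
   so any nonzero lane selects A.  */
static void
vcond_c (const int32_t mask[LANES], const int32_t a[LANES],
         const int32_t b[LANES], int32_t res[LANES])
{
  for (int i = 0; i < LANES; i++)
    res[i] = mask[i] != 0 ? a[i] : b[i];
}

/* The explicit "!= 0": turn an arbitrary vector into a proper mask
   whose lanes are all -1 (true) or 0 (false).  */
static void
normalize_mask (const int32_t mask[LANES], int32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = mask[i] != 0 ? -1 : 0;
}

/* True if every lane is already 0 or -1; then the "!= 0" can be
   dropped because both semantics select the same lanes.  */
static bool
is_proper_mask (const int32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (mask[i] != 0 && mask[i] != -1)
      return false;
  return true;
}
```

With mask {1, 5, 0, -1}, a = {10, 20, 30, 40} and b = {100, 200, 300, 400}, `vcond_c` yields {10, 20, 300, 40}, while `vcond_bitwise` mixes bits in the first two lanes ({100, 204, 300, 40}); after `normalize_mask` the bitwise version matches `vcond_c`.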

[-- Attachment #2: vcond-gimplify-problem.diff --]
[-- Type: text/plain, Size: 68651 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators:
+@code{==, !=, <, <=, >, >=}.  Both integer-type and real-type vectors
+can be compared, but only with vectors of the same type.  The result of
+the comparison is a signed integer-type vector whose elements have the
+same size as the elements of the compared vectors.  Comparison is
+performed element by element: the false value is 0 and the true value
+is -1 (the constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+where the condition is a vector of signed integers.  In that case the
+condition is used as a mask that selects each element either from the
+second operand or from the third.  Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The number of elements in the condition must be the same as the number
+of elements in both operands, and so must the size of the element type.
+The type of the vector conditional is determined by the types of the
+operands, which must be the same.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can also be just a
+vector of signed integer type.  In that case the vector is implicitly
+compared with a vector of zeroes.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional whose operands are vectors and whose
+condition is a scalar integer works in the standard way: it returns the
+second operand if the condition is true and the third otherwise.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;  
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a; 
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
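The documented comparison results can be reproduced with GCC-style vector extensions. This sketch assumes a compiler that already supports element-wise comparison of `vector_size` vectors in C (for example GCC 4.7 or later, or Clang); it does not rely on the vector conditional added by this patch, so it writes `res = a >= b ? c : d` in the equivalent bitwise form `(c & m) | (d & ~m)`, where `m` is the -1/0 comparison mask. The helper names `select_v4si` and `v4si_equal` are hypothetical.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Select lanes of C where M is -1 and lanes of D where M is 0.
   M is assumed to be a comparison result, so every lane is 0 or -1.  */
static v4si
select_v4si (v4si m, v4si c, v4si d)
{
  return (c & m) | (d & ~m);
}

/* Return 1 if all four lanes of X and Y are equal, 0 otherwise.  */
static int
v4si_equal (v4si x, v4si y)
{
  for (int i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}
```

With the values from the examples above, `a > b` gives {0, 0, -1, 0}, `a == b` gives {0, -1, 0, -1}, and `select_v4si (a >= b, c, d)` with b = {3, 2, 1, 7} gives {6, 3, 4, 9}.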
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 177665)
+++ gcc/doc/tm.texi	(working copy)
@@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code})
+This hook should check whether it is possible to express vector comparison using the hardware-specific instructions and return the result tree.  The hook should return NULL_TREE if expansion is impossible.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type})
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 177665)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}).
 Return true if vector alignment is reachable (by peeling N iterations) for the given type.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+
 @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM
 Target builtin that implements vector permute.
 @end deftypefn
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 177665)
+++ gcc/targhooks.c	(working copy)
@@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+/* Default for TARGET_VECTORIZE_BUILTIN_VEC_COMPARE: no target-specific
+   vector comparison is available, so always return NULL_TREE.  */
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, 
+                             tree type ATTRIBUTE_UNUSED, 
+                             tree v0 ATTRIBUTE_UNUSED, 
+                             tree v1 ATTRIBUTE_UNUSED, 
+                             enum tree_code code ATTRIBUTE_UNUSED)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,11 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
+extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, 
+                                         tree type, tree v0, tree v1, 
+                                         enum tree_code code);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 177665)
+++ gcc/target.def	(working copy)
@@ -988,6 +988,15 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "This hook should check whether it is possible to express vector \
+comparison using the hardware-specific instructions and return the result \
+tree.  The hook should return NULL_TREE if expansion is impossible.",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
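The hook contract above (return a result, or NULL_TREE when the target has no suitable instruction, so that the caller falls back to a generic expansion) can be modelled in plain C with a function pointer. This is a sketch under stated assumptions: `vec_compare_hook`, `default_vec_compare`, `compare_lane` and `expand_compare` are hypothetical stand-ins rather than GCC interfaces, `NULL` plays the role of NULL_TREE, and the fallback mirrors the piecewise expansion by producing -1 or 0 per lane.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4

enum cmp_code { CMP_LT, CMP_EQ };

/* A hook returns a pointer to its result, or NULL (the analogue of
   NULL_TREE) when it cannot expand the comparison.  */
typedef const int32_t *(*vec_compare_hook) (enum cmp_code, const int32_t *,
                                             const int32_t *, int32_t *);

/* Default hook: no hardware support, so always decline.  */
static const int32_t *
default_vec_compare (enum cmp_code code, const int32_t *v0,
                     const int32_t *v1, int32_t *out)
{
  (void) code; (void) v0; (void) v1; (void) out;
  return NULL;
}

/* One lane of the piecewise fallback: -1 for true, 0 for false.  */
static int32_t
compare_lane (enum cmp_code code, int32_t x, int32_t y)
{
  int r = code == CMP_LT ? x < y : x == y;
  return r ? -1 : 0;
}

/* Try the target hook first; if it declines, expand lane by lane.  */
static const int32_t *
expand_compare (vec_compare_hook hook, enum cmp_code code,
                const int32_t v0[LANES], const int32_t v1[LANES],
                int32_t out[LANES])
{
  const int32_t *res = hook (code, v0, v1, out);
  if (res != NULL)
    return res;
  for (int i = 0; i < LANES; i++)
    out[i] = compare_lane (code, v0[i], v1[i]);
  return out;
}
```

Because the default hook always declines, every comparison in this model goes through the per-lane loop; a target implementation would return its own result instead.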
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6572,16 +6572,37 @@ expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
+  
+  if (COMPARISON_CLASS_P (op0))
+    {
+      comparison = vector_compare_rtx (op0, unsignedp, icode);
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_fixed_operand (&ops[3], comparison);
+      create_fixed_operand (&ops[4], XEXP (comparison, 0));
+      create_fixed_operand (&ops[5], XEXP (comparison, 1));
+
+    }
+  else
+    {
+      rtx rtx_op0;
+      rtx vec; 
+    
+      rtx_op0 = expand_normal (op0);
+      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); 
+      vec = CONST0_RTX (mode);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);
+    }
 
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
   return ops[0].value;
 }
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t
 }
 \f
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars and is either {-1,-1,..} or {0,0,...} for vectors), 
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      tree arg0_type = TREE_TYPE (arg0);
+      
       switch (code)
 	{
 	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  break;
 
 	case GE_EXPR:
 	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
 	case NE_EXPR:
 	  /* For NE, we can only do this simplification if integer
 	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (FLOAT_TYPE_P (arg0_type)
+	      && HONOR_NANS (TYPE_MODE (arg0_type)))
 	    break;
 	  /* ... fall through ...  */
 	case GT_EXPR:
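The HONOR_NANS checks in fold_comparison matter because `x == x` is not always true for floating-point operands: a NaN compares unequal to itself. For integer operands, folding `x == x`, `x >= x` and `x <= x` to true (-1 in a vector) and `x != x`, `x < x`, `x > x` to false (0) is always valid, which is why vector-compare-2.c expects every lane of the sum to be -3. The scalar sketch below illustrates both cases using the vector convention true = -1; the helper names are hypothetical and it assumes the code is not compiled with options such as `-ffast-math` that drop NaN semantics.

```c
#include <math.h>

/* Sum of the six comparisons of X with itself, using the vector
   convention true = -1, false = 0.  */
static int
self_compare_sum_int (int x)
{
  return -(x == x) - (x != x) - (x > x) - (x < x) - (x >= x) - (x <= x);
}

/* The same sum for a double; with a NaN every comparison except !=
   is false, so the result differs from the integer case.  */
static int
self_compare_sum_double (double x)
{
  return -(x == x) - (x != x) - (x > x) - (x < x) - (x >= x) - (x <= x);
}
```

For any int and any ordinary double the sum is -3; for a NaN it is -1, so folding the floating-point comparisons would change the result.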
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,78 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != (" fmt1 " " #op " " fmt1 " ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%i", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%i");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){0,0}), d0, d1, !=, "%f", "%i");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){0,0}), l0, l1, !=, "%i", "%i");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){0,0,0,0}), 
+		 f0, f1, !=, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){0,0,0,0}), 
+		 i0, i1, !=, "%i", "%i");
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
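vector-compare-1.c runs the same element values through signed and unsigned vectors, and the results differ wherever `(INT)-23` is involved, because as an `unsigned int` it becomes a large positive value. A minimal sketch, assuming a compiler with element-wise vector comparison in C (for example GCC 4.7 or later, or Clang) and 32-bit `int`; `argc` from the test is replaced by 1 as in a run without arguments, and the function names are illustrative.

```c
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4usi __attribute__ ((vector_size (16)));

/* Lane 3 compares 10 with -23: true as signed ints.  */
static v4si
signed_gt (void)
{
  v4si a = {1, 1, 2, 10};
  v4si b = {0, 3, 2, -23};
  return a > b;
}

/* The same lanes as unsigned ints: -23 becomes 4294967273, so lane 3
   is now false.  The comparison result is still a signed int vector.  */
static v4si
unsigned_gt (void)
{
  v4usi a = {1, 1, 2, 10};
  v4usi b = {0, 3, 2, (unsigned int) -23};
  return a > b;
}
```

The signed comparison yields {-1, 0, 0, -1} and the unsigned one {-1, 0, 0, 0}, which is why the expansion has to respect the signedness of the compared elements.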
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt) \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   //test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
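The execution test above checks every element type against the OpenCL rule from the cover letter: a lane-wise comparison yields a vector of signed integers with the same element width, where -1 means true and 0 means false. The scalar reference model below is only an illustration of that rule, not code from the patch; the function names are made up, and it assumes a 32-bit float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model (illustration only, not part of GCC) of an
   element-wise "a > b" on unsigned char lanes.  The result lanes are
   signed and have the same width: -1 (all bits set) for true, 0 for
   false.  */
static void
ref_gt_u8 (const unsigned char *a, const unsigned char *b,
           signed char *res, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    res[i] = a[i] > b[i] ? -1 : 0;
}

/* The same rule for float lanes: the mask lanes are 32-bit signed
   integers, matching the width of float (assumed to be 32 bits).  */
static void
ref_gt_f32 (const float *a, const float *b, int32_t *res, size_t lanes)
{
  assert (sizeof (float) == sizeof (int32_t));
  for (size_t i = 0; i < lanes; i++)
    res[i] = a[i] > b[i] ? -1 : 0;
}
```

With unsigned char lanes, the initializer (CHAR)-5 used in the test data is 251 and therefore compares greater than 87; with signed char lanes the same initializer is -5 and compares less.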
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
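The expected value -3 in this test comes from three true comparisons (==, >=, <=) contributing -1 each and three false ones (!=, >, <) contributing 0. The sketch below spells that out with scalars; the helper names are hypothetical. It also shows why folding x <cmp> x is only safe for integer lanes, or when NaNs need not be honored: for a NaN, == and the ordered comparisons are false while != is true.

```c
#include <assert.h>
#include <math.h>

/* Map a C truth value (1 or 0) to a vector-comparison lane (-1 or 0).  */
static int
lane (int truth)
{
  return truth ? -1 : 0;
}

/* One lane of foo () from vector-compare-2.c, modeled with scalars:
   ==, >= and <= are true, !=, > and < are false, so the sum is -3.  */
static int
self_compare_sum_int (int x)
{
  int sum = lane (x == x) + lane (x != x) + lane (x > x)
            + lane (x < x) + lane (x >= x) + lane (x <= x);
  assert (sum == -3);   /* mirrors the abort check in the test */
  return sum;
}

/* The same expression on doubles is -3 only for non-NaN values:
   for a NaN every comparison except != is false, giving -1.  */
static int
self_compare_sum_double (double x)
{
  return lane (x == x) + lane (x != x) + lane (x > x)
         + lane (x < x) + lane (x >= x) + lane (x <= x);
}
```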
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  
+  /* This must not trigger an error.  */
+  q4 ? p4 : r4;	    /* { dg-bogus "vector comparison must be of signed integer vector type" } */
+}
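The diagnostics exercised by this test come from the vector branch that this patch adds to build_conditional_expr in c-typeck.c, which checks the operands in a fixed order. The model of that order below is a sketch: struct vec_desc and check_vec_cond are hypothetical stand-ins for the tree-level type queries, and the descriptors used with it assume 32-bit int and float.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a GCC vector type.  */
struct vec_desc
{
  int is_vector;       /* nonzero for a vector type */
  int is_integer;      /* element is an integer type (signed or unsigned) */
  const char *elem;    /* element type name; stands in for TREE_TYPE identity */
  unsigned elem_bits;  /* element size in bits (TYPE_SIZE of the element) */
  unsigned lanes;      /* number of elements (TYPE_VECTOR_SUBPARTS) */
};

/* Return NULL if COND ? OP1 : OP2 would be accepted, otherwise the
   diagnostic, testing in the same order as the patched
   build_conditional_expr.  COND must already be a vector.  */
static const char *
check_vec_cond (struct vec_desc cond, struct vec_desc op1,
                struct vec_desc op2)
{
  assert (cond.is_vector);
  if (!op1.is_vector || !op2.is_vector)
    return "vector comparison arguments must be of type vector";
  if (!cond.is_integer)
    return "non-integer type in vector condition";
  if (op1.lanes != op2.lanes || cond.lanes != op1.lanes)
    return "vectors of different length found in vector comparison";
  if (strcmp (op1.elem, op2.elem) != 0)
    return "vectors of different types involved in vector comparison";
  if (cond.elem_bits != op1.elem_bits)
    return "vector-condition element type must be the same as "
           "result vector element type";
  return NULL;
}
```

For example, r4 ? y : p4 passes the length check and is rejected by the element-type comparison, matching the dg-error on that line, while q4 ? p4 : r4 is accepted because an unsigned int condition is still an integer type.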
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */   
+
+/* Test that C_MAYBE_CONST_EXPRs are folded correctly when
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
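These functions mix the two kinds of conditional: in foo, the inner x ? i : j is an ordinary scalar conditional that yields a whole vector, and the outer conditional uses that vector as a lane mask, taking a lane from the second operand wherever the mask lane is nonzero. A scalar model of foo, with hypothetical helper names and four int lanes (as vec has with a 4-byte int), might look like this:

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical scalar model of VEC_COND_EXPR <mask, a, b>: a lane is
   taken from A when the mask lane is nonzero, otherwise from B.  */
static void
vcond (const int *mask, const int *a, const int *b, int *res)
{
  for (size_t i = 0; i < LANES; i++)
    res[i] = mask[i] != 0 ? a[i] : b[i];
}

/* Model of foo () in gcc.dg/vector-compare-2.c: the scalar X picks
   which whole vector (I or J) serves as the mask.  */
static void
foo_model (int x, const int *i, const int *j, const int *a,
           const int *b, int *res)
{
  const int *mask = x ? i : j;  /* ordinary scalar ?: on whole vectors */
  vcond (mask, a, b, res);      /* lane-wise vector select */
}
```

At this level a mask lane of 1 selects just as -1 does; the lowering in tree-vect-generic.c normalizes such lanes with cond != {0, ...} before it uses bitwise operations.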
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4009,6 +4009,52 @@ ep_convert_and_check (tree type, tree ex
   return convert (type, expr);
 }
 
+static tree
+fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
+{
+  bool wrap = true;
+  bool maybe_const = false;
+  tree vcond, tmp;
+
+  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+  tmp = c_fully_fold (ifexp, false, &maybe_const);
+  ifexp = save_expr (tmp);
+  wrap &= maybe_const;
+  
+  tmp = c_fully_fold (op1, false, &maybe_const);
+  op1 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  tmp = c_fully_fold (op2, false, &maybe_const);
+  op2 = save_expr (tmp);
+  wrap &= maybe_const;
+
+  /* Currently the expansion of VEC_COND_EXPR does not allow
+     expressions where the type of the vectors being compared differs
+     from the type of the vectors being selected from.  For the time
+     being we insert implicit conversions.  */
+  if ((COMPARISON_CLASS_P (ifexp)
+       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
+      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
+    {
+      tree comp_type = COMPARISON_CLASS_P (ifexp)
+		       ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+		       : TREE_TYPE (ifexp);
+      
+      /* Select in COMP_TYPE, then convert back to the type of OP1.  */
+      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp,
+		      convert (comp_type, op1), convert (comp_type, op2));
+      vcond = convert (TREE_TYPE (op1), vcond);
+    }
+  else
+    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
+
+  /*if (!wrap)
+    vcond = c_wrap_maybe_const (vcond, true);*/
+
+  return vcond;
+}
+
 /* Build and return a conditional expression IFEXP ? OP1 : OP2.  If
    IFEXP_BCP then the condition is a call to __builtin_constant_p, and
    if folded to an integer constant then the unselected half may
@@ -4058,6 +4104,49 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      return fold_build_vec_cond_expr (ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +9995,37 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /*break;  */
+
+	  ret = fold_build_vec_cond_expr 
+		       (build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10138,37 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          /* break; */
+	  ret = fold_build_vec_cond_expr 
+		       (build2 (code, result_type, op0, op1), 
+			build_vector_from_val (result_type,
+					       build_int_cst (intt, -1)),
+			build_vector_from_val (result_type,
+					       build_int_cst (intt,  0)));
+	  goto return_build_binary_op;
+
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10576,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,36 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* A vector comparison is a valid GIMPLE expression
+		     that can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    {
+		      goto expr_2;
+		      /* XXX my humble attempt to avoid comparisons.
+		      enum gimplify_status r0, r1;
+		      tree t, f;
+
+		      debug_tree (*expr_p);
+
+		      r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+					  post_p, is_gimple_condexpr, fb_rvalue);
+		      r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+					  post_p, is_gimple_val, fb_rvalue);
+		      
+		      t = build_vector_from_val (TREE_TYPE (*expr_p),
+				    build_int_cst (TREE_TYPE (TREE_TYPE (*expr_p)), -1));
+		      f = build_vector_from_val (TREE_TYPE (*expr_p),
+				    build_int_cst (TREE_TYPE (TREE_TYPE (*expr_p)), 0));
+
+		      recalculate_side_effects (*expr_p);  
+		      t = build3 (VEC_COND_EXPR, TREE_TYPE (*expr_p), *expr_p, t, f);
+		      *expr_p = t;
+										
+		      ret = MIN (r0, r1);
+		      break;*/
+		    }
+
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  True for a vector of
+   comparison results has all bits set while false is equal to zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 177665)
+++ gcc/emit-rtl.c	(working copy)
@@ -5474,6 +5474,11 @@ gen_const_vector (enum machine_mode mode
   return tem;
 }
 
+rtx
+gen_const_vector1 (enum machine_mode mode, int constant)
+{
+  return gen_const_vector (mode, constant);
+}
 /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when
    all elements are zero, and the one vector when all elements are one.  */
 rtx
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	(revision 177665)
+++ gcc/tree-ssa-forwprop.c	(working copy)
@@ -585,6 +585,128 @@ forward_propagate_into_cond (gimple_stmt
   return 0;
 }
 
+
+static tree
+combine_vec_cond_expr_cond (location_t loc, enum tree_code code, 
+			    tree type, tree op0, tree op1)
+{
+  tree t;
+
+  if (op0 == NULL_TREE && op1 == NULL_TREE)
+    return NULL_TREE;
+
+  if (op0 == NULL_TREE)
+    return op1;
+
+  if (op1 == NULL_TREE)
+    return op0;
+
+  gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
+
+  t = fold_binary_loc (loc, code, type, op0, op1);
+  if (!t)
+    return NULL_TREE;
+
+  /* Require that we got a boolean type out if we put one in.  */
+  gcc_assert (TREE_CODE (TREE_TYPE (t)) == TREE_CODE (type));
+
+  /* Canonicalize the combined condition for use in a COND_EXPR.  */
+  /* t = canonicalize_cond_expr_cond (t); */
+
+  /* Bail out if we required an invariant but didn't get one.  */
+  if (!t)
+    return NULL_TREE;
+
+  return t;
+}
+
+
+
+static tree
+forward_propagate_into_vec_comp (location_t loc, tree expr)
+{
+  tree tmp = NULL_TREE;
+  tree rhs0 = NULL_TREE, rhs1 = NULL_TREE;
+  bool single_use0_p = false, single_use1_p = false;
+
+  /* For comparisons use the first operand, that is likely to
+     simplify comparisons against constants.  */
+  /* debug_tree (expr);  */
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    {
+      tree type = TREE_TYPE (expr);
+      tree lhs = forward_propagate_into_vec_comp (loc, TREE_OPERAND (expr, 0));
+      tree rhs = forward_propagate_into_vec_comp (loc, TREE_OPERAND (expr, 1));
+
+      return combine_vec_cond_expr_cond (loc, TREE_CODE (expr), 
+					type, lhs, rhs);
+    }
+  else if (TREE_CODE (expr) == SSA_NAME)
+    {
+      gimple def_stmt = get_prop_source_stmt (expr, false, &single_use0_p);
+      if (def_stmt && can_propagate_from (def_stmt))
+	{
+	  expr = rhs_to_tree (TREE_TYPE (expr), def_stmt);
+	  return forward_propagate_into_vec_comp (loc, expr);
+	}
+      else
+	return tmp;
+    }
+
+  return tmp;
+}
+
+
+
+
+/* The same as forward_propagate_into_cond, but for vector conditions.  */
+static int
+forward_propagate_into_vec_cond (gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  location_t loc = gimple_location (stmt);
+  tree tmp = NULL_TREE;
+  tree cond = gimple_assign_rhs1 (stmt);
+
+  /* We can do tree combining on SSA_NAME and comparison expressions.  */
+  if (TREE_CODE (cond) == VEC_COND_EXPR)
+    tmp = forward_propagate_into_vec_comp (loc, cond);
+  else if (TREE_CODE (cond) == SSA_NAME)
+    {
+      tree name = cond, rhs0;
+      gimple def_stmt = get_prop_source_stmt (name, true, NULL);
+      if (!def_stmt || !can_propagate_from (def_stmt))
+	return 0;
+
+      rhs0 = gimple_assign_rhs1 (def_stmt);
+      tmp = forward_propagate_into_vec_comp (loc, rhs0);
+    }
+
+  /* XXX Don't change anything for the time being.  */
+  tmp = NULL_TREE;
+
+  if (tmp)
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file, "  Replaced '");
+	  print_generic_expr (dump_file, cond, 0);
+	  fprintf (dump_file, "' with '");
+	  print_generic_expr (dump_file, tmp, 0);
+	  fprintf (dump_file, "'\n");
+	}
+
+      gimple_assign_set_rhs_from_tree (gsi_p, unshare_expr (tmp));
+      stmt = gsi_stmt (*gsi_p);
+      update_stmt (stmt);
+
+      return is_gimple_min_invariant (tmp) ? 2 : 1;
+    }
+
+  return 0;
+}
+
 /* We've just substituted an ADDR_EXPR into stmt.  Update all the
    relevant data structures to match.  */
 
@@ -2445,6 +2567,20 @@ ssa_forward_propagate_and_combine (void)
 		    stmt = gsi_stmt (gsi);
 		    if (did_something == 2)
 		      cfg_changed = true;
+		    fold_undefer_overflow_warnings
+		      (!TREE_NO_WARNING (rhs1) && did_something, stmt,
+		       WARN_STRICT_OVERFLOW_CONDITIONAL);
+		    changed = did_something != 0;
+		  }
+		else if (code == VEC_COND_EXPR)
+		  {
+		    /* In this case the entire VEC_COND_EXPR is in rhs1. */
+		    int did_something;
+		    fold_defer_overflow_warnings ();
+		    did_something = forward_propagate_into_vec_cond (&gsi);
+		    stmt = gsi_stmt (gsi);
+		    if (did_something == 2)
+		      cfg_changed = true;
 		    fold_undefer_overflow_warnings
 		      (!TREE_NO_WARNING (rhs1) && did_something, stmt,
 		       WARN_STRICT_OVERFLOW_CONDITIONAL);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,11 +30,16 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +130,31 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct the expression (A[BITPOS] CODE B[BITPOS]) ? -1 : 0.
+
+   INNER_TYPE is the element type of A and B.
+
+   The returned expression has a signed integer type whose
+   precision equals that of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
+
+  cond = gimplify_build2 (gsi, code, comp_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, 
+                    build_int_cst (comp_type, -1),
+                    build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +363,49 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Expand the vector comparison OP0 CODE OP1 of type TYPE at GSI.
+   The intended strategy is to use the builtin_vec_compare target
+   hook and, if the target does not support comparisons of type
+   TYPE, to expand the comparison piecewise.  For now the comparison
+   is emitted unchanged; the alternatives below are disabled.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  /*if (expand_vec_cond_expr_p (type, TYPE_MODE (type)))
+    {
+      tree arg_type = TREE_TYPE (op0);
+      tree if_true, if_false, ifexp;
+      tree el_type = TREE_TYPE (type);
+      
+      //el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
+
+      if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+      if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+      ifexp = gimplify_build2 (gsi, code, type, op0, op1);
+
+      debug_tree (ifexp);
+      debug_tree (if_true);
+      debug_tree (if_false);
+
+      if (arg_type != type)
+	{
+	  if_true = convert (arg_type, if_true);
+	  if_false = convert (arg_type, if_false);
+	  t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR,  type, t);
+	}
+      else
+	t = gimplify_build3 (gsi, VEC_COND_EXPR, type, ifexp, if_true, if_false);
+    }
+  else
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);*/
+  return gimplify_build2 (gsi, code, type, op0, op1);
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +448,27 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -432,6 +524,126 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+
+/* Expand the vector condition EXP, which should have the form
+   VEC_COND_EXPR<cond, vec0, vec1>, into the vector
+     {cond[i] != 0 ? vec0[i] : vec1[i], ...}
+   for i from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)) - 1.
+   Each element of COND is assumed to be either 0 or -1.  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+  gimple_stmt_iterator gsi_tmp;
+  tree t;
+
+  
+  if (COMPARISON_CLASS_P (cond))
+    {
+      /* Expand vector condition inside of VEC_COND_EXPR.  */
+      op = optab_for_tree_code (TREE_CODE (cond), type, optab_default);
+      if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
+	{
+	  tree op0 = TREE_OPERAND (cond, 0);
+	  tree op1 = TREE_OPERAND (cond, 1);
+
+	  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+	  new_rhs = expand_vector_piecewise (gsi, do_compare, 
+					     TREE_TYPE (cond),
+					     TREE_TYPE (TREE_TYPE (op1)),
+					     op0, op1, TREE_CODE (cond));
+
+	  new_stmt = gimple_build_assign (var, new_rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (gsi_stmt (*gsi));
+	}
+      else
+	var = cond;
+    }
+  else
+    var = cond;
+  
+  gsi_tmp = *gsi;
+  gsi_prev (&gsi_tmp);
+
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+
+  /* Run vector lowering on the expressions we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);
+  
+  return t;
+}
+
+static bool
+is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
+{
+  tree type = TREE_TYPE (expr);
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    return true;
+    
+  if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE)
+    return true;
+
+  if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR
+      || TREE_CODE (expr) == BIT_XOR_EXPR)
+    return is_vector_comparison (gsi, TREE_OPERAND (expr, 0))
+	   && is_vector_comparison (gsi, TREE_OPERAND (expr, 1));
+
+  if (TREE_CODE (expr) == VAR_DECL)
+    { 
+      gimple_stmt_iterator gsi_tmp;
+      tree name = DECL_NAME (expr);
+      tree var = NULL_TREE;
+      
+      gsi_tmp = *gsi;
+
+      for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp))
+	{
+	  gimple stmt = gsi_stmt (gsi_tmp);
+
+	  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	    continue;
+
+	  if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	      && DECL_NAME (gimple_assign_lhs (stmt)) == name)
+	    return is_vector_comparison (&gsi_tmp, 
+					 gimple_assign_rhs_to_tree (stmt));
+	}
+    } 
+  
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      enum tree_code code;
+      gimple exprdef = SSA_NAME_DEF_STMT (expr);
+
+      if (gimple_code (exprdef) != GIMPLE_ASSIGN)
+	return false;
+
+      if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE)
+	return false;
+
+      
+      return is_vector_comparison (gsi, 
+				   gimple_assign_rhs_to_tree (exprdef));
+    }
+
+  return false;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -450,11 +662,34 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
+
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      if (!is_vector_comparison (gsi, cond))
+	TREE_OPERAND (exp, 0) = 
+		    build2 (NE_EXPR, TREE_TYPE (cond), cond,
+			    build_vector_from_val (TREE_TYPE (cond),
+			    build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0)));
+      
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
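When the target cannot expand a VEC_COND_EXPR directly, the lowering in tree-vect-generic.c combines three steps: expand_vector_operations_1 rewrites a condition that is not already a comparison into cond != {0, ...}, do_compare turns each lane of a comparison into -1 or 0 when no optab handles it, and expand_vec_cond_expr_piecewise selects with (v0 & mask) | (v1 & ~mask). The sketch below mirrors only that arithmetic on four int32_t lanes, not the GIMPLE construction; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Model of do_compare for LT_EXPR: one lane of A < B becomes -1 or 0.  */
static int32_t
lane_lt (int32_t a, int32_t b)
{
  return a < b ? -1 : 0;
}

/* Model of the NE_EXPR wrapper in expand_vector_operations_1:
   any nonzero lane becomes an all-ones mask lane.  */
static int32_t
normalize (int32_t c)
{
  return c != 0 ? -1 : 0;
}

/* Model of expand_vec_cond_expr_piecewise: (v0 & mask) | (v1 & ~mask).
   This is only a correct select when every mask lane is 0 or -1.  */
static void
bit_select (const int32_t *mask, const int32_t *v0, const int32_t *v1,
            int32_t *res)
{
  for (size_t i = 0; i < LANES; i++)
    {
      assert (mask[i] == 0 || mask[i] == -1);
      res[i] = (v0[i] & mask[i]) | (v1[i] & ~mask[i]);
    }
}
```

Without normalize, a raw lane value such as 1 would make (v0 & 1) | (v1 & ~1) mix bits from both operands, which is why the NE_EXPR rewrite has to happen before the bitwise select.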
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18402,27 +18403,55 @@ ix86_expand_sse_fp_minmax (rtx dest, enu
   return true;
 }
 
+rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT);
+
+/* Returns a vector of mode MODE where all the elements are ARG.  */
+rtx
+rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg)
+{
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+  
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg);
+  
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
 /* Expand an sse vector comparison.  Return the register with the result.  */
 
 static rtx
 ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
-		     rtx op_true, rtx op_false)
+		     rtx op_true, rtx op_false, bool no_comparison)
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx x;
 
-  cmp_op0 = force_reg (mode, cmp_op0);
-  if (!nonimmediate_operand (cmp_op1, mode))
-    cmp_op1 = force_reg (mode, cmp_op1);
+  /* Avoid useless comparison.  */
+  if (no_comparison)
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      x = cmp_op0;
+    }
+  else
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      if (!nonimmediate_operand (cmp_op1, mode))
+	cmp_op1 = force_reg (mode, cmp_op1);
+
+      x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
+    }
 
   if (optimize
       || reg_overlap_mentioned_p (dest, op_true)
       || reg_overlap_mentioned_p (dest, op_false))
     dest = gen_reg_rtx (mode);
 
-  x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
-
   return dest;
 }
 
@@ -18434,8 +18463,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  rtx mask_true;
+  
+  if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
@@ -18512,7 +18547,7 @@ ix86_expand_fp_movcc (rtx operands[])
 	return true;
 
       tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1,
-				 operands[2], operands[3]);
+				 operands[2], operands[3], false);
       ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]);
       return true;
     }
@@ -18555,7 +18590,7 @@ ix86_expand_fp_vcond (rtx operands[])
     return true;
 
   cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
+			     operands[1], operands[2], false);
   ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
@@ -18569,7 +18604,9 @@ ix86_expand_int_vcond (rtx operands[])
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
+  rtx comp;
 
+  comp = operands[3];
   cop0 = operands[4];
   cop1 = operands[5];
 
@@ -18681,8 +18718,18 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			   operands[1+negate], operands[2-negate]);
+  if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX 
+      && XEXP (comp, 1) == NULL_RTX)
+    {
+      rtx vec =  CONST0_RTX (mode);
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, vec,
+			       operands[1+negate], operands[2-negate], true);
+    }
+  else
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1+negate], operands[2-negate], false);
+    }
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
@@ -18774,7 +18821,7 @@ ix86_expand_sse_unpack (rtx operands[2],
 	tmp = force_reg (imode, CONST0_RTX (imode));
       else
 	tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
-				   operands[1], pc_rtx, pc_rtx);
+				   operands[1], pc_rtx, pc_rtx, false);
 
       emit_insn (unpack (dest, operands[1], tmp));
     }
@@ -32827,6 +32874,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find target specific sequence for vector comparison of 
+   real-type vectors V0 and V1. Returns variable containing 
+   result of the comparison or NULL_TREE in other case.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0,
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1,
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find target specific sequence for vector comparison of 
+   integer-type vectors V0 and V1. Returns variable containing 
+   result of the comparison or NULL_TREE in other case.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* t = tmp1 ^ {-1, -1,...}  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+         if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of comparison. Returns NULL_TREE
+   when it is impossible to find a target specific sequence.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* Cannot compare packed unsigned integers 
+     unless it is EQ or NEQ operations.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -35270,6 +35587,11 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

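The integer branch of the i386 hook above builds all six comparisons from only two primitives: per-lane equality (PCMPEQ*) and per-lane signed greater-than (PCMPGT*). The following is a minimal scalar model of that case analysis, not GCC code. It assumes four signed 32-bit lanes, and the names `v4si`, `pcmpeq`, `pcmpgt`, `int_compare` and `enum cmp_code` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a V4SImode vector: four signed 32-bit lanes.  */
typedef struct { int32_t e[4]; } v4si;

enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_LT, CMP_GE, CMP_LE };

/* Models PCMPEQD: a lane becomes -1 (all bits set) if equal, else 0.  */
static v4si pcmpeq (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[i] == b.e[i] ? -1 : 0;
  return r;
}

/* Models PCMPGTD: signed greater-than, -1 or 0 per lane.  */
static v4si pcmpgt (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[i] > b.e[i] ? -1 : 0;
  return r;
}

static v4si vxor (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[i] ^ b.e[i];
  return r;
}

static v4si vior (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.e[i] = a.e[i] | b.e[i];
  return r;
}

/* Same case analysis as vector_int_compare in the patch.  */
static v4si int_compare (v4si v0, v4si v1, enum cmp_code code)
{
  switch (code)
    {
    case CMP_EQ:
      return pcmpeq (v0, v1);
    case CMP_NE:
      /* v0 == v0 is all -1 for integers, so the XOR acts as a bitwise NOT.  */
      return vxor (pcmpeq (v0, v1), pcmpeq (v0, v0));
    case CMP_GT:
      return pcmpgt (v0, v1);
    case CMP_LT:
      /* a < b is computed as b > a.  */
      return pcmpgt (v1, v0);
    case CMP_GE:
      return vior (pcmpgt (v0, v1), pcmpeq (v0, v1));
    case CMP_LE:
      return vior (pcmpgt (v1, v0), pcmpeq (v0, v1));
    }
  /* Unreachable for valid codes; return an all-false mask.  */
  return (v4si) {{0, 0, 0, 0}};
}
```

PCMPGT* compares lanes as signed values, so this model would get an unsigned ordering such as 0xFFFFFFFFu > 1u wrong, because that lane reads as -1. This is consistent with the hook returning NULL_TREE for unsigned element types unless the code is EQ_EXPR or NE_EXPR.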

* Re: Vector Comparison patch
  2011-08-22 23:13                                                         ` Artem Shinkarov
@ 2011-08-23  9:53                                                           ` Richard Guenther
  2011-08-23 10:12                                                             ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-23  9:53 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> I'll just send you my current version. I'll be a little bit more specific.
>
> The problem starts when you try to lower the following expression:
>
> x = a > b;
> x1 = vcond <x != 0, -1, 0>
> vcond <x1, c, d>
>
> Now, you go from the beginning to the end of the block, and you cannot
> leave a > b, because only vconds are valid expressions to expand.
>
> Now, you meet a > b first. You try to transform it into vcond <a > b,
> -1, 0>, you build this expression, then you try to gimplify it, and
> you see that you have something like:
>
> x' = a >b;
> x = vcond <x', -1, 0>
> x1 = vcond <x != 0, -1, 0>
> vcond <x1, c, d>
>
> and your gsi stands at the x1 now, so the gimplification created a
> comparison that optab would not understand. And I am not really sure
> that you would be able to solve this problem easily.
>
> It would help if you could create vcond<x, op, y, op0, op1>, but you
> can't: x op y is a single tree that must be gimplified, and I am not
> sure that you can persuade the gimplifier to leave this expression
> untouched.
>
> In the attachment the current version of the patch.

I can't reproduce it with your patch.  For

#define vector(elcount, type)  \
    __attribute__((vector_size((elcount)*sizeof(type)))) type

vector (4, float) x, y;
vector (4, int) a,b;
int
main (int argc, char *argv[])
{
  vector (4, int) i0 = x < y;
  vector (4, int) i1 = i0 ? a : b;
  return 0;
}

I get from the C frontend:

  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;
  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> , SAVE_EXPR <b> > ;

but I expected i0 != 0 in the second VEC_COND_EXPR.

I do see that the gimplifier pulls away the condition for the first
VEC_COND_EXPR though:

  x.0 = x;
  y.1 = y;
  D.2735 = x.0 < y.1;
  D.2734 = D.2735;
  D.2736 = D.2734;
  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;

which is, I believe because of the SAVE_EXPR wrapped around the
comparison.  Why do you bother wrapping all operands in save-exprs?

With that the

  /* Currently the expansion of VEC_COND_EXPR does not allow
     expressions where the type of vectors you compare differs
     from the type of vectors you select from. For the time
     being we insert implicit conversions.  */
  if ((COMPARISON_CLASS_P (ifexp)
       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
      || TREE_TYPE (ifexp) != TREE_TYPE (op1))

checks will fail (because ifexp is a SAVE_EXPR).

I run into errors when I don't add the SAVE_EXPR around the ifexp:
the transform into x < y ? {-1,...} : {0,...} does not happen.

>
> Thanks,
> Artem.
>
>
> On Mon, Aug 22, 2011 at 9:58 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>>> Richard
>>>>>>>>>>>>>
>>>>>>>>>>>>> I formalized the approach a little bit; now it works without target
>>>>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on
>>>>>>>>>>>>> the several important approaches that I use in the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So how does it work.
>>>>>>>>>>>>> 1) All the vector comparisons at the level of the type checker are
>>>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being
>>>>>>>>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>>>> 2.a) the first operand of VEC_COND_EXPR is a comparison; in that case nothing changes.
>>>>>>>>>>>>> 2.b) the first operand is something else; in that case we specially mark
>>>>>>>>>>>>> this case, recognize it in the backend, and do not create a
>>>>>>>>>>>>> comparison, but use the mask as if it were the result of some comparison.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>>>> vector comparison, we use the is_vector_comparison function; if it
>>>>>>>>>>>>> returns false, we replace mask with mask != {0}.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So we end up with the following functionality:
>>>>>>>>>>>>> VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a
>>>>>>>>>>>>> comparison of two vectors, we leave it as it is; otherwise we replace it
>>>>>>>>>>>>> with mask != {0}.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Basically for me there are two questions:
>>>>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way?
>>>>>>>>>>>>> 2) How could is_vector_comparison be improved? I have several ideas,
>>>>>>>>>>>>> like checking whether a constant vector consists entirely of 0 and -1,
>>>>>>>>>>>>> and so on. But first: is it conceptually fine?
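For constant masks, the check suggested in question 2 is easy to state. Below is a minimal sketch over plain arrays of 32-bit lanes. `mask_is_canonical`, `normalize_mask` and `prepare_vcond_mask` are hypothetical helpers, not functions from the patch. A real is_vector_comparison would also have to look through SSA definitions to handle non-constant masks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if every lane is 0 or -1, i.e. the vector already has the shape
   of a vector comparison result.  */
static bool mask_is_canonical (const int32_t *m, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (m[i] != 0 && m[i] != -1)
      return false;
  return true;
}

/* Models rewriting MASK into MASK != {0}: every nonzero lane becomes -1.  */
static void normalize_mask (int32_t *m, size_t n)
{
  for (size_t i = 0; i < n; i++)
    m[i] = m[i] != 0 ? -1 : 0;
}

/* Add the != {0} only when the mask is not already known to be canonical.
   Returns true if the mask was rewritten.  */
static bool prepare_vcond_mask (int32_t *m, size_t n)
{
  if (mask_is_canonical (m, n))
    return false;
  normalize_mask (m, n);
  return true;
}
```

With this helper, a mask such as {1, 5, 0, -1} is rewritten to {-1, -1, 0, -1}, while {-1, 0, -1, 0} is left untouched.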
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On,
>>>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>>>> complicated.
>>>>>>>>>>>>
>>>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>>>
>>>>>>>>>>> Well, because all the information is there, and I perfectly envision
>>>>>>>>>>> the case when user expressed comparison separately, just to avoid code
>>>>>>>>>>> duplication.
>>>>>>>>>>>
>>>>>>>>>>> Like:
>>>>>>>>>>> mask = a > b;
>>>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>>>
>>>>>>>>>>> Which in this case would be different from
>>>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>>>
>>>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>>>
>>>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>>>
>>>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>>>> I really need a way to pass the information that the expression
>>>>>>>>>>> in the first operand of vcond is an appropriate mask. I thought about
>>>>>>>>>>> it for quite a while and this hack is the only answer I found. Is there a
>>>>>>>>>>> better way to do that? I could, for example, introduce another
>>>>>>>>>>> tree node, but that would be overkill as well.
>>>>>>>>>>>
>>>>>>>>>>> Now why do I need it so much:
>>>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>>>> actually {-1,-1,0,-1}. So whenever I am not sure that the mask of
>>>>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not
>>>>>>>>>>> always). To check the stuff, I use is_vector_comparison in
>>>>>>>>>>> tree-vect-generic.
>>>>>>>>>>>
>>>>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ?
>>>>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just
>>>>>>>>>>> mask.
>>>>>>>>>>>
>>>>>>>>>>> If this link between optabs and backend breaks, then the patch falls
>>>>>>>>>>> apart, because every time the comparison is taken out of VEC_COND_EXPR, I
>>>>>>>>>>> will have to put != {0}. Keep in mind that I cannot always put the
>>>>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a
>>>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>>>
>>>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>>>
>>>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>>>> vec1 : vec2.
>>>>>>>>>
>>>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>>>
>>>>>>>>>> This comparison can be eliminated by optimization passes that
>>>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply
>>>>>>>>>> dropping the comparison against zero.
>>>>>>>>>
>>>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>>>> so no optimization is needed in this part.
>>>>>>>>
>>>>>>>> I mean for
>>>>>>>>
>>>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>>>
>>>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>>>> it by v1 < v2.
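Richard's two-statement example can be checked with a small scalar model. When mask is defined as v1 < v2 ? {-1,...} : {0,...}, selecting with mask != 0 gives the same result as selecting with v1 < v2 directly. That equivalence is what would let an optimizer drop the test or substitute the comparison. This is a sketch only: `lt_mask`, `select_ne0`, `select_lt` and `N` are hypothetical names, and four 32-bit lanes are assumed.

```c
#include <stdint.h>

#define N 4

/* mask = v1 < v2 ? {-1,...} : {0,...}  */
static void lt_mask (int32_t *mask, const int32_t *v1, const int32_t *v2)
{
  for (int i = 0; i < N; i++)
    mask[i] = v1[i] < v2[i] ? -1 : 0;
}

/* val = VCOND_EXPR <mask != 0, v3, v4>, element by element.  */
static void select_ne0 (int32_t *val, const int32_t *mask,
                        const int32_t *v3, const int32_t *v4)
{
  for (int i = 0; i < N; i++)
    val[i] = mask[i] != 0 ? v3[i] : v4[i];
}

/* val = VCOND_EXPR <v1 < v2, v3, v4>: the comparison substituted in.  */
static void select_lt (int32_t *val, const int32_t *v1, const int32_t *v2,
                       const int32_t *v3, const int32_t *v4)
{
  for (int i = 0; i < N; i++)
    val[i] = v1[i] < v2[i] ? v3[i] : v4[i];
}
```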
>>>>>>>
>>>>>>> Yes, sure.
>>>>>>>
>>>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>>>> do not have to agree).
>>>>>>>>>
>>>>>>>>> But it seems like another combinatorial explosion here. Considering
>>>>>>>>> what Richard said in his e-mail, in order to support "generic" vcond,
>>>>>>>>> we just need to enumerate all the possible cases. Or I didn't
>>>>>>>>> understand right?
>>>>>>>>
>>>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>>>
>>>>>>>> But I have the feeling we are talking past each other ...?
>>>>>>>
>>>>>>> I am all for the bitwise behaviour in the backend pattern, that is
>>>>>>> something that I rely on at the moment. What I don't want to have is
>>>>>>> the same behaviour in the frontend. So if we can guarantee that we
>>>>>>> add != 0 when we don't know the "nature" of the mask, then I am
>>>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>>>
>>>>>> Well, the C frontend would simply always add that != 0 (because it
>>>>>> doesn't know).
>>>>>>
>>>>>>>>> I mean, I don't mind of course, but it seems to me that it would be
>>>>>>>>> cleaner to have one generic enough pattern.
>>>>>>>>>
>>>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>>>
>>>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>>>
>>>>>>> I didn't quite get that, could you give an example?
>>>>>>
>>>>>> It was a larger variant of "no, apart from what is obvious".
>>>>>
>>>>> Ha, joking again. :)
>>>>>
>>>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>>>
>>>>>>>>> Well, take simpler example
>>>>>>>>>
>>>>>>>>> a = {0};
>>>>>>>>> for ( ; *p; p += 16)
>>>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>>>
>>>>>>>>> res = a ? v0 : v1;
>>>>>>>>>
>>>>>>>>> In this case it is simple to analyse that a is a comparison, but you
>>>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>>>
>>>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>>>> vector contents though).
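The loop example above is a case where value tracking could drop the != 0. AND-ing comparison results together keeps every lane at 0 or -1. The following is a scalar sketch with hypothetical names (`vgt`, `accumulate_mask`, `LANES`). It starts the accumulator at all -1 rather than {0}, because with &= a zero start would stay zero.

```c
#include <stdint.h>

#define LANES 4

/* Per-lane signed greater-than producing -1/0, like a vector comparison.  */
static void vgt (int32_t *r, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < LANES; i++)
    r[i] = a[i] > b[i] ? -1 : 0;
}

/* acc = {-1, ...}; then for each block: acc &= pattern > block.
   Every operand of &= is a comparison result, so acc stays 0/-1 per lane.  */
static void accumulate_mask (int32_t *acc, const int32_t *pattern,
                             const int32_t (*blocks)[LANES], int nblocks)
{
  for (int i = 0; i < LANES; i++)
    acc[i] = -1;
  for (int k = 0; k < nblocks; k++)
    {
      int32_t m[LANES];
      vgt (m, pattern, blocks[k]);
      for (int i = 0; i < LANES; i++)
        acc[i] &= m[i];
    }
}
```

Because acc != {0} equals acc lane by lane, a pass that proved this could rewrite VEC_COND_EXPR <a != 0, v0, v1> into VEC_COND_EXPR <a, v0, v1>.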
>>>>>>>
>>>>>>> Yeah, sure. My point is that we must be able to pass this information
>>>>>>> to the backend: that we checked everything, and we are sure that a is
>>>>>>> a correct mask, so please don't add any != 0 to it.
>>>>>>
>>>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>>>> that needs to be careful to impose its stricter semantics.
>>>>>>
>>>>>> Richard.
>>>>>>
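The two semantics discussed here diverge as soon as a mask lane is neither 0 nor -1, as with the {1, 5, 0, -1} example earlier in the thread. Below is a minimal scalar sketch comparing them, using the hypothetical names `vcond_bitwise` and `vcond_elementwise`. The first is bitwise selection, the proposed middle-end meaning of a non-comparison VEC_COND_EXPR. The second is the C frontend's per-element mask != 0 selection.

```c
#include <stdint.h>

/* Bitwise select: each result bit comes from V1 where the mask bit is set,
   otherwise from V2.  */
static void vcond_bitwise (int32_t *res, const int32_t *mask,
                           const int32_t *v1, const int32_t *v2, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = (mask[i] & v1[i]) | (~mask[i] & v2[i]);
}

/* Element select with C-frontend semantics: mask[i] != 0 ? v1[i] : v2[i].  */
static void vcond_elementwise (int32_t *res, const int32_t *mask,
                               const int32_t *v1, const int32_t *v2, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = mask[i] != 0 ? v1[i] : v2[i];
}
```

Take the raw mask {1, 5, 0, -1}, with v1 = {100, ...} and v2 = {7, ...}. The bitwise version yields {6, 6, 7, 100}, while the element-wise version yields {100, 100, 7, 100}. After the frontend's != 0 normalization the two agree. With a canonical mask, bitwise selection between {-1, ...} and {0, ...} simply reproduces the mask. That identity is what the {-1}/{0} shortcut in ix86_expand_sse_movcc relies on.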
>>>>>
>>>>> Ok, I see the last difference in the approaches we envision.
>>>>> I am assuming that the frontend does not add != 0, but that the later
>>>>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR
>>>>> and do what you describe. So the philosophical
>>>>> question is: why is it better to first add and then remove, rather than
>>>>> just add if needed?
>>>>
>>>> Well, it's "better be right than sorry".  Thus, default to the
>>>> conservatively correct
>>>> way and let optimizers "optimize" it.
>>>
>>> How could we end up sorry? It is impossible to skip the vcond during
>>> optimisation. But whatever, it is not really so important when to add it.
>>> Currently I have a bigger problem, see below.
>>>>
>>>>> In all the rest I think we agreed.
>>>>
>>>> Fine.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> Artem.
>>>>>
>>>>
>>>
>>> I found out that I cannot really gimplify correctly the vcond<a >b ,
>>> c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is
>>> that the gimplifier always pulls a > b out as a separate expression during the
>>> gimplification, and I don't think that we can avoid it. So what
>>> happens is:
>>>
>>> vcond <a > b , c , d>
>>> transformed to
>>> x = a > b;
>>> x1 = vcond <x , -1, 0>
>>> vcond <x1, c, d>
>>>
>>> and so on, infinitely long.
>>
>> Sounds like a bug that is possible to fix.
>>
>>> In order to fix the problem we need either to introduce a new code
>>> like VEC_COMP_LT, VEC_COMP_GT, and so on,
>>> or a builtin function which we would lower,
>>> or to go back to the idea of a hook.
>>>
>>> Anyway, representing a >b using vcond does not work.
>>
>> Well, sure it will work, it just needs some work apparently.
>>
>>> What would be your thinking here?
>>
>> Do you have a patch that exposes this problem?  I can have a look
>> tomorrow.
>>
>> Richard.
>>
>>>
>>> Thanks,
>>> Artem.
>>>
>>
>


* Re: Vector Comparison patch
  2011-08-23  9:53                                                           ` Richard Guenther
@ 2011-08-23 10:12                                                             ` Artem Shinkarov
  2011-08-23 10:45                                                               ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 10:12 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> I'll just send you my current version. I'll be a little bit more specific.
>>
>> The problem starts when you try to lower the following expression:
>>
>> x = a > b;
>> x1 = vcond <x != 0, -1, 0>
>> vcond <x1, c, d>
>>
>> Now, you go from the beginning to the end of the block, and you cannot
>> leave a > b, because only vconds are valid expressions to expand.
>>
>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>> -1, 0>, you build this expression, then you try to gimplify it, and
>> you see that you have something like:
>>
>> x' = a >b;
>> x = vcond <x', -1, 0>
>> x1 = vcond <x != 0, -1, 0>
>> vcond <x1, c, d>
>>
>> and your gsi stands at the x1 now, so the gimplification created a
>> comparison that optab would not understand. And I am not really sure
>> that you would be able to solve this problem easily.
>>
>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>> can't: x op y is a single tree that must be gimplified, and I am not
>> sure that you can persuade the gimplifier to leave this expression
>> untouched.
>>
>> In the attachment the current version of the patch.
>
> I can't reproduce it with your patch.  For
>
> #define vector(elcount, type)  \
>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> vector (4, float) x, y;
> vector (4, int) a,b;
> int
> main (int argc, char *argv[])
> {
>  vector (4, int) i0 = x < y;
>  vector (4, int) i1 = i0 ? a : b;
>  return 0;
> }
>
> I get from the C frontend:
>
>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;
>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> , SAVE_EXPR <b> > ;
>
> but I expected i0 != 0 in the second VEC_COND_EXPR.

I don't put it there. This patch adds != 0 where needed rather than
adding it first and removing it later. But this could be changed.

> I do see that the gimplifier pulls away the condition for the first
> VEC_COND_EXPR though:
>
>  x.0 = x;
>  y.1 = y;
>  D.2735 = x.0 < y.1;
>  D.2734 = D.2735;
>  D.2736 = D.2734;
>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } , { 0, 0, 0, 0 } > ;
>
> which is, I believe because of the SAVE_EXPR wrapped around the
> comparison.  Why do you bother wrapping all operands in save-exprs?

I bother because they could be MAYBE_CONST, which breaks the
gimplifier. But I don't really know whether this can be done better. I
could always do this check on the operands of the constructed vcond...

You are right that if you just put a comparison of variables there,
then we are fine. My point is that whenever the gimplifier pulls out
the comparison from the first operand, replacing it with a variable,
we are screwed, because there is no chance to put it back, and
that is exactly what happens in expand_vector_comparison if you
uncomment the replacement -- a comparison is always represented as
x = a > b.

> With that the
>
>  /* Currently the expansion of VEC_COND_EXPR does not allow
>     expressions where the type of vectors you compare differs
>     from the type of vectors you select from. For the time
>     being we insert implicit conversions.  */
>  if ((COMPARISON_CLASS_P (ifexp)
>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>
> checks will fail (because ifexp is a SAVE_EXPR).
>
> I run into errors when I don't add the SAVE_EXPR around ifexp;
> the transform into x < y ? {-1,...} : {0,...} does not happen.
>>
>> Thanks,
>> Artem.
>>
>>
>> On Mon, Aug 22, 2011 at 9:58 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther
>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov
>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther
>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov
>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther
>>>>>>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov
>>>>>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>>>>>>>> Richard
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I formalized the approach a little bit; now it works without target
>>>>>>>>>>>>>> hooks, but some polishing is still required. I'd like your comments on
>>>>>>>>>>>>>> several important approaches that I use in the patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So how does it work.
>>>>>>>>>>>>>> 1) At the type-checker level, all vector comparisons are
>>>>>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands
>>>>>>>>>>>>>> {-1} and {0}. For example, v0 > v1 is transformed into
>>>>>>>>>>>>>> VEC_COND_EXPR<v0 > v1, {-1}, {0}>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered:
>>>>>>>>>>>>>> 2.a) The first operand of VEC_COND_EXPR is a comparison; in that case nothing changes.
>>>>>>>>>>>>>> 2.b) The first operand is something else; in that case we mark
>>>>>>>>>>>>>> this case specially, recognize it in the backend, and do not create a
>>>>>>>>>>>>>> comparison, but use the mask as if it were the result of a comparison.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) To make sure that the mask in VEC_COND_EXPR<mask, v0, v1> is a
>>>>>>>>>>>>>> vector comparison, we use the is_vector_comparison function; if it returns
>>>>>>>>>>>>>> false, we replace mask with mask != {0}.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So we end up with the following functionality:
>>>>>>>>>>>>>> VEC_COND_EXPR<mask, v0, v1> -- if we know that mask is the result of a
>>>>>>>>>>>>>> comparison of two vectors, we leave it as it is; otherwise we replace it with
>>>>>>>>>>>>>> mask != {0}.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR,
>>>>>>>>>>>>>> which correctly expands, without creating useless masking.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
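A minimal scalar model of points 1) and 2.a) above, assuming a 4-lane vector of 32-bit ints represented as a plain C struct (`v4si`, `vec_gt` and `vec_cond_gt` are hypothetical names, not GCC internals). The comparison yields -1 (all bits set) in true lanes and 0 in false lanes, and a VEC_COND_EXPR with an embedded comparison selecting between {-1,...} and {0,...} produces exactly the same lanes.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical model of a 4 x int32 vector; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* OpenCL-style a > b: -1 (all bits set) where true, 0 where false.  */
static v4si vec_gt (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = a.v[i] > b.v[i] ? -1 : 0;
  return r;
}

/* Model of VEC_COND_EXPR <a > b, t, f> with an embedded comparison:
   each lane takes t where the comparison holds and f elsewhere.  */
static v4si vec_cond_gt (v4si a, v4si b, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = a.v[i] > b.v[i] ? t.v[i] : f.v[i];
  return r;
}
```

With t = {-1,...} and f = {0,...}, `vec_cond_gt (a, b, t, f)` matches `vec_gt (a, b)` lane for lane, so representing a plain comparison this way needs no extra masking step.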
>>>>>>>>>>>>>> Basically, for me there are two questions:
>>>>>>>>>>>>>> 1) Can we pass information through optabs in a nicer way?
>>>>>>>>>>>>>> 2) How could is_vector_comparison be improved? I have several ideas,
>>>>>>>>>>>>>> like checking whether a constant vector consists only of 0 and -1, and so on.
>>>>>>>>>>>>>> But first, is it conceptually fine?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> P.S. I tried to put the functionality of is_vector_comparison in
>>>>>>>>>>>>>> tree-ssa-forwprop, but that pass only runs at -On,
>>>>>>>>>>>>>> which I find inappropriate, and the functionality gets more
>>>>>>>>>>>>>> complicated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why is it inappropriate to not optimize it at -O0?  If the user
>>>>>>>>>>>>> separates comparison and ?: expression it's his own fault.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, because all the information is there, and I can easily envision
>>>>>>>>>>>> the case where the user expresses the comparison separately, just to avoid code
>>>>>>>>>>>> duplication.
>>>>>>>>>>>>
>>>>>>>>>>>> Like:
>>>>>>>>>>>> mask = a > b;
>>>>>>>>>>>> res1 = mask ? v0 : v1;
>>>>>>>>>>>> res2 = mask ? v2 : v3;
>>>>>>>>>>>>
>>>>>>>>>>>> Which in this case would be different from
>>>>>>>>>>>> res1 = a > b ? v0 : v1;
>>>>>>>>>>>> res2 = a > b ? v2 : v3;
>>>>>>>>>>>>
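The two forms above are represented differently, but under the semantics discussed in this thread they must compute the same lanes. A sketch, assuming the front end lowers `mask ? v0 : v1` to VEC_COND_EXPR<mask != {0}, v0, v1> and modelling vectors as 4-lane structs (`v4si` and all helper names are hypothetical). Because a mask produced by a comparison is already -1/0, the `!= {0}` test does not change which lanes are selected.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector model; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* OpenCL-style mask = a > b: -1 where true, 0 where false.  */
static v4si vec_gt (v4si a, v4si b)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = a.v[i] > b.v[i] ? -1 : 0;
  return r;
}

/* Inline form a > b ? t : f, with the comparison embedded in the select.  */
static v4si vec_cond_gt (v4si a, v4si b, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = a.v[i] > b.v[i] ? t.v[i] : f.v[i];
  return r;
}

/* Shared-mask form, modelled as VEC_COND_EXPR <mask != {0}, t, f>.  */
static v4si vec_select_ne0 (v4si mask, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = mask.v[i] != 0 ? t.v[i] : f.v[i];
  return r;
}
```

Computing `mask` once and feeding it to `vec_select_ne0` twice gives the same results as two calls to `vec_cond_gt`, which is why an optimizer that sees how `mask` is defined may drop the `!= {0}`.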
>>>>>>>>>>>>> Btw, the new hook is still in the patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers
>>>>>>>>>>>>> (tree-ssa-forwprop.c) optimize this.  You still have to deal with
>>>>>>>>>>>>> non-comparison operands during expansion though, but if
>>>>>>>>>>>>> you always forced a != 0 from the start you can then simply
>>>>>>>>>>>>> interpret it as a proper comparison result (in which case I'd
>>>>>>>>>>>>> modify the backends to have an alternate pattern or directly
>>>>>>>>>>>>> expand to masking operations - using the fake comparison
>>>>>>>>>>>>> RTX is too much of a hack).
>>>>>>>>>>>>
>>>>>>>>>>>> Richard, I think you didn't get the problem.
>>>>>>>>>>>> I really need a way to pass the information that the expression
>>>>>>>>>>>> in the first operand of vcond is an appropriate mask. I thought
>>>>>>>>>>>> about it for quite a while and this hack is the only answer I found. Is there a
>>>>>>>>>>>> better way to do that? I could, for example, introduce another
>>>>>>>>>>>> tree node, but that would be overkill as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Now, why do I need it so much:
>>>>>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is
>>>>>>>>>>>> actually {-1,-1,0,-1}. So whenever I am not sure that the mask of a
>>>>>>>>>>>> VEC_COND_EXPR is a real comparison, I transform it to mask != {0} (not
>>>>>>>>>>>> always). To check this, I use is_vector_comparison in
>>>>>>>>>>>> tree-vect-generic.
>>>>>>>>>>>>
>>>>>>>>>>>> So I really do need to distinguish between mask ? x : y and mask != {0} ?
>>>>>>>>>>>> x : y; otherwise I could treat mask != {0} in the backend as just
>>>>>>>>>>>> mask.
>>>>>>>>>>>>
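A minimal sketch of the `mask != {0}` rewrite, assuming 32-bit lanes (`vec_ne_zero` and `is_canonical_mask` are hypothetical helpers, loosely playing the role the thread gives to is_vector_comparison). Every nonzero lane becomes -1 and zero lanes stay 0, so {1, 5, 0, -1} becomes {-1, -1, 0, -1}; applying the rewrite to a mask that is already all -1/0 changes nothing, which is what allows a later pass to drop a redundant `!= 0`.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector model; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* mask != {0}: nonzero lanes become -1, zero lanes stay 0.  */
static v4si vec_ne_zero (v4si m)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = m.v[i] != 0 ? -1 : 0;
  return r;
}

/* True when every lane is already -1 or 0, i.e. the mask could have come
   from a vector comparison and "!= {0}" would be a no-op.  */
static bool is_canonical_mask (v4si m)
{
  for (int i = 0; i < LANES; i++)
    if (m.v[i] != 0 && m.v[i] != -1)
      return false;
  return true;
}
```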
>>>>>>>>>>>> If this link between optabs and the backend breaks, then the patch falls
>>>>>>>>>>>> apart, because every time the comparison is taken out of VEC_COND_EXPR, I
>>>>>>>>>>>> will have to add != {0}. Keep in mind that I cannot always put the
>>>>>>>>>>>> comparison inside the VEC_COND_EXPR: what if it is defined in a
>>>>>>>>>>>> function-comparison, or somehow else?
>>>>>>>>>>>>
>>>>>>>>>>>> So what would be an appropriate way to connect optabs and the backend?
>>>>>>>>>>>
>>>>>>>>>>> Well, there is no problem in having the only valid mask operand for
>>>>>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask.
>>>>>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ?
>>>>>>>>>>> vec1 : vec2.
>>>>>>>>>>
>>>>>>>>>> This happens already in the new version of patch (not submitted yet).
>>>>>>>>>>
>>>>>>>>>>> This comparison can be eliminated by optimization passes that
>>>>>>>>>>> either replace it by the real comparison computing the mask or just
>>>>>>>>>>> propagate the information that this mask is already {-1,...} / {0,....} by simply
>>>>>>>>>>> dropping the comparison against zero.
>>>>>>>>>>
>>>>>>>>>> This is not a problem, because the backend recognizes these patterns,
>>>>>>>>>> so no optimization is needed in this part.
>>>>>>>>>
>>>>>>>>> I mean for
>>>>>>>>>
>>>>>>>>>  mask = v1 < v2 ? {-1,...} : {0,...};
>>>>>>>>>  val = VCOND_EXPR <mask != 0, v3, v4>;
>>>>>>>>>
>>>>>>>>> optimizers can see how mask is defined and drop the != 0 test or replace
>>>>>>>>> it by v1 < v2.
>>>>>>>>
>>>>>>>> Yes, sure.
>>>>>>>>
>>>>>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison
>>>>>>>>>>> and for a mask.  (Now we can rewind the discussion a bit and allow
>>>>>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise
>>>>>>>>>>> selection - what matters is the C frontend semantics which we need to
>>>>>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they
>>>>>>>>>>> do not have to agree).
>>>>>>>>>>
>>>>>>>>>> But this seems like another combinatorial explosion. Considering
>>>>>>>>>> what Richard said in his e-mail, in order to support a "generic" vcond,
>>>>>>>>>> we just need to enumerate all the possible cases. Or did I
>>>>>>>>>> misunderstand?
>>>>>>>>>
>>>>>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern
>>>>>>>>> semantically does for a non-comparison operand.  I'd argue that using
>>>>>>>>> the bitwise selection semantic gives us maximum flexibility and a native
>>>>>>>>> instruction with AMD XOP.  This non-comparison VCOND_EXPR is
>>>>>>>>> also easy to implement in the middle-end expansion code if there is
>>>>>>>>> no native instruction for it - by simply emitting the bitwise operations.
>>>>>>>>>
>>>>>>>>> But I have the feeling we are talking past each other ...?
>>>>>>>>
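The difference between the bitwise-selection semantics proposed for VEC_COND_EXPR and the stricter C-level semantics shows up in a small scalar sketch (32-bit lanes assumed; `vec_select_bitwise` and `vec_select_lanewise` are hypothetical helpers). Bitwise selection computes (m & t) | (~m & f) per lane, which agrees with lane-wise selection only when each mask lane is all ones or all zeros. With a mask lane of 1, the result mixes bits from both operands, which is why the front end has to insert `!= 0` for masks of unknown origin.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector model; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* Bitwise selection: each result bit comes from t where the mask bit is 1
   and from f where it is 0.  The bit operations are done on uint32_t;
   converting values above INT32_MAX back to int32_t is
   implementation-defined (GCC wraps modulo 2^32).  */
static v4si vec_select_bitwise (v4si m, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    {
      uint32_t mm = (uint32_t) m.v[i];
      uint32_t bits = (mm & (uint32_t) t.v[i]) | (~mm & (uint32_t) f.v[i]);
      r.v[i] = (int32_t) bits;
    }
  return r;
}

/* C-level semantics: a lane counts as "true" whenever it is nonzero.  */
static v4si vec_select_lanewise (v4si m, v4si t, v4si f)
{
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = m.v[i] != 0 ? t.v[i] : f.v[i];
  return r;
}
```

For m = {1, 0, -1, 0}, t = {10,...} and f = {5,...}, lane 0 of the bitwise version is (1 & 10) | (~1 & 5) = 4, while the lane-wise version gives 10. Normalizing m to {-1, 0, -1, 0} first makes the two agree.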
>>>>>>>> I am all for the bitwise behaviour in the backend pattern; that is
>>>>>>>> something that I rely on at the moment. What I don't want is
>>>>>>>> the same behaviour in the frontend. So if we can guarantee that we
>>>>>>>> add != 0 when we don't know the "nature" of the mask, then I am
>>>>>>>> perfectly fine with the back-end having bitwise-selection behaviour.
>>>>>>>
>>>>>>> Well, the C frontend would simply always add that != 0 (because it
>>>>>>> doesn't know).
>>>>>>>
>>>>>>>>>> I mean, I don't mind, of course, but it seems to me that it would be
>>>>>>>>>> cleaner to have one sufficiently generic pattern.
>>>>>>>>>>
>>>>>>>>>> Is there seriously no way to pass something from optab into the backend??
>>>>>>>>>
>>>>>>>>> You can pass operands.  And information is implicitly encoded in the name.
>>>>>>>>
>>>>>>>> I didn't quite get that, could you give an example?
>>>>>>>
>>>>>>> It was a larger variant of "no, apart from what is obvious".
>>>>>>
>>>>>> Ha, joking again. :)
>>>>>>
>>>>>>>>>>> If the mask is computed by a function you are of course out of luck,
>>>>>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either.
>>>>>>>>>>
>>>>>>>>>> Well, take a simpler example:
>>>>>>>>>>
>>>>>>>>>> a = {-1};
>>>>>>>>>> for ( ; *p; p += 16)
>>>>>>>>>>  a &= pattern > (vec)*p;
>>>>>>>>>>
>>>>>>>>>> res = a ? v0 : v1;
>>>>>>>>>>
>>>>>>>>>> In this case it is easy to work out that a is a comparison mask, but you
>>>>>>>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>>>>>>>
>>>>>>>>> Sure, but if the above is C source the frontend would generate
>>>>>>>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>>>>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>>>>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>>>>>>>> vector contents though).
>>>>>>>>
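One reason a pass could prove that `a` in the loop example is a valid mask: &, | and ~ applied to lanes that are all -1 or 0 again produce lanes that are all -1 or 0. The sketch below models that loop on 4-lane structs, one row of data per iteration (`accumulate_mask`, `is_canonical_mask` and the data layout are hypothetical). The accumulator starts from the all-true mask {-1,...}, since with &= an all-zero start would keep every lane at 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector model; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* a = {-1,...}; then for each row: a &= pattern > row.  */
static v4si accumulate_mask (v4si pattern, const v4si *rows, size_t n)
{
  v4si a = {{-1, -1, -1, -1}};
  for (size_t k = 0; k < n; k++)
    for (int i = 0; i < LANES; i++)
      a.v[i] &= pattern.v[i] > rows[k].v[i] ? -1 : 0;
  return a;
}

/* True when every lane is -1 or 0.  */
static bool is_canonical_mask (v4si m)
{
  for (int i = 0; i < LANES; i++)
    if (m.v[i] != 0 && m.v[i] != -1)
      return false;
  return true;
}
```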
>>>>>>>> Yeah, sure. My point is that we must be able to pass this information
>>>>>>>> to the backend: that we checked everything, that we are sure a is
>>>>>>>> a correct mask, and that no != 0 should be added to it.
>>>>>>>
>>>>>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR.
>>>>>>> That's the whole point of the bitwise semantics.  It's only the C frontend
>>>>>>> that needs to be careful to impose its stricter semantics.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>
>>>>>> OK, I see the last difference between the approaches we envision.
>>>>>> I am assuming that the frontend does not add != 0, but that later
>>>>>> passes (veclower in my case) check every mask in VEC_COND_EXPR
>>>>>> and do what you describe. So the philosophical
>>>>>> question is: why is it better to first add and then remove, rather than
>>>>>> just add if needed?
>>>>>
>>>>> Well, it's "better be right than sorry".  Thus, default to the
>>>>> conservatively correct way and let optimizers "optimize" it.
>>>>
>>>> How could we end up sorry? It is impossible to skip the vcond during
>>>> optimisation. But whatever, it is not really important when we add it.
>>>> Currently I have a bigger problem, see below.
>>>>>
>>>>>> In all the rest I think we agreed.
>>>>>
>>>>> Fine.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Artem.
>>>>>>
>>>>>
>>>>
>>>> I found out that I cannot really gimplify the vcond<a > b,
>>>> c, d> expression correctly when a > b is vcond<a > b, -1, 0>. The problem is
>>>> that the gimplifier always pulls a > b out as a separate expression during
>>>> gimplification, and I don't think that we can avoid it. So what
>>>> happens is:
>>>>
>>>> vcond <a > b , c , d>
>>>> transformed to
>>>> x = a > b;
>>>> x1 = vcond <x , -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> and so on, infinitely long.
>>>
>>> Sounds like a bug that is possible to fix.
>>>
>>>> To fix the problem we need either to introduce new codes
>>>> like VEC_COMP_LT, VEC_COMP_GT, and so on,
>>>> or a builtin function which we would lower,
>>>> or to go back to the idea of a hook.
>>>>
>>>> Anyway, representing a > b using vcond does not work.
>>>
>>> Well, sure it will work, it just needs some work, apparently.
>>>
>>>> What would be your thinking here?
>>>
>>> Do you have a patch that exposes this problem?  I can have a look
>>> tomorrow.
>>>
>>> Richard.
>>>
>>>>
>>>> Thanks,
>>>> Artem.
>>>>
>>>
>>
>

* Re: Vector Comparison patch
  2011-08-23 10:12                                                             ` Artem Shinkarov
@ 2011-08-23 10:45                                                               ` Richard Guenther
  2011-08-23 11:08                                                                 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-23 10:45 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> I'll just send you my current version. I'll be a little bit more specific.
>>>
>>> The problem starts when you try to lower the following expression:
>>>
>>> x = a > b;
>>> x1 = vcond <x != 0, -1, 0>
>>> vcond <x1, c, d>
>>>
>>> Now, you go from the beginning to the end of the block, and you cannot
>>> leave a > b, because only vconds are valid expressions to expand.
>>>
>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>> you see that you have something like:
>>>
>>> x' = a >b;
>>> x = vcond <x', -1, 0>
>>> x1 = vcond <x != 0, -1, 0>
>>> vcond <x1, c, d>
>>>
>>> and your gsi stands at the x1 now, so the gimplification created a
>>> comparison that optab would not understand. And I am not really sure
>>> that you would be able to solve this problem easily.
>>>
>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>> can't: x op y is a single tree that must be gimplified, and I am not
>>> sure that you can persuade the gimplifier to leave this expression
>>> untouched.
>>>
>>> In the attachment the current version of the patch.
>>
>> I can't reproduce it with your patch.  For
>>
>> #define vector(elcount, type)  \
>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>
>> vector (4, float) x, y;
>> vector (4, int) a,b;
>> int
>> main (int argc, char *argv[])
>> {
>>  vector (4, int) i0 = x < y;
>>  vector (4, int) i1 = i0 ? a : b;
>>  return 0;
>> }
>>
>> I get from the C frontend:
>>
>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>> SAVE_EXPR <b> > ;
>>
>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>
> I don't put it there. This patch adds != 0 where needed, rather than
> removing it. But this could be changed.

?

>> I do see that the gimplifier pulls away the condition for the first
>> VEC_COND_EXPR though:
>>
>>  x.0 = x;
>>  y.1 = y;
>>  D.2735 = x.0 < y.1;
>>  D.2734 = D.2735;
>>  D.2736 = D.2734;
>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>> { 0, 0, 0, 0 } > ;
>>
>> which is, I believe because of the SAVE_EXPR wrapped around the
>> comparison.  Why do you bother wrapping all operands in save-exprs?
>
> I bother because they could be MAYBE_CONST which breaks the
> gimplifier. But I don't really know if you can do it better. I can
> always do this checking on operands of constructed vcond...

Err, the patch does

+  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+  tmp = c_fully_fold (ifexp, false, &maybe_const);
+  ifexp = save_expr (tmp);
+  wrap &= maybe_const;

why is

  ifexp = save_expr (tmp);

necessary here?  SAVE_EXPR is if you need to protect side-effects
from being evaluated twice if you use an operand twice.  But all
operands are just used a single time.

And I expected, instead of

+  if ((COMPARISON_CLASS_P (ifexp)
+       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
+      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
+    {
+      tree comp_type = COMPARISON_CLASS_P (ifexp)
+                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
+                      : TREE_TYPE (ifexp);
+
+      op1 = convert (comp_type, op1);
+      op2 = convert (comp_type, op2);
+      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+      vcond = convert (TREE_TYPE (op1), vcond);
+    }
+  else
+    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);

  if (!COMPARISON_CLASS_P (ifexp))
    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
                         build_vector_from_val (TREE_TYPE (ifexp), 0));

  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
    {
...
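A scalar model of the rule proposed here, under stated assumptions: the condition of `?:` is either an embedded comparison, which is kept as is, or an arbitrary vector, which is wrapped in `!= {0}`, after which the VEC_COND_EXPR only needs bitwise selection. The `cond_operand` type and helper names are hypothetical illustrations, not GCC's tree API, and only `<` comparisons are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical 4 x int32 vector model; not a GCC type.  */
typedef struct { int32_t v[LANES]; } v4si;

/* Hypothetical stand-in for the first operand of VEC_COND_EXPR.  */
typedef struct
{
  bool is_comparison;  /* true: lhs < rhs; false: arbitrary value */
  v4si lhs, rhs;
  v4si value;
} cond_operand;

/* Produce a -1/0 mask: a comparison yields one directly, any other
   value gets the implicit "!= {0}" the front end would add.  */
static v4si cond_to_mask (cond_operand c)
{
  v4si m;
  for (int i = 0; i < LANES; i++)
    {
      bool lane = c.is_comparison ? c.lhs.v[i] < c.rhs.v[i]
                                  : c.value.v[i] != 0;
      m.v[i] = lane ? -1 : 0;
    }
  return m;
}

/* Once the mask is -1/0, bitwise selection equals lane-wise selection.  */
static v4si vec_cond (cond_operand c, v4si t, v4si f)
{
  v4si m = cond_to_mask (c);
  v4si r;
  for (int i = 0; i < LANES; i++)
    r.v[i] = (m.v[i] & t.v[i]) | (~m.v[i] & f.v[i]);
  return r;
}
```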

> You are right that if you just put a comparison of variables there,
> then we are fine. My point is that whenever the gimplifier pulls out
> the comparison from the first operand, replacing it with a variable,
> we are screwed, because there is no chance to put it back, and
> that is exactly what happens in expand_vector_comparison if you
> uncomment the replacement -- a comparison is always represented as
> x = a > b.
>
>> With that the
>>
>>  /* Currently the expansion of VEC_COND_EXPR does not allow
>>     expressions where the type of vectors you compare differs
>>     from the type of vectors you select from. For the time
>>     being we insert implicit conversions.  */
>>  if ((COMPARISON_CLASS_P (ifexp)
>>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>
>> checks will fail (because ifexp is a SAVE_EXPR).
>>
>> I run into errors when I don't add the SAVE_EXPR around ifexp;
>> the transform into x < y ? {-1,...} : {0,...} does not happen.

* Re: Vector Comparison patch
  2011-08-23 10:45                                                               ` Richard Guenther
@ 2011-08-23 11:08                                                                 ` Artem Shinkarov
  2011-08-23 11:12                                                                   ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 11:08 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>
>>>> The problem starts when you try to lower the following expression:
>>>>
>>>> x = a > b;
>>>> x1 = vcond <x != 0, -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>> leave a > b, because only vconds are valid expressions to expand.
>>>>
>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>> you see that you have something like:
>>>>
>>>> x' = a >b;
>>>> x = vcond <x', -1, 0>
>>>> x1 = vcond <x != 0, -1, 0>
>>>> vcond <x1, c, d>
>>>>
>>>> and your gsi stands at the x1 now, so the gimplification created a
>>>> comparison that optab would not understand. And I am not really sure
>>>> that you would be able to solve this problem easily.
>>>>
>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>> can't: x op y is a single tree that must be gimplified, and I am not
>>>> sure that you can persuade the gimplifier to leave this expression
>>>> untouched.
>>>>
>>>> In the attachment the current version of the patch.
>>>
>>> I can't reproduce it with your patch.  For
>>>
>>> #define vector(elcount, type)  \
>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>
>>> vector (4, float) x, y;
>>> vector (4, int) a,b;
>>> int
>>> main (int argc, char *argv[])
>>> {
>>>  vector (4, int) i0 = x < y;
>>>  vector (4, int) i1 = i0 ? a : b;
>>>  return 0;
>>> }
>>>
>>> I get from the C frontend:
>>>
>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>> SAVE_EXPR <b> > ;
>>>
>>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>>
>> I don't put it there. This patch adds != 0 where needed, rather than
>> removing it. But this could be changed.
>
> ?
>
>>> I do see that the gimplifier pulls away the condition for the first
>>> VEC_COND_EXPR though:
>>>
>>>  x.0 = x;
>>>  y.1 = y;
>>>  D.2735 = x.0 < y.1;
>>>  D.2734 = D.2735;
>>>  D.2736 = D.2734;
>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>> { 0, 0, 0, 0 } > ;
>>>
>>> which is, I believe because of the SAVE_EXPR wrapped around the
>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>
>> I bother because they could be MAYBE_CONST which breaks the
>> gimplifier. But I don't really know if you can do it better. I can
>> always do this checking on operands of constructed vcond...
>
> Err, the patch does
>
> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
> +  ifexp = save_expr (tmp);
> +  wrap &= maybe_const;
>
> why is
>
>  ifexp = save_expr (tmp);
>
> necessary here?  SAVE_EXPR is if you need to protect side-effects
> from being evaluated twice if you use an operand twice.  But all
> operands are just used a single time.

Again, the only reason why save_expr is there is to prevent MAYBE_CONST
nodes from breaking the gimplification. Maybe it is the wrong way of
doing it, but it does the job.

> And I expected, instead of
>
> +  if ((COMPARISON_CLASS_P (ifexp)
> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
> +    {
> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
> +                      : TREE_TYPE (ifexp);
> +
> +      op1 = convert (comp_type, op1);
> +      op2 = convert (comp_type, op2);
> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
> +      vcond = convert (TREE_TYPE (op1), vcond);
> +    }
> +  else
> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>
>  if (!COMPARISON_CLASS_P (ifexp))
>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>
>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>    {
> ...
>
Why?
This is a function to construct any vcond. The result of ifexp is
always a signed integer vector if it is a comparison, but we need to
make sure that all the elements of vcond have the same type.
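A sketch of the kind of type mismatch discussed here, assuming 32-bit `float` and `int32_t` lanes: comparing float vectors yields an int32 mask (as with `x < y` in the test case above), so the mask's element type differs from float operands being selected. Because both lane types are 32 bits wide, the selection can operate on the bit patterns; `memcpy` is used as a portable way to reinterpret a float's bits in C. The helper names are hypothetical and this is not how the patch performs its conversions.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Float comparison x < y yields a -1/0 int32 mask (OpenCL style).  */
static void vec_lt_f (const float x[LANES], const float y[LANES],
                      int32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = x[i] < y[i] ? -1 : 0;
}

/* Select between float lanes with an int32 mask by operating on the
   32-bit patterns; this relies on float and int32_t having equal width.  */
static void vec_select_f (const int32_t mask[LANES], const float t[LANES],
                          const float f[LANES], float out[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t tb, fb, m = (uint32_t) mask[i];
      memcpy (&tb, &t[i], sizeof tb);
      memcpy (&fb, &f[i], sizeof fb);
      uint32_t rb = (m & tb) | (~m & fb);
      memcpy (&out[i], &rb, sizeof rb);
    }
}
```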

And I didn't really understand whether we can guarantee that a vector
comparison will not be lifted out by the gimplifier. It happens when
I add this save_expr, and it could possibly happen in other
cases. How can we prevent that?


Artem.

>> You are right that if you just put a comparison of variables there,
>> then we are fine. My point is that whenever the gimplifier pulls out
>> the comparison from the first operand, replacing it with a variable,
>> we are screwed, because there is no chance to put it back, and
>> that is exactly what happens in expand_vector_comparison if you
>> uncomment the replacement -- a comparison is always represented as
>> x = a > b.
>>
>>> With that the
>>>
>>>  /* Currently the expansion of VEC_COND_EXPR does not allow
>>>     expressions where the type of vectors you compare differs
>>>     from the type of vectors you select from. For the time
>>>     being we insert implicit conversions.  */
>>>  if ((COMPARISON_CLASS_P (ifexp)
>>>       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>>      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>
>>> checks will fail (because ifexp is a SAVE_EXPR).
>>>
>>> I run into errors when I don't add the SAVE_EXPR around ifexp;
>>> the transform into x < y ? {-1,...} : {0,...} does not happen.
>

* Re: Vector Comparison patch
  2011-08-23 11:08                                                                 ` Artem Shinkarov
@ 2011-08-23 11:12                                                                   ` Richard Guenther
  2011-08-23 11:23                                                                     ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-23 11:12 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 12:24 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>>
>>>>> The problem starts when you try to lower the following expression:
>>>>>
>>>>> x = a > b;
>>>>> x1 = vcond <x != 0, -1, 0>
>>>>> vcond <x1, c, d>
>>>>>
>>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>>> leave a > b, because only vconds are valid expressions to expand.
>>>>>
>>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>>> you see that you have something like:
>>>>>
>>>>> x' = a >b;
>>>>> x = vcond <x', -1, 0>
>>>>> x1 = vcond <x != 0, -1, 0>
>>>>> vcond <x1, c, d>
>>>>>
>>>>> and your gsi stands at the x1 now, so the gimplification created a
>>>>> comparison that optab would not understand. And I am not really sure
>>>>> that you would be able to solve this problem easily.
>>>>>
>>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>>> can't: x op y is a single tree that must be gimplified, and I am not
>>>>> sure that you can persuade the gimplifier to leave this expression
>>>>> untouched.
>>>>>
>>>>> In the attachment the current version of the patch.
>>>>
>>>> I can't reproduce it with your patch.  For
>>>>
>>>> #define vector(elcount, type)  \
>>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>
>>>> vector (4, float) x, y;
>>>> vector (4, int) a,b;
>>>> int
>>>> main (int argc, char *argv[])
>>>> {
>>>>  vector (4, int) i0 = x < y;
>>>>  vector (4, int) i1 = i0 ? a : b;
>>>>  return 0;
>>>> }
>>>>
>>>> I get from the C frontend:
>>>>
>>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>>> SAVE_EXPR <b> > ;
>>>>
>>>> but I have expected i0 != 0 in the second VEC_COND_EXPR.
>>>
>>> I don't put it there. This patch adds != 0 where needed, rather than
>>> removing it. But this could be changed.
>>
>> ?
>>
>>>> I do see that the gimplifier pulls away the condition for the first
>>>> VEC_COND_EXPR though:
>>>>
>>>>  x.0 = x;
>>>>  y.1 = y;
>>>>  D.2735 = x.0 < y.1;
>>>>  D.2734 = D.2735;
>>>>  D.2736 = D.2734;
>>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>>> { 0, 0, 0, 0 } > ;
>>>>
>>>> which is, I believe because of the SAVE_EXPR wrapped around the
>>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>>
>>> I bother because they could be MAYBE_CONST which breaks the
>>> gimplifier. But I don't really know if you can do it better. I can
>>> always do this checking on operands of constructed vcond...
>>
>> Err, the patch does
>>
>> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
>> +  ifexp = save_expr (tmp);
>> +  wrap &= maybe_const;
>>
>> why is
>>
>>  ifexp = save_expr (tmp);
>>
>> necessary here?  SAVE_EXPR is if you need to protect side-effects
>> from being evaluated twice if you use an operand twice.  But all
>> operands are just used a single time.
>
> Again, the only reason why save_expr is there is to prevent MAYBE_CONST
> nodes from breaking the gimplification. Maybe it is the wrong way of
> doing it, but it does the job.
>
>> And I expected, instead of
>>
>> +  if ((COMPARISON_CLASS_P (ifexp)
>> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>> +    {
>> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
>> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>> +                      : TREE_TYPE (ifexp);
>> +
>> +      op1 = convert (comp_type, op1);
>> +      op2 = convert (comp_type, op2);
>> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>> +      vcond = convert (TREE_TYPE (op1), vcond);
>> +    }
>> +  else
>> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>>
>>  if (!COMPARISON_CLASS_P (ifexp))
>>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>>
>>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>    {
>> ...
>>
> Why?
> This is a function to construct any vcond. The result of ifexp is
> always a signed integer vector if it is a comparison, but we need to
> make sure that all the elements of vcond have the same type.
>
> And I didn't really understand whether we can guarantee that a vector
> comparison will not be lifted out by the gimplifier. It happens when
> I add this save_expr, and it could possibly happen in other
> cases. How can we prevent that?

We don't need to prevent it.  If the C frontend makes sure that the
mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
the expansion can do the obvious thing with a non-comparison mask
(have another md pattern for this case to handle AMD XOP vcond
or simply emit bitwise mask operations).

The gimplifier shouldn't unnecessarily pull out the comparison, but
you instructed it to - by means of wrapping it inside a SAVE_EXPR.

Richard.

* Re: Vector Comparison patch
  2011-08-23 11:12                                                                   ` Richard Guenther
@ 2011-08-23 11:23                                                                     ` Artem Shinkarov
  2011-08-23 11:26                                                                       ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 11:23 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 11:33 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 12:24 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>>>
>>>>>> The problem starts when you try to lower the following expression:
>>>>>>
>>>>>> x = a > b;
>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>> vcond <x1, c, d>
>>>>>>
>>>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>>>> leave a > b as is, because only vconds are valid expressions to expand.
>>>>>>
>>>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>>>> you see that you have something like:
>>>>>>
>>>>>> x' = a > b;
>>>>>> x = vcond <x', -1, 0>
>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>> vcond <x1, c, d>
>>>>>>
>>>>>> and your gsi now stands at x1, so the gimplification has created a
>>>>>> comparison that the optab would not understand. And I am not really sure
>>>>>> that you would be able to solve this problem easily.
>>>>>>
>>>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>>>> can't: x op y is a single tree that must be gimplified, and I am not
>>>>>> sure that you can persuade the gimplifier to leave this expression
>>>>>> untouched.
>>>>>>
>>>>>> Attached is the current version of the patch.
>>>>>
>>>>> I can't reproduce it with your patch.  For
>>>>>
>>>>> #define vector(elcount, type)  \
>>>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>
>>>>> vector (4, float) x, y;
>>>>> vector (4, int) a,b;
>>>>> int
>>>>> main (int argc, char *argv[])
>>>>> {
>>>>>  vector (4, int) i0 = x < y;
>>>>>  vector (4, int) i1 = i0 ? a : b;
>>>>>  return 0;
>>>>> }
>>>>>
>>>>> I get from the C frontend:
>>>>>
>>>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>>>> SAVE_EXPR <b> > ;
>>>>>
>>>>> but I expected i0 != 0 in the second VEC_COND_EXPR.
>>>>
>>>> I don't put it there. This patch adds != 0, rather removing. But this
>>>> could be changed.
>>>
>>> ?
>>>
>>>>> I do see that the gimplifier pulls away the condition for the first
>>>>> VEC_COND_EXPR though:
>>>>>
>>>>>  x.0 = x;
>>>>>  y.1 = y;
>>>>>  D.2735 = x.0 < y.1;
>>>>>  D.2734 = D.2735;
>>>>>  D.2736 = D.2734;
>>>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>>>> { 0, 0, 0, 0 } > ;
>>>>>
>>>>> which is, I believe, because of the SAVE_EXPR wrapped around the
>>>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>>>
>>>> I bother because they could be MAYBE_CONST which breaks the
>>>> gimplifier. But I don't really know if you can do it better. I can
>>>> always do this checking on operands of constructed vcond...
>>>
>>> Err, the patch does
>>>
>>> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>>> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
>>> +  ifexp = save_expr (tmp);
>>> +  wrap &= maybe_const;
>>>
>>> why is
>>>
>>>  ifexp = save_expr (tmp);
>>>
>>> necessary here?  SAVE_EXPR is for protecting side-effects
>>> from being evaluated twice if you use an operand twice.  But all
>>> operands are just used a single time.
>>
>> Again, the only reason save_expr is there is to keep MAYBE_CONST
>> nodes from breaking the gimplification. Maybe it is the wrong way of
>> doing it, but it does the job.
>>
>>> And I expected, instead of
>>>
>>> +  if ((COMPARISON_CLASS_P (ifexp)
>>> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>> +    {
>>> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
>>> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>>> +                      : TREE_TYPE (ifexp);
>>> +
>>> +      op1 = convert (comp_type, op1);
>>> +      op2 = convert (comp_type, op2);
>>> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>>> +      vcond = convert (TREE_TYPE (op1), vcond);
>>> +    }
>>> +  else
>>> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>>>
>>>  if (!COMPARISON_CLASS_P (ifexp))
>>>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>>>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>>>
>>>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>    {
>>> ...
>>>
>> Why?
>> This is a function to construct any vcond. The result of ifexp is
>> always a signed integer vector if it is a comparison, but we need to
>> make sure that all the elements of vcond have the same type.
>>
>> And I didn't really understand whether we can guarantee that the vector
>> comparison will not be lifted out by the gimplifier. It happens when
>> I add this save_expr, and it could possibly happen in some other
>> cases. How can we prevent that?
>
> We don't need to prevent it.  If the C frontend makes sure that the
> mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
> mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
> the expansion can do the obvious thing with a non-comparison mask
> (have another md pattern for this case to handle AMD XOP vcond
> or simply emit bitwise mask operations).
>
> The gimplifier shouldn't unnecessarily pull out the comparison, but
> you instructed it to - by means of wrapping it inside a SAVE_EXPR.
>
> Richard.
>

I'm confused.
There is a set of tightly connected problems, and you address
only one of them.

I need to do something with C_MAYBE_CONST_EXPR nodes to allow the
gimplification of the expression. To achieve that, I am wrapping
any expression that can contain a C_MAYBE_CONST_EXPR node in a
SAVE_EXPR. This works fine, but the vector condition is lifted out.
So the question is: how do we get rid of C_MAYBE_CONST_EXPR nodes while
making sure that the expression stays inside the VEC_COND_EXPR?

All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
integer type, and when we use it we can add != 0 to the mask, no
problem. The problem is to make sure that the vector expression is not
lifted out of the VEC_COND_EXPR and that no C_MAYBE_CONST_EXPRs are
left at the same time.

Artem.

* Re: Vector Comparison patch
  2011-08-23 11:23                                                                     ` Artem Shinkarov
@ 2011-08-23 11:26                                                                       ` Richard Guenther
  2011-08-23 11:41                                                                         ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-23 11:26 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:33 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 12:24 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>>>>
>>>>>>> The problem starts when you try to lower the following expression:
>>>>>>>
>>>>>>> x = a > b;
>>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>>> vcond <x1, c, d>
>>>>>>>
>>>>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>>>>> leave a > b as is, because only vconds are valid expressions to expand.
>>>>>>>
>>>>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>>>>> you see that you have something like:
>>>>>>>
>>>>>>> x' = a > b;
>>>>>>> x = vcond <x', -1, 0>
>>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>>> vcond <x1, c, d>
>>>>>>>
>>>>>>> and your gsi now stands at x1, so the gimplification has created a
>>>>>>> comparison that the optab would not understand. And I am not really sure
>>>>>>> that you would be able to solve this problem easily.
>>>>>>>
>>>>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>>>>> can't: x op y is a single tree that must be gimplified, and I am not
>>>>>>> sure that you can persuade the gimplifier to leave this expression
>>>>>>> untouched.
>>>>>>>
>>>>>>> Attached is the current version of the patch.
>>>>>>
>>>>>> I can't reproduce it with your patch.  For
>>>>>>
>>>>>> #define vector(elcount, type)  \
>>>>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>
>>>>>> vector (4, float) x, y;
>>>>>> vector (4, int) a,b;
>>>>>> int
>>>>>> main (int argc, char *argv[])
>>>>>> {
>>>>>>  vector (4, int) i0 = x < y;
>>>>>>  vector (4, int) i1 = i0 ? a : b;
>>>>>>  return 0;
>>>>>> }
>>>>>>
>>>>>> I get from the C frontend:
>>>>>>
>>>>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>>>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>>>>> SAVE_EXPR <b> > ;
>>>>>>
>>>>>> but I expected i0 != 0 in the second VEC_COND_EXPR.
>>>>>
>>>>> I don't put it there. This patch adds != 0, rather removing. But this
>>>>> could be changed.
>>>>
>>>> ?
>>>>
>>>>>> I do see that the gimplifier pulls away the condition for the first
>>>>>> VEC_COND_EXPR though:
>>>>>>
>>>>>>  x.0 = x;
>>>>>>  y.1 = y;
>>>>>>  D.2735 = x.0 < y.1;
>>>>>>  D.2734 = D.2735;
>>>>>>  D.2736 = D.2734;
>>>>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>>>>> { 0, 0, 0, 0 } > ;
>>>>>>
>>>>>> which is, I believe, because of the SAVE_EXPR wrapped around the
>>>>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>>>>
>>>>> I bother because they could be MAYBE_CONST which breaks the
>>>>> gimplifier. But I don't really know if you can do it better. I can
>>>>> always do this checking on operands of constructed vcond...
>>>>
>>>> Err, the patch does
>>>>
>>>> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>>>> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
>>>> +  ifexp = save_expr (tmp);
>>>> +  wrap &= maybe_const;
>>>>
>>>> why is
>>>>
>>>>  ifexp = save_expr (tmp);
>>>>
>>>> necessary here?  SAVE_EXPR is for protecting side-effects
>>>> from being evaluated twice if you use an operand twice.  But all
>>>> operands are just used a single time.
>>>
>>> Again, the only reason save_expr is there is to keep MAYBE_CONST
>>> nodes from breaking the gimplification. Maybe it is the wrong way of
>>> doing it, but it does the job.
>>>
>>>> And I expected, instead of
>>>>
>>>> +  if ((COMPARISON_CLASS_P (ifexp)
>>>> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>>> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>> +    {
>>>> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
>>>> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>>>> +                      : TREE_TYPE (ifexp);
>>>> +
>>>> +      op1 = convert (comp_type, op1);
>>>> +      op2 = convert (comp_type, op2);
>>>> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>>>> +      vcond = convert (TREE_TYPE (op1), vcond);
>>>> +    }
>>>> +  else
>>>> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>>>>
>>>>  if (!COMPARISON_CLASS_P (ifexp))
>>>>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>>>>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>>>>
>>>>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>>    {
>>>> ...
>>>>
>>> Why?
>>> This is a function to construct any vcond. The result of ifexp is
>>> always a signed integer vector if it is a comparison, but we need to
>>> make sure that all the elements of vcond have the same type.
>>>
>>> And I didn't really understand whether we can guarantee that the vector
>>> comparison will not be lifted out by the gimplifier. It happens when
>>> I add this save_expr, and it could possibly happen in some other
>>> cases. How can we prevent that?
>>
>> We don't need to prevent it.  If the C frontend makes sure that the
>> mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
>> mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
>> the expansion can do the obvious thing with a non-comparison mask
>> (have another md pattern for this case to handle AMD XOP vcond
>> or simply emit bitwise mask operations).
>>
>> The gimplifier shouldn't unnecessarily pull out the comparison, but
>> you instructed it to - by means of wrapping it inside a SAVE_EXPR.
>>
>> Richard.
>>
>
> I'm confused.
> There is a set of tightly connected problems, and you address
> only one of them.
>
> I need to do something with C_MAYBE_CONST_EXPR nodes to allow the
> gimplification of the expression. To achieve that, I am wrapping
> any expression that can contain a C_MAYBE_CONST_EXPR node in a
> SAVE_EXPR. This works fine, but the vector condition is lifted out.
> So the question is: how do we get rid of C_MAYBE_CONST_EXPR nodes while
> making sure that the expression stays inside the VEC_COND_EXPR?

I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
until gimplification.  I thought c_fully_fold is used (instead of
c_save_expr) precisely because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
nodes.  Instead you delay that (well, it is commented out in your patch).

> All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
> integer type, and when we use it we can add != 0 to the mask, no
> problem. The problem is to make sure that the vector expression is not
> lifted out of the VEC_COND_EXPR and that no C_MAYBE_CONST_EXPRs are
> left at the same time.

Well, for example for floating-point comparisons and -fnon-call-exceptions
you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
that shouldn't be an issue because C semantics are ensured for
the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
the VEC_COND_EXPR semantics for a non-comparison mask operand
are (v0 & mask) | (v1 & ~mask).  This means that we have to be able to
expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.

So, I don't really see any problems for the C frontend or gimplification side.
We do have to make expansion handle more cases, but they can all be
dispatched to make use of the vcond named expander and handling
the mask ? v1 : v2 case with bitwise operations (to be optimized later
by introducing another named expander to match XOP vcond).

Richard.

> Artem.
>

* Re: Vector Comparison patch
  2011-08-23 11:26                                                                       ` Richard Guenther
@ 2011-08-23 11:41                                                                         ` Artem Shinkarov
  2011-08-23 11:58                                                                           ` Artem Shinkarov
  2011-08-23 12:06                                                                           ` Richard Guenther
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 11:41 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 11:56 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 11:33 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 12:24 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Tue, Aug 23, 2011 at 11:08 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Tue, Aug 23, 2011 at 11:44 AM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> On Tue, Aug 23, 2011 at 9:17 AM, Richard Guenther
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Aug 22, 2011 at 11:11 PM, Artem Shinkarov
>>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>>> I'll just send you my current version. I'll be a little bit more specific.
>>>>>>>>
>>>>>>>> The problem starts when you try to lower the following expression:
>>>>>>>>
>>>>>>>> x = a > b;
>>>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>>>> vcond <x1, c, d>
>>>>>>>>
>>>>>>>> Now, you go from the beginning to the end of the block, and you cannot
>>>>>>>> leave a > b as is, because only vconds are valid expressions to expand.
>>>>>>>>
>>>>>>>> Now, you meet a > b first. You try to transform it into vcond <a > b,
>>>>>>>> -1, 0>, you build this expression, then you try to gimplify it, and
>>>>>>>> you see that you have something like:
>>>>>>>>
>>>>>>>> x' = a > b;
>>>>>>>> x = vcond <x', -1, 0>
>>>>>>>> x1 = vcond <x != 0, -1, 0>
>>>>>>>> vcond <x1, c, d>
>>>>>>>>
>>>>>>>> and your gsi now stands at x1, so the gimplification has created a
>>>>>>>> comparison that the optab would not understand. And I am not really sure
>>>>>>>> that you would be able to solve this problem easily.
>>>>>>>>
>>>>>>>> It would help if you could create vcond<x, op, y, op0, op1>, but you
>>>>>>>> can't: x op y is a single tree that must be gimplified, and I am not
>>>>>>>> sure that you can persuade the gimplifier to leave this expression
>>>>>>>> untouched.
>>>>>>>>
>>>>>>>> Attached is the current version of the patch.
>>>>>>>
>>>>>>> I can't reproduce it with your patch.  For
>>>>>>>
>>>>>>> #define vector(elcount, type)  \
>>>>>>>    __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>>
>>>>>>> vector (4, float) x, y;
>>>>>>> vector (4, int) a,b;
>>>>>>> int
>>>>>>> main (int argc, char *argv[])
>>>>>>> {
>>>>>>>  vector (4, int) i0 = x < y;
>>>>>>>  vector (4, int) i1 = i0 ? a : b;
>>>>>>>  return 0;
>>>>>>> }
>>>>>>>
>>>>>>> I get from the C frontend:
>>>>>>>
>>>>>>>  vector(4) int i0 =  VEC_COND_EXPR < SAVE_EXPR <x < y> , { -1, -1,
>>>>>>> -1, -1 } , { 0, 0, 0, 0 } > ;
>>>>>>>  vector(4) int i1 =  VEC_COND_EXPR < SAVE_EXPR <i0> , SAVE_EXPR <a> ,
>>>>>>> SAVE_EXPR <b> > ;
>>>>>>>
>>>>>>> but I expected i0 != 0 in the second VEC_COND_EXPR.
>>>>>>
>>>>>> I don't put it there. This patch adds != 0, rather removing. But this
>>>>>> could be changed.
>>>>>
>>>>> ?
>>>>>
>>>>>>> I do see that the gimplifier pulls away the condition for the first
>>>>>>> VEC_COND_EXPR though:
>>>>>>>
>>>>>>>  x.0 = x;
>>>>>>>  y.1 = y;
>>>>>>>  D.2735 = x.0 < y.1;
>>>>>>>  D.2734 = D.2735;
>>>>>>>  D.2736 = D.2734;
>>>>>>>  i0 = [vec_cond_expr]  VEC_COND_EXPR < D.2736 , { -1, -1, -1, -1 } ,
>>>>>>> { 0, 0, 0, 0 } > ;
>>>>>>>
>>>>>>> which is, I believe, because of the SAVE_EXPR wrapped around the
>>>>>>> comparison.  Why do you bother wrapping all operands in save-exprs?
>>>>>>
>>>>>> I bother because they could be MAYBE_CONST which breaks the
>>>>>> gimplifier. But I don't really know if you can do it better. I can
>>>>>> always do this checking on operands of constructed vcond...
>>>>>
>>>>> Err, the patch does
>>>>>
>>>>> +  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
>>>>> +  tmp = c_fully_fold (ifexp, false, &maybe_const);
>>>>> +  ifexp = save_expr (tmp);
>>>>> +  wrap &= maybe_const;
>>>>>
>>>>> why is
>>>>>
>>>>>  ifexp = save_expr (tmp);
>>>>>
>>>>> necessary here?  SAVE_EXPR is for protecting side-effects
>>>>> from being evaluated twice if you use an operand twice.  But all
>>>>> operands are just used a single time.
>>>>
>>>> Again, the only reason save_expr is there is to keep MAYBE_CONST
>>>> nodes from breaking the gimplification. Maybe it is the wrong way of
>>>> doing it, but it does the job.
>>>>
>>>>> And I expected, instead of
>>>>>
>>>>> +  if ((COMPARISON_CLASS_P (ifexp)
>>>>> +       && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != TREE_TYPE (op1))
>>>>> +      || TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>>> +    {
>>>>> +      tree comp_type = COMPARISON_CLASS_P (ifexp)
>>>>> +                      ? TREE_TYPE (TREE_OPERAND (ifexp, 0))
>>>>> +                      : TREE_TYPE (ifexp);
>>>>> +
>>>>> +      op1 = convert (comp_type, op1);
>>>>> +      op2 = convert (comp_type, op2);
>>>>> +      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
>>>>> +      vcond = convert (TREE_TYPE (op1), vcond);
>>>>> +    }
>>>>> +  else
>>>>> +    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
>>>>>
>>>>>  if (!COMPARISON_CLASS_P (ifexp))
>>>>>    ifexp = build2 (NE_EXPR, TREE_TYPE (ifexp), ifexp,
>>>>>                         build_vector_from_val (TREE_TYPE (ifexp), 0));
>>>>>
>>>>>  if (TREE_TYPE (ifexp) != TREE_TYPE (op1))
>>>>>    {
>>>>> ...
>>>>>
>>>> Why?
>>>> This is a function to construct any vcond. The result of ifexp is
>>>> always a signed integer vector if it is a comparison, but we need to
>>>> make sure that all the elements of vcond have the same type.
>>>>
>>>> And I didn't really understand whether we can guarantee that the vector
>>>> comparison will not be lifted out by the gimplifier. It happens when
>>>> I add this save_expr, and it could possibly happen in some other
>>>> cases. How can we prevent that?
>>>
>>> We don't need to prevent it.  If the C frontend makes sure that the
>>> mask of a VEC_COND_EXPR is always {-1,...} or {0,....} by expanding
>>> mask ? v1 : v2 to VEC_COND_EXPR <mask != 0, v1, v2> then
>>> the expansion can do the obvious thing with a non-comparison mask
>>> (have another md pattern for this case to handle AMD XOP vcond
>>> or simply emit bitwise mask operations).
>>>
>>> The gimplifier shouldn't unnecessarily pull out the comparison, but
>>> you instructed it to - by means of wrapping it inside a SAVE_EXPR.
>>>
>>> Richard.
>>>
>>
>> I'm confused.
>> There is a set of tightly connected problems, and you address
>> only one of them.
>>
>> I need to do something with C_MAYBE_CONST_EXPR nodes to allow the
>> gimplification of the expression. To achieve that, I am wrapping
>> any expression that can contain a C_MAYBE_CONST_EXPR node in a
>> SAVE_EXPR. This works fine, but the vector condition is lifted out.
>> So the question is: how do we get rid of C_MAYBE_CONST_EXPR nodes while
>> making sure that the expression stays inside the VEC_COND_EXPR?
>
> I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
> until gimplification.  I thought c_fully_fold is used (instead of
> c_save_expr) precisely because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
> nodes.  Instead you delay that (well, it is commented out in your patch).

Ok. So for the time being save_expr is the only way that we know to
avoid C_MAYBE_CONST_EXPR nodes.

>> All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
>> integer type, and when we use it we can add != 0 to the mask, no
>> problem. The problem is to make sure that the vector expression is not
>> lifted out of the VEC_COND_EXPR and that no C_MAYBE_CONST_EXPRs are
>> left at the same time.
>
> Well, for example for floating-point comparisons and -fnon-call-exceptions
> you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
> that shouldn't be an issue because C semantics are ensured for
> the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
> the VEC_COND_EXPR semantics for a non-comparison mask operand
> are (v0 & mask) | (v1 & ~mask).  This means that we have to be able to
> expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
> VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.

Richard, I think you almost got it, but there is a small thing you have missed.
Look, let's assume that for some reason, when we gimplified a > b, the
comparison was lifted out. So we have the following situation:

D.1 = a > b;
comp = vcond<D.1, v0, v1>
...

Ok?
Now, I fully agree that we want to treat a lifted a > b as a VCOND. Now,
what I am doing in veclower is this: when I meet a vector comparison
a > b, I wrap it in a VCOND, otherwise it would not be recognized by
the optabs. Literally, I am doing:

rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>

And here is where the devil hides. For some reason, when this expression
is gimplified, a > b is lifted out again and left outside the
VEC_COND_EXPR, and that is the problem I am trying to fight. Do you have
any ideas what could be done here?


Artem.
> So, I don't really see any problems for the C frontend or gimplification side.
> We do have to make expansion handle more cases, but they can all be
> dispatched to make use of the vcond named expander and handling
> the mask ? v1 : v2 case with bitwise operations (to be optimized later
> by introducing another named expander to match XOP vcond).
>
> Richard.
>
>> Artem.
>>
>

* Re: Vector Comparison patch
  2011-08-23 11:41                                                                         ` Artem Shinkarov
@ 2011-08-23 11:58                                                                           ` Artem Shinkarov
  2011-08-23 12:06                                                                           ` Richard Guenther
  1 sibling, 0 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 11:58 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

Sorry, not
rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>

but rather

rhs = gimplify_build3 (gsi, VEC_COND_EXPR, build2 (GT_EXPR, type, a,
b), {-1}, {0})


Artem.

* Re: Vector Comparison patch
  2011-08-23 11:41                                                                         ` Artem Shinkarov
  2011-08-23 11:58                                                                           ` Artem Shinkarov
@ 2011-08-23 12:06                                                                           ` Richard Guenther
  2011-08-23 12:37                                                                             ` Artem Shinkarov
  1 sibling, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-23 12:06 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 1:11 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 11:56 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> I'm confused.
>>> There is a set of tightly connected problems, and you address
>>> only one of them.
>>>
>>> I need to do something with C_MAYBE_CONST_EXPR nodes to allow the
>>> gimplification of the expression. To achieve that, I am wrapping
>>> any expression that can contain a C_MAYBE_CONST_EXPR node in a
>>> SAVE_EXPR. This works fine, but the vector condition is lifted out.
>>> So the question is: how do we get rid of C_MAYBE_CONST_EXPR nodes while
>>> making sure that the expression stays inside the VEC_COND_EXPR?
>>
>> I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
>> until gimplification.  I thought c_fully_fold is used (instead of
>> c_save_expr) precisely because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
>> nodes.  Instead you delay that (well, it is commented out in your patch).
>
> Ok. So for the time being save_expr is the only way that we know to
> avoid C_MAYBE_CONST_EXPR nodes.
>
>>> All the rest is fine -- a > b is transformed to a VEC_COND_EXPR of the
>>> integer type, and when we use it we can add != 0 to the mask, no
>>> problem. The problem is to make sure that the vector expression is not
>>> lifted out of the VEC_COND_EXPR and that no C_MAYBE_CONST_EXPRs are
>>> left at the same time.
>>
>> Well, for example for floating-point comparisons and -fnon-call-exceptions
>> you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
>> that shouldn't be an issue because C semantics are ensured for
>> the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
>> the VEC_COND_EXPR semantics for a non-comparison mask operand
>> are (v0 & mask) | (v1 & ~mask).  This means that we have to be able to
>> expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
>> VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.
>
> Richard, I think you almost got it, but there is a small thing you have missed.
> Look, let's assume that for some reason, when we gimplified a > b, the
> comparison was lifted out. So we have the following situation:
>
> D.1 = a > b;
> comp = vcond<D.1, v0, v1>
> ...
>
> Ok?
> Now, I fully agree that we want to treat a lifted a > b as a VCOND. Now,
> what I am doing in veclower is this: when I meet a vector comparison
> a > b, I wrap it in a VCOND, otherwise it would not be recognized by
> the optabs. Literally, I am doing:
>
> rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>
>
> And here is where the devil hides. For some reason, when this expression
> is gimplified, a > b is lifted out again and left outside the
> VEC_COND_EXPR, and that is the problem I am trying to fight. Do you have
> any ideas what could be done here?

Well, don't do it.  Check if the target can expand

 D.1 = a > b;

by feeding it vcond <a > b, {-1,...}, {0,...} >, and if not, expand it piecewise
in veclower.  If it can handle it, leave it alone!

In expand_expr_real_2, extend the EQ_EXPR (etc.) case to handle
vector-typed comparisons and use the vcond optab for them, again
via vcond <a > b, {-1,...}, {0,...} >.  If you look at the EQ_EXPR case,
it dispatches to do_store_flag; that's the best place to handle
vector-typed compares.

Richard.

* Re: Vector Comparison patch
  2011-08-23 12:06                                                                           ` Richard Guenther
@ 2011-08-23 12:37                                                                             ` Artem Shinkarov
  2011-08-25  9:22                                                                               ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-23 12:37 UTC
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Tue, Aug 23, 2011 at 12:23 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 23, 2011 at 1:11 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Aug 23, 2011 at 11:56 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Aug 23, 2011 at 12:45 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> I'm confused.
>>>> There is a set of problems which are tightly connected and you address
>>>> only one of them.
>>>>
>>>> I need to do something with C_MAYBE_CONST_EXPR nodes to allow the
>>>> gimplification of the expression. To achieve that I am
>>>> wrapping expressions which can contain C_MAYBE_CONST_EXPR nodes in a
>>>> SAVE_EXPR. This works fine, but the vector condition is lifted out.
>>>> So the question is how to get rid of C_MAYBE_CONST_EXPR nodes while making
>>>> sure that the expression is still inside the VEC_COND_EXPR?
>>>
>>> I can't answer this, but no C_MAYBE_CONST_EXPR nodes may survive
>>> until gimplification.  I thought c_fully_fold is exactly used (instead
>>> of c_save_expr) because it _doesn't_ wrap things in C_MAYBE_CONST_EXPR
>>> nodes.  Instead you delay that (well, commented out in your patch).
>>
>> Ok. So for the time being save_expr is the only way that we know to
>> avoid C_MAYBE_CONST_EXPR nodes.
>>
>>>> All the rest is fine -- a > b is transformed to VEC_COND_EXPR of the
>>>> integer type, and when we are using it we can add != 0 to the mask, no
>>>> problem. The problem is to make sure that the vector expression is not
>>>> lifted out of the VEC_COND_EXPR and that no C_MAYBE_CONST_EXPRs are
>>>> left there at the same time.
>>>
>>> Well, for example for floating-point comparisons and -fnon-call-exceptions
>>> you _will_ get comparisons lifted out of the VEC_COND_EXPR.  But
>>> that shouldn't be an issue because C semantics are ensured for
>>> the mask ? v0 : v1 source form by changing it to mask != 0 ? v0 : v1 and
>>> the VEC_COND_EXPR semantics for a non-comparison mask operand
>>> are (v0 & mask) | (v1 & ~mask).  Which means that we have to be able to
>>> expand mask = v0 < v1 anyway, but we'll simply expand it as if it were
>>> VEC_COND_EXPR <v0<v1, {-1,}, {0,}>.
>>
>> Richard, I think you almost got it, but there is a tiny thing you have missed.
>> Let's assume that, for some reason, when we gimplified a > b, the
>> comparison was lifted out. So we have the following situation:
>>
>> D.1 = a > b;
>> comp = vcond<D.1, v0, v1>
>> ...
>>
>> Ok?
>> Now, I fully agree that we want to treat a lifted a > b as a VCOND. Now,
>> what I am doing in veclower is this: when I meet a vector
>> comparison a > b, I wrap it in a VCOND, otherwise it would not be
>> recognized by optabs. Literally, I am doing:
>>
>> rhs = gimplify_build3 (gsi, VEC_COND_EXPR, a, b, {-1}, {0}>
>>
>> And here is where the devil hides. For some reason, when this expression is
>> gimplified, a > b is lifted out again and left outside the
>> VEC_COND_EXPR, and that is the problem I am trying to fight. Do you have
>> any ideas what could be done here?
>
> Well, don't do it.  Check if the target can expand
>
>  D.1 = a > b;
>
> via feeding it vcond <a < b, {-1,...}, {0,...} > and if not, expand it piecewise
> in veclower.  If it can handle it - leave it alone!
>
> In expand_expr_real_2 add to the EQ_EXPR (etc.) case the case
> of a vector-typed comparison and use the vcond optab for it, again
> via vcond <a < b, {-1,...}, {0,...} >.  If you look at the EQ_EXPR case
> it dispatches to do_store_flag - that's the best place to handle
> vector-typed compares.
>
> Richard.
>
That sounds like a plan. I'll investigate whether it can be done.
Also, if we can handle a > b, then we don't need to construct
vcond<a > b, {-1}, {0}>; we will know that it is constructed correctly
when expanding.


Thanks for your help,
Artem.


* Re: Vector Comparison patch
  2011-08-23 12:37                                                                             ` Artem Shinkarov
@ 2011-08-25  9:22                                                                               ` Artem Shinkarov
  2011-08-25  9:58                                                                                 ` Richard Guenther
  2011-08-25 11:02                                                                                 ` Richard Guenther
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-25  9:22 UTC
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 949 bytes --]

Here is a cleaned-up patch without the hook. Mostly it works the way
we discussed.

So I think it is the right time to do something about the vcond patterns,
which would allow me to get rid of the conversions that I need to put all
over the code.

Also, at the moment the patch breaks the LTO front end with a simple example:
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

int main (int argc, char *argv[]) {
    vector (4, float) f0;
    vector (4, float) f1;

    f0 =  f1 != f0
          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};

    return (int)f0[argc];
}

test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244

I looked into the file: the convert function there simply calls
gcc_unreachable (). I am not very familiar with LTO, so I don't really
know the right way to handle the conversions.

And I seriously need help with backend patterns.


Thanks,
Artem.

[-- Attachment #2: vector-compare-vcond-6.diff --]
[-- Type: text/plain, Size: 51730 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,97 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In C, vector comparison is supported with the standard comparison operators:
+@code{==, !=, <, <=, >, >=}.  Both integer-type and real-type vectors
+can be compared, but only vectors of the same type.  The result of the
+comparison is a vector of signed integers whose elements have the same
+size as the elements of the compared vectors.
+Comparison is done element by element.  False is 0 and true
+is -1 (a constant of the appropriate type with all bits set).
+Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
+In addition to vector comparison, C supports conditional expressions
+where the condition is a vector of signed integers.  In that case the
+result of the condition is used as a mask to select elements from either
+the first operand or the second.  Consider the following example:
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,7@};
+v4si c = @{2,3,4,5@};
+v4si d = @{6,7,8,9@};
+v4si res;
+
+res = a >= b ? c : d;  /* res would contain @{6, 3, 4, 9@}  */
+@end smallexample
+
+The number of elements in the condition must be the same as the number
+of elements in both operands, and the same holds for the size of the
+element type.  The type of the vector conditional is determined by
+the types of the operands, which must be the same.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+typedef float v4f __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{2,3,4,5@};
+v4f f = @{1.,  5., 7., -8.@};
+v4f g = @{3., -2., 8.,  1.@};
+v4si ires;
+v4f fres;
+
+fres = a <= b ? f : g;  /* fres would contain @{1., 5., 7., -8.@}  */
+ires = f <= g ? a : b;  /* ires would contain @{1,  3,  3,   4@}  */
+@end smallexample
+
+For convenience, the condition in a vector conditional can also be just a
+vector of signed integer type.  In that case the vector is implicitly
+compared with a vector of zeroes.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+
+ires = a ? b : a;  /* synonym for ires = a != @{0,0,0,0@} ? b : a;  */
+@end smallexample
+
+Please note that a conditional where the operands are vectors and the
+condition is a scalar integer works in the standard way: it returns the
+first operand if the condition is true and the second otherwise.  Consider an example:
+
+@smallexample
+typedef int  v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,0,3,0@};
+v4si b = @{2,3,4,5@};
+v4si ires;
+int x,y;
+
+/* standard conditional returning A or B  */
+ires = x > y ? a : b;  
+
+/* vector conditional where the condition is (x > y ? a : b)  */
+ires = (x > y ? a : b) ? b : a; 
+@end smallexample
+
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	(revision 177665)
+++ gcc/targhooks.h	(working copy)
@@ -86,6 +86,7 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);
 
 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
 					     const_tree,
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
+  
+  if (COMPARISON_CLASS_P (op0))
+    {
+      comparison = vector_compare_rtx (op0, unsignedp, icode);
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_fixed_operand (&ops[3], comparison);
+      create_fixed_operand (&ops[4], XEXP (comparison, 0));
+      create_fixed_operand (&ops[5], XEXP (comparison, 1));
+    }
+  else
+    {
+      rtx rtx_op0;
+      rtx vec; 
+    
+      rtx_op0 = expand_normal (op0);
+      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); 
+      vec = CONST0_RTX (mode);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);
+    }
 
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
   return ops[0].value;
 }
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 177665)
+++ gcc/target.h	(working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H
 
 #include "insn-modes.h"
+#include "gimple.h"
 
 #ifdef ENABLE_CHECKING
 
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t
 }
 \f
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars, or either {-1,-1,...} or {0,0,...} for vectors),
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      tree arg0_type = TREE_TYPE (arg0);
+      
       switch (code)
 	{
 	case EQ_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  break;
 
 	case GE_EXPR:
 	case LE_EXPR:
-	  if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (! FLOAT_TYPE_P (arg0_type)
+	      || ! HONOR_NANS (TYPE_MODE (arg0_type)))
 	    return constant_boolean_node (1, type);
 	  return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1);
 
 	case NE_EXPR:
 	  /* For NE, we can only do this simplification if integer
 	     or we don't honor IEEE floating point NaNs.  */
-	  if (FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+	  if (FLOAT_TYPE_P (arg0_type)
+	      && HONOR_NANS (TYPE_MODE (arg0_type)))
 	    break;
 	  /* ... fall through ...  */
 	case GT_EXPR:
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-2.c	(revision 0)
@@ -0,0 +1,78 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(count, res, i0, i1, c0, c1, op, fmt0, fmt1) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if ((res)[__i] != \
+                ((i0)[__i] op (i1)[__i]  \
+		? (c0)[__i] : (c1)[__i]))  \
+	{ \
+            __builtin_printf (fmt0 " != (" fmt1 " " #op " " fmt1 " ? " \
+			      fmt0 " : " fmt0 ")", \
+	    (res)[__i], (i0)[__i], (i1)[__i],\
+	    (c0)[__i], (c1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, c0, c1, res, fmt0, fmt1); \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >, fmt0, fmt1); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, >=, fmt0, fmt1); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <, fmt0, fmt1); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, <=, fmt0, fmt1); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, ==, fmt0, fmt1); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (count, res, v0, v1, c0, c1, !=, fmt0, fmt1); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+  vector (4, int) i0 = {argc, 1,  2,  10}; 
+  vector (4, int) i1 = {0, argc, 2, (int)-23};
+  vector (4, int) ires;
+  vector (4, float) f0 = {1., 7., (float)argc, 4.};
+  vector (4, float) f1 = {6., 2., 8., (float)argc};
+  vector (4, float) fres;
+
+  vector (2, double) d0 = {1., (double)argc};
+  vector (2, double) d1 = {6., 2.};
+  vector (2, double) dres;
+  vector (2, long) l0 = {argc, 3};
+  vector (2, long) l1 = {5,  8};
+  vector (2, long) lres;
+  
+  /* These tests work fine.  */
+  test (4, i0, i1, f0, f1, fres, "%f", "%i");
+  test (4, f0, f1, i0, i1, ires, "%i", "%f");
+  test (2, d0, d1, l0, l1, lres, "%i", "%f");
+  test (2, l0, l1, d0, d1, dres, "%f", "%i");
+
+  /* Condition expressed with a single variable.  */
+  dres = l0 ? d0 : d1;
+  check_compare (2, dres, l0, ((vector (2, long)){0, 0}), d0, d1, !=, "%f", "%i");
+  
+  lres = l1 ? l0 : l1;
+  check_compare (2, lres, l1, ((vector (2, long)){0, 0}), l0, l1, !=, "%i", "%i");
+ 
+  fres = i0 ? f0 : f1;
+  check_compare (4, fres, i0, ((vector (4, int)){0, 0, 0, 0}), 
+		 f0, f1, !=, "%f", "%i");
+
+  ires = i1 ? i0 : i1;
+  check_compare (4, ires, i1, ((vector (4, int)){0, 0, 0, 0}), 
+		 i0, i1, !=, "%i", "%i");
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt); \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-vcond-1.c	(revision 0)
@@ -0,0 +1,154 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, c0, c1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i))  \
+		? vidx (type, c0, __i) : vidx (type, c1, __i)))  \
+	{ \
+            __builtin_printf (fmt " != ((" fmt " " #op " " fmt ") ? " fmt " : " fmt ")", \
+	    vidx (type, res, __i), vidx (type, i0, __i), vidx (type, i1, __i),\
+	    vidx (type, c0, __i), vidx (type, c1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, c0, c1, res, fmt); \
+do { \
+    res = (v0 > v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >, fmt); \
+    res = (v0 >= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, >=, fmt); \
+    res = (v0 < v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <, fmt); \
+    res = (v0 <= v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, <=, fmt); \
+    res = (v0 == v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, ==, fmt); \
+    res = (v0 != v1) ? c0: c1; \
+    check_compare (type, count, res, v0, v1, c0, c1, !=, fmt); \
+} while (0)
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0; vector (4, INT) i1;
+    vector (4, INT) ic0; vector (4, INT) ic1;
+    vector (4, INT) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    ic0 = (vector (4, INT)){1, argc,  argc,  10};
+    ic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, i0, i1, ic0, ic1, ires, "%i");
+#undef INT
+
+#define INT  unsigned int
+    vector (4, INT) ui0; vector (4, INT) ui1;
+    vector (4, INT) uic0; vector (4, INT) uic1;
+    vector (4, INT) uires;
+
+    ui0 = (vector (4, INT)){argc, 1,  2,  10};
+    ui1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    
+    uic0 = (vector (4, INT)){1, argc,  argc,  10};
+    uic1 = (vector (4, INT)){2, 3, argc, (INT)-23};    
+    test (INT, 4, ui0, ui1, uic0, uic1, uires, "%u");
+#undef INT
+
+#define SHORT short
+    vector (8, SHORT) s0;   vector (8, SHORT) s1;
+    vector (8, SHORT) sc0;   vector (8, SHORT) sc1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    sc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    sc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, s0, s1, sc0, sc1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;   vector (8, SHORT) us1;
+    vector (8, SHORT) usc0;   vector (8, SHORT) usc1;
+    vector (8, SHORT) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
+    
+    usc0 = (vector (8, SHORT)){argc, 1,  argc,  10,  6, 87, (SHORT)-5, argc};
+    usc1= (vector (8, SHORT)){0, 5, 2, (SHORT)-23, 2, 10, (SHORT)-2, argc};
+
+    test (SHORT, 8, us0, us1, usc0, usc1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;   vector (16, CHAR) c1;
+    vector (16, CHAR) cc0;   vector (16, CHAR) cc1;
+    vector (16, CHAR) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    cc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    cc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, c0, c1, cc0, cc1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;   vector (16, CHAR) uc1;
+    vector (16, CHAR) ucc0;   vector (16, CHAR) ucc1;
+    vector (16, CHAR) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  4,  7, 87, (CHAR)-5, 2, \
+                             argc, 1,  3,  18,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-5, 28, 10, (CHAR)-2, 0};
+    
+    ucc0 = (vector (16, CHAR)){argc, 1,  argc,  4,  7, 87, (CHAR)-23, 2, \
+                             33, 8,  3,  18,  6, 87, (CHAR)-5, 41 };
+
+    ucc1 = (vector (16, CHAR)){0, 27, 2, (CHAR)-1, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 0x23, (CHAR)-5, 28, 1, (CHAR)-2, 0};
+
+    test (CHAR, 16, uc0, uc1, ucc0, ucc1, ucres, "%u");
+#undef CHAR
+
+/* Float version.  */
+   vector (4, float) f0 = {1., 7., (float)argc, 4.};
+   vector (4, float) f1 = {6., 2., 8., (float)argc};
+   vector (4, float) fc0 = {3., 12., 4., (float)argc};
+   vector (4, float) fc1 = {7., 5., (float)argc, 6.};
+   vector (4, float) fres;
+
+   test (float, 4, f0, f1, fc0, fc1, fres, "%f");
+
+/* Double version.  */
+   vector (2, double) d0 = {1., (double)argc};
+   vector (2, double) d1 = {6., 2.};
+   vector (2, double) dc0 = {(double)argc, 7.};
+   vector (2, double) dc1 = {7., 5.};
+   vector (2, double) dres;
+
+   //test (double, 2, d0, d1, dc0, dc1, dres, "%f");
+
+
+   return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+
+  r4 ? y : p4;	    /* { dg-error "vectors of different types involved in vector comparison" } */
+  r4 ? r4 : r8;	    /* { dg-error "vectors of different length found in vector comparison" } */
+  y ? f4 : y;	    /* { dg-error "non-integer type in vector condition" } */
+  
+  /* Do not trigger that  */
+  q4 ? p4 : r4;	    /* { "vector comparison must be of signed integer vector type" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */   
+
+/* Test if C_MAYBE_CONST are folded correctly when 
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+vec 
+foo (int x)
+{
+  return (x ? i : j) ? a : b;
+}
+
+vec 
+bar (int x)
+{
+  return a ? (x ? i : j) : b;
+}
+
+vec 
+baz (int x)
+{
+  return a ? b : (x ? i : j);
+}
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 177665)
+++ gcc/expr.c	(working copy)
@@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+	{
+	  enum tree_code code = ops->code;
+	  tree arg0 = ops->op0;
+	  tree arg1 = ops->op1;
+	  tree arg_type = TREE_TYPE (arg0);
+	  tree el_type = TREE_TYPE (arg_type);
+	  tree t, ifexp, if_true, if_false;
+	  
+	  el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
+
+	  ifexp = build2 (code, type, arg0, arg1);
+	  if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+	  if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+	  
+	  if (arg_type != type)
+	    {
+	      if_true = convert (arg_type, if_true);
+	      if_false = convert (arg_type, if_false);
+	      t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+	      t = convert (type, t);
+	    }
+	  else
+	    t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+            
+	  return expand_expr (t,
+			      modifier != EXPAND_STACK_PARM ? target : NULL_RTX, 
+			      tmode != VOIDmode ? tmode : mode, 
+			      modifier);
+	}
+
       temp = do_store_flag (ops,
 			    modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
 			    tmode != VOIDmode ? tmode : mode);
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -4009,6 +4009,66 @@ ep_convert_and_check (tree type, tree ex
   return convert (type, expr);
 }
 
+static tree
+fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
+{
+  bool wrap = true;
+  bool maybe_const = false;
+  bool need_convert = false;
+  tree vcond, tmp;
+  tree res_type = TREE_TYPE (op1);
+
+  if (! COMPARISON_CLASS_P (ifexp))
+    {
+      tree intt, rt;
+
+      intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (res_type)),0);
+      rt = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (res_type));
+
+      ifexp = build2 (NE_EXPR, rt, ifexp, 
+		      build_vector_from_val (rt, build_int_cst (intt, 0)));
+    }
+  
+  /* Currently the expansion of VEC_COND_EXPR does not allow
+     expressions where the type of vectors you compare differs
+     from the type of vectors you select from.  For the time
+     being we insert implicit conversions.  */
+  if (TREE_TYPE (ifexp) != res_type)
+    {
+      tree comp_type = TREE_TYPE (ifexp);
+      
+      op1 = convert (comp_type, op1);
+      op2 = convert (comp_type, op2);
+      vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2);
+      need_convert = true;
+    }
+  else
+    vcond = build3 (VEC_COND_EXPR, TREE_TYPE (op1), ifexp, op1, op2);
+
+  
+  /* Avoid C_MAYBE_CONST in VEC_COND_EXPR.  */
+  
+  tmp = c_fully_fold (TREE_OPERAND (vcond, 0), false, &maybe_const);
+  TREE_OPERAND (vcond, 0) = save_expr (tmp);
+  wrap &= maybe_const;
+  
+  tmp = c_fully_fold (TREE_OPERAND (vcond, 1), false, &maybe_const);
+  TREE_OPERAND (vcond, 1) = save_expr (tmp);
+  wrap &= maybe_const;
+
+  tmp = c_fully_fold (TREE_OPERAND (vcond, 2), false, &maybe_const);
+  TREE_OPERAND (vcond, 2) = save_expr (tmp);
+  wrap &= maybe_const;
+
+  if (!wrap)
+    vcond = c_wrap_maybe_const (vcond, true);
+  
+  if (need_convert)
+    vcond = convert (res_type, vcond);
+
+  return vcond;
+}
+
 /* Build and return a conditional expression IFEXP ? OP1 : OP2.  If
    IFEXP_BCP then the condition is a call to __builtin_constant_p, and
    if folded to an integer constant then the unselected half may
@@ -4058,6 +4118,49 @@ build_conditional_expr (location_t colon
   type2 = TREE_TYPE (op2);
   code2 = TREE_CODE (type2);
 
+  if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (type1) != VECTOR_TYPE
+	  || TREE_CODE (type2) != VECTOR_TYPE)
+        {
+          error_at (colon_loc, "vector comparison arguments must be of "
+                               "type vector");
+          return error_mark_node;
+        }
+
+      if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE)
+        {
+          error_at (colon_loc, "non-integer type in vector condition");
+          return error_mark_node;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2)
+          || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp))
+             != TYPE_VECTOR_SUBPARTS (type1))
+        {
+          error_at (colon_loc, "vectors of different length found in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+      
+      if (TREE_TYPE (type1) != TREE_TYPE (type2))
+        {
+          error_at (colon_loc, "vectors of different types involved in "
+                               "vector comparison");
+          return error_mark_node;
+        }
+
+      if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) 
+          != TYPE_SIZE (TREE_TYPE (type1)))
+        {
+          error_at (colon_loc, "vector-condition element type must be "
+                               "the same as result vector element type");
+          return error_mark_node;
+        }
+      
+      return fold_build_vec_cond_expr (ifexp, op1, op2);
+    }
+
   /* C90 does not permit non-lvalue arrays in conditional expressions.
      In C99 they will be pointers by now.  */
   if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE)
@@ -9906,6 +10009,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10144,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10574,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
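If the condition of a vector ?: is not itself a comparison, fold_build_vec_cond_expr wraps it as ifexp != {0,...}. As a result, any nonzero lane selects from OP1, not only -1. The following scalar sketch shows that rule. It assumes 32-bit integer lanes, and vcond_from_int_mask is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of res = cond ? op1 : op2 for vectors when
   COND is not a comparison.  COND is treated as "cond != {0,...}", so
   any nonzero lane selects OP1.  All three vectors have NELTS lanes,
   and the condition lanes have the same width as the result lanes.  */
void
vcond_from_int_mask (const int32_t *cond, const int32_t *op1,
                     const int32_t *op2, int32_t *res, size_t nelts)
{
  assert (cond != NULL && op1 != NULL && op2 != NULL && res != NULL);
  for (size_t i = 0; i < nelts; i++)
    res[i] = cond[i] != 0 ? op1[i] : op2[i];
}
```

With cond = {0, 1, -1, 7}, op1 = {10, 20, 30, 40} and op2 = {-10, -20, -30, -40}, the result is {-10, 20, 30, 40}.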
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq
 	  }
 	  break;
 
+        case VEC_COND_EXPR:
+	  {
+	    enum gimplify_status r0, r1, r2;
+
+	    r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+				post_p, is_gimple_condexpr, fb_rvalue);
+	    r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+				post_p, is_gimple_val, fb_rvalue);
+	    recalculate_side_effects (*expr_p);
+
+	    ret = MIN (r0, MIN (r1, r2));
+	  }
+	  break;
+
 	case TARGET_MEM_REF:
 	  {
 	    enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE;
@@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* A vector comparison is a valid GIMPLE expression,
+		     which can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  True for a vector of
+   comparison results has all bits set while false is equal to zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,11 +30,16 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +130,31 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct the expression (A[BITPOS] code B[BITPOS]) ? -1 : 0.
+
+   INNER_TYPE is the type of the elements of A and B.
+
+   The returned expression has a signed integer type whose
+   size equals the size of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
+
+  cond = gimplify_build2 (gsi, code, comp_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond, 
+                    build_int_cst (comp_type, -1),
+                    build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +363,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Expand the vector comparison OP0 CODE OP1.  If the optabs show that
+   VEC_COND_EXPR <OP0 CODE OP1, {-1,...}, {0,...}> can be expanded
+   directly, keep the comparison as is; otherwise expand it piecewise
+   using do_compare.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  else
+    t = gimplify_build2  (gsi, code, type, op0, op1);
+
+  return t;
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +423,27 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -432,6 +499,126 @@ type_for_widest_vector_mode (enum machin
     }
 }
 
+
+
+/* Expand vector condition EXP, which should have the form
+   VEC_COND_EXPR <cond, vec0, vec1>, into the vector
+     {cond[i] != 0 ? vec0[i] : vec1[i], ... }
+   where i ranges from 0 to
+   TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)) - 1.  */
+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
+  tree cond = TREE_OPERAND (exp, 0);
+  tree vec0 = TREE_OPERAND (exp, 1);
+  tree vec1 = TREE_OPERAND (exp, 2);
+  tree type = TREE_TYPE (vec0);
+  tree lhs, rhs, notmask;
+  tree var, new_rhs;
+  optab op = NULL;
+  gimple new_stmt;
+  gimple_stmt_iterator gsi_tmp;
+  tree t;
+
+  
+  if (COMPARISON_CLASS_P (cond))
+    {
+      /* Expand vector condition inside of VEC_COND_EXPR.  */
+      if (! expand_vec_cond_expr_p (TREE_TYPE (cond), 
+				    TYPE_MODE (TREE_TYPE (cond))))
+	{
+	  tree op0 = TREE_OPERAND (cond, 0);
+	  tree op1 = TREE_OPERAND (cond, 1);
+
+	  var = create_tmp_reg (TREE_TYPE (cond), "cond");
+	  new_rhs = expand_vector_piecewise (gsi, do_compare, 
+					     TREE_TYPE (cond),
+					     TREE_TYPE (TREE_TYPE (op1)),
+					     op0, op1, TREE_CODE (cond));
+
+	  new_stmt = gimple_build_assign (var, new_rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (gsi_stmt (*gsi));
+	}
+      else
+	var = cond;
+    }
+  else
+    var = cond;
+  
+  gsi_tmp = *gsi;
+  gsi_prev (&gsi_tmp);
+
+  /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask))  */
+  lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0);
+  notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var);
+  rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1);
+  t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs);
+
+  /* Run veclower on the expressions we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);
+  
+  return t;
+}
+
+static bool
+is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
+{
+  tree type = TREE_TYPE (expr);
+
+  if (TREE_CODE (expr) == VEC_COND_EXPR)
+    return true;
+    
+  if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE)
+    return true;
+
+  if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR
+      || TREE_CODE (expr) == BIT_XOR_EXPR)
+    return is_vector_comparison (gsi, TREE_OPERAND (expr, 0))
+	   & is_vector_comparison (gsi, TREE_OPERAND (expr, 1));
+
+  if (TREE_CODE (expr) == VAR_DECL)
+    { 
+      gimple_stmt_iterator gsi_tmp;
+      tree name = DECL_NAME (expr);
+      tree var = NULL_TREE;
+      
+      gsi_tmp = *gsi;
+
+      for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp))
+	{
+	  gimple stmt = gsi_stmt (gsi_tmp);
+
+	  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	    continue;
+
+	  if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	      && DECL_NAME (gimple_assign_lhs (stmt)) == name)
+	    return is_vector_comparison (&gsi_tmp, 
+					 gimple_assign_rhs_to_tree (stmt));
+	}
+    } 
+  
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      enum tree_code code;
+      gimple exprdef = SSA_NAME_DEF_STMT (expr);
+
+      if (gimple_code (exprdef) != GIMPLE_ASSIGN)
+	return false;
+
+      if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE)
+	return false;
+
+      
+      return is_vector_comparison (gsi, 
+				   gimple_assign_rhs_to_tree (exprdef));
+    }
+
+  return false;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
@@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
+
+  if (code == VEC_COND_EXPR)
+    {
+      tree exp = gimple_assign_rhs1 (stmt);
+      tree cond = TREE_OPERAND (exp, 0);
+      
+      /* Try to get rid of the useless vector comparison
+	 x != {0,0,...} inserted by the type checker.  */
+      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
+	{
+	  tree el = uniform_vector_p (TREE_OPERAND (cond, 1));
+	  
+	  if (el != NULL_TREE && TREE_CONSTANT (el) 
+	      && TREE_CODE (TREE_TYPE (el)) == INTEGER_TYPE
+	      && tree_low_cst (el, 0) == 0
+	      && is_vector_comparison (gsi, TREE_OPERAND (cond, 0)))
+	    cond = TREE_OPERAND (cond, 0);
+	}
+      
+      if (expand_vec_cond_expr_p (TREE_TYPE (exp), 
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+	  update_stmt (gsi_stmt (*gsi));
+	  return;
+        }
+        
+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
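expand_vec_cond_expr_piecewise lowers VCOND <mask, v0, v1> to (v0 & mask) | (v1 & ~mask). This matches the per-lane form mask[i] != 0 ? v0[i] : v1[i] only when every mask lane is either 0 or all ones. That appears to be why expand_vector_operations_1 removes the x != {0,...} test only when is_vector_comparison shows that x is already a comparison result. The sketch below compares the two forms on unsigned 32-bit lanes. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise select, as in the lowering: (v0 & mask) | (v1 & ~mask).  */
void
vsel_bitwise (const uint32_t *mask, const uint32_t *v0, const uint32_t *v1,
              uint32_t *res, size_t nelts)
{
  assert (mask != NULL && v0 != NULL && v1 != NULL && res != NULL);
  for (size_t i = 0; i < nelts; i++)
    res[i] = (v0[i] & mask[i]) | (v1[i] & ~mask[i]);
}

/* Reference semantics of VEC_COND_EXPR: mask[i] != 0 ? v0[i] : v1[i].  */
void
vsel_ternary (const uint32_t *mask, const uint32_t *v0, const uint32_t *v1,
              uint32_t *res, size_t nelts)
{
  assert (mask != NULL && v0 != NULL && v1 != NULL && res != NULL);
  for (size_t i = 0; i < nelts; i++)
    res[i] = mask[i] != 0 ? v0[i] : v1[i];
}
```

When the mask lanes are the canonical values 0xFFFFFFFF and 0, the two forms agree. With a mask lane of 2, v0 = 0x0F and v1 = 0xF0, however, the bitwise form yields 0xF2 while the ternary form yields 0x0F.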
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 177665)
+++ gcc/Makefile.in	(working copy)
@@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim
 TARGET_DEF = target.def target-hooks-macros.h
 C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h
 COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h
-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
 C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF)
 COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF)
 MACHMODE_H = machmode.h mode-classes.def insn-modes.h
@@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c-
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h
 GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
-	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \
+	vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \
 	tree-ssa-alias.h $(INTERNAL_FN_H)
+TARGET_H = $(TGT) gimple.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
@@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic.
     $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \
     $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \
     langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \
-    coretypes.h insn-codes.h
+    coretypes.h insn-codes.h target.h
 df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \
    hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/c-parser.c
===================================================================
--- gcc/c-parser.c	(revision 177665)
+++ gcc/c-parser.c	(working copy)
@@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse
       tree eptype = NULL_TREE;
 
       middle_loc = c_parser_peek_token (parser)->location;
+
+      if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE)
+        {
+          error_at (middle_loc, "cannot omit middle operand in "
+                                "vector comparison");
+          ret.value = error_mark_node;
+          return ret;
+        }
+      
       pedwarn (middle_loc, OPT_pedantic, 
 	       "ISO C forbids omitting the middle term of a ?: expression");
       warn_for_omitted_condop (middle_loc, cond.value);
@@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse
     }
   else
     {
-      cond.value
-	= c_objc_common_truthvalue_conversion
-	(cond_loc, default_conversion (cond.value));
+      if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE)
+        {
+          cond.value
+            = c_objc_common_truthvalue_conversion
+            (cond_loc, default_conversion (cond.value));
+        }
       c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node;
       exp1 = c_parser_expression_conv (parser);
       mark_exp_read (exp1.value);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -18402,27 +18403,55 @@ ix86_expand_sse_fp_minmax (rtx dest, enu
   return true;
 }
 
+rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT);
+
+/* Returns a vector of mode MODE where all the elements are ARG.  */
+rtx
+rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg)
+{
+  rtvec v;
+  int units, i;
+  enum machine_mode inner;
+  
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+  v = rtvec_alloc (units);
+  for (i = 0; i < units; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg);
+  
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
 /* Expand an sse vector comparison.  Return the register with the result.  */
 
 static rtx
 ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
-		     rtx op_true, rtx op_false)
+		     rtx op_true, rtx op_false, bool no_comparison)
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx x;
 
-  cmp_op0 = force_reg (mode, cmp_op0);
-  if (!nonimmediate_operand (cmp_op1, mode))
-    cmp_op1 = force_reg (mode, cmp_op1);
+  /* Avoid useless comparison.  */
+  if (no_comparison)
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      x = cmp_op0;
+    }
+  else
+    {
+      cmp_op0 = force_reg (mode, cmp_op0);
+      if (!nonimmediate_operand (cmp_op1, mode))
+	cmp_op1 = force_reg (mode, cmp_op1);
+
+      x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
+    }
 
   if (optimize
       || reg_overlap_mentioned_p (dest, op_true)
       || reg_overlap_mentioned_p (dest, op_false))
     dest = gen_reg_rtx (mode);
 
-  x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1);
   emit_insn (gen_rtx_SET (VOIDmode, dest, x));
-
   return dest;
 }
 
@@ -18434,8 +18463,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  rtx mask_true;
+  
+  if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
@@ -18512,7 +18547,7 @@ ix86_expand_fp_movcc (rtx operands[])
 	return true;
 
       tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1,
-				 operands[2], operands[3]);
+				 operands[2], operands[3], false);
       ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]);
       return true;
     }
@@ -18555,7 +18590,7 @@ ix86_expand_fp_vcond (rtx operands[])
     return true;
 
   cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
+			     operands[1], operands[2], false);
   ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
@@ -18569,12 +18604,27 @@ ix86_expand_int_vcond (rtx operands[])
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
+  rtx comp, cond0, cond1;
+  bool single_var = false;
 
+  comp = operands[3];
   cop0 = operands[4];
   cop1 = operands[5];
 
+  /* If we have a single-variable vcond, the second comparison
+     operand is {0,0...}. Replace it with CONST0_RTX, in order
+     to get some more optimisations later.  */
+  if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX 
+      && XEXP (comp, 1) == NULL_RTX)
+    {
+      cond0 = cop0;
+      cond1 = CONST0_RTX (mode);
+      single_var = true;
+    }
+
+
   /* XOP supports all of the comparisons on all vector int types.  */
-  if (!TARGET_XOP)
+  if (!TARGET_XOP && !single_var)
     {
       /* Canonicalize the comparison to EQ, GT, GTU.  */
       switch (code)
@@ -18681,8 +18731,16 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			   operands[1+negate], operands[2-negate]);
+  if (single_var)
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cond0, cond1,
+			       operands[1+negate], operands[2-negate], true);
+    }
+  else
+    {
+      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1+negate], operands[2-negate], false);
+    }
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
@@ -18774,7 +18832,7 @@ ix86_expand_sse_unpack (rtx operands[2],
 	tmp = force_reg (imode, CONST0_RTX (imode));
       else
 	tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
-				   operands[1], pc_rtx, pc_rtx);
+				   operands[1], pc_rtx, pc_rtx, false);
 
       emit_insn (unpack (dest, operands[1], tmp));
     }
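
When op_true is the all-ones vector built by rtx_build_vector_from_val (mode, -1) and op_false is zero, the new first branch of ix86_expand_sse_movcc emits the comparison mask directly. If the select has the generic form (op_true & cmp) | (op_false & ~cmp), this shortcut is safe because, with those operands, the select returns the mask unchanged. The following sketch checks this exhaustively for one 8-bit lane. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a mask select: set bits of CMP pick OP_TRUE, cleared bits
   pick OP_FALSE.  */
uint8_t
lane_select (uint8_t cmp, uint8_t op_true, uint8_t op_false)
{
  return (uint8_t) ((op_true & cmp) | (op_false & ~cmp));
}

/* Asserts that selecting between all ones (-1) and zero gives back CMP
   for every possible 8-bit lane value, so a plain copy of the mask can
   replace the select.  */
void
check_select_identity (void)
{
  for (unsigned v = 0; v <= 0xFF; v++)
    assert (lane_select ((uint8_t) v, 0xFF, 0x00) == (uint8_t) v);
}
```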


* Re: Vector Comparison patch
  2011-08-25  9:22                                                                               ` Artem Shinkarov
@ 2011-08-25  9:58                                                                                 ` Richard Guenther
  2011-08-25 10:15                                                                                   ` Artem Shinkarov
  2011-08-25 11:02                                                                                 ` Richard Guenther
  1 sibling, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-25  9:58 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Here is a cleaned-up patch without the hook. Mostly it works in a way
> we discussed.
>
> So I think it is a right time to do something about vcond patterns,
> which would allow me to get rid of conversions that I need to put all
> over the code.
>
> Also at the moment the patch breaks lto frontend with a simple example:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> int main (int argc, char *argv[]) {
>    vector (4, float) f0;
>    vector (4, float) f1;
>
>    f0 =  f1 != f0
>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>
>    return (int)f0[argc];
> }
>
> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>
> I looked into the file, the conversion function is defined as
> gcc_unreachable (). I am not very familiar with lto, so I don't really
> know what is the right way to treat the conversions.

convert cannot be called from the middle-end, instead use fold_convert.

> And I seriously need help with backend patterns.

I'll look at the patch in detail later today.

Richard.

>
> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-08-25  9:58                                                                                 ` Richard Guenther
@ 2011-08-25 10:15                                                                                   ` Artem Shinkarov
  0 siblings, 0 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-25 10:15 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 8:34 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>> we discussed.
>>
>> So I think it is a right time to do something about vcond patterns,
>> which would allow me to get rid of conversions that I need to put all
>> over the code.
>>
>> Also at the moment the patch breaks lto frontend with a simple example:
>> #define vector(elcount, type)  \
>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>
>> int main (int argc, char *argv[]) {
>>    vector (4, float) f0;
>>    vector (4, float) f1;
>>
>>    f0 =  f1 != f0
>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>
>>    return (int)f0[argc];
>> }
>>
>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>
>> I looked into the file, the conversion function is defined as
>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>> know what is the right way to treat the conversions.
>
> convert cannot be called from the middle-end, instead use fold_convert.

Thanks, great. I didn't know that. Using fold_convert solves my
problem and makes all my tests pass.

>
>> And I seriously need help with backend patterns.
>
> I'll look at the patch in detail later today.

Thanks,
Artem.

> Richard.
>
>>
>> Thanks,
>> Artem.
>>
>


* Re: Vector Comparison patch
  2011-08-25  9:22                                                                               ` Artem Shinkarov
  2011-08-25  9:58                                                                                 ` Richard Guenther
@ 2011-08-25 11:02                                                                                 ` Richard Guenther
  2011-08-25 11:49                                                                                   ` Artem Shinkarov
  1 sibling, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-25 11:02 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Here is a cleaned-up patch without the hook. Mostly it works in a way
> we discussed.
>
> So I think it is a right time to do something about vcond patterns,
> which would allow me to get rid of conversions that I need to put all
> over the code.
>
> Also at the moment the patch breaks lto frontend with a simple example:
> #define vector(elcount, type)  \
> __attribute__((vector_size((elcount)*sizeof(type)))) type
>
> int main (int argc, char *argv[]) {
>    vector (4, float) f0;
>    vector (4, float) f1;
>
>    f0 =  f1 != f0
>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>
>    return (int)f0[argc];
> }
>
> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>
> I looked into the file, the conversion function is defined as
> gcc_unreachable (). I am not very familiar with lto, so I don't really
> know what is the right way to treat the conversions.
>
> And I seriously need help with backend patterns.

On the patch.

The documentation needs review by a native English speaker, but here
are some factual comments:

+In C vector comparison is supported within standard comparison operators:

it should read 'In GNU C' here and everywhere else as this is a GNU
extension.

 The result of the
+comparison is a signed integer-type vector where the size of each
+element must be the same as the size of compared vectors element.

The result type of the comparison is determined by the C frontend,
it isn't under control of the user.  What you are implying here is
restrictions on vector assignments, which are documented elsewhere.
I'd just say

'The result of the comparison is a vector of the same width and number
of elements as the comparison operands with a signed integral element
type.'
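
A minimal GNU C sketch of what that wording describes. It assumes a GCC or Clang release that already accepts vector comparisons, and the typedef names v4sf/v4si and the helper vec_ne are illustrative, not taken from the patch. Comparing two vectors of four floats gives a vector of four signed ints with the same total width, holding -1 for true and 0 for false.

```c
/* GNU C vector types: 4 floats and 4 ints, 16 bytes each.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise a != b.  The result is a vector of signed integers with
   the same element width (32 bits) and element count (4) as the
   operands: -1 (all bits set) where the predicate holds, 0 elsewhere.  */
static v4si
vec_ne (v4sf a, v4sf b)
{
  return a != b;
}
```

The user never picks the result type; it follows from the operand type, which is why the documentation only needs to describe it.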

+In addition to the vector comparison C supports conditional expressions

See above.

+For the convenience condition in the vector conditional can be just a
+vector of signed integer type.

'of integer type.'  I don't see a reason to disallow unsigned integers,
they can be equally well compared against zero.
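
To illustrate why an unsigned condition works just as well: the only thing that matters is whether each element is zero. The following is a sketch assuming a compiler with GNU C vector comparisons. The helper names are hypothetical, and the select is written bitwise so that the sketch does not depend on vector ?: support.

```c
typedef unsigned int v4ui __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Treat any nonzero element of an unsigned condition vector as "true":
   comparing against a zero vector yields a canonical -1/0 mask.  */
static v4si
cond_to_mask (v4ui cond)
{
  const v4ui zero = {0, 0, 0, 0};
  return cond != zero;
}

/* Element-wise select: elements where cond is nonzero come from t,
   the others come from f.  */
static v4si
select_by_cond (v4ui cond, v4si t, v4si f)
{
  v4si m = cond_to_mask (cond);
  return (m & t) | (~m & f);
}
```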

Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h     (revision 177665)
+++ gcc/targhooks.h     (working copy)
@@ -86,6 +86,7 @@ extern int default_builtin_vectorization
 extern tree default_builtin_reciprocal (unsigned int, bool, bool);

 extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
+
 extern bool
 default_builtin_support_vector_misalignment (enum machine_mode mode,
                                             const_tree,

spurious whitespace change.

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c        (revision 177665)
+++ gcc/optabs.c        (working copy)
@@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
...
+  else
+    {
+      rtx rtx_op0;
+      rtx vec;
+
+      rtx_op0 = expand_normal (op0);
+      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
+      vec = CONST0_RTX (mode);
+
+      create_output_operand (&ops[0], target, mode);
+      create_input_operand (&ops[1], rtx_op1, mode);
+      create_input_operand (&ops[2], rtx_op2, mode);
+      create_input_operand (&ops[3], comparison, mode);
+      create_input_operand (&ops[4], rtx_op0, mode);
+      create_input_operand (&ops[5], vec, mode);

this still builds the fake(?) != comparison, but as you said you need help
with the .md part if we want to use a machine specific pattern for this
case (which we eventually want, for the sake of using XOP vcond).

Index: gcc/target.h
===================================================================
--- gcc/target.h        (revision 177665)
+++ gcc/target.h        (working copy)
@@ -51,6 +51,7 @@
 #define GCC_TARGET_H

 #include "insn-modes.h"
+#include "gimple.h"

 #ifdef ENABLE_CHECKING

spurious change.

@@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
      floating-point, we can only do some of these simplifications.)  */
   if (operand_equal_p (arg0, arg1, 0))
     {
+      tree arg0_type = TREE_TYPE (arg0);
+
       switch (code)
        {
        case EQ_EXPR:
-         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
-             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
+         if (! FLOAT_TYPE_P (arg0_type)
+             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
...

Likewise.

@@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+       {
+         enum tree_code code = ops->code;
+         tree arg0 = ops->op0;
+         tree arg1 = ops->op1;

move this code to do_store_flag (we really store a flag value).  It should
also simply do what expand_vec_cond_expr does, probably simply
calling that with the {-1,...} {0,...} extra args should work.

As for the still-required conversions, you should be able to delay those
from the C frontend (and here) to expand_vec_cond_expr by, after
expanding op1 and op2, wrapping a subreg with the proper mode around them
(using convert_mode (GET_MODE (comparison), rtx_op1) should work),
and then converting the result back to the original mode.
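
This is a source-level analogue of that suggestion, not the RTL code itself. In GNU C a cast between vector types of the same size reinterprets the bits. A float select driven by an integer mask can therefore be done entirely in the integer type and converted back afterwards. select_float and the typedefs are hypothetical names.

```c
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Select between two float vectors with a -1/0 integer mask.  */
static v4sf
select_float (v4si mask, v4sf t, v4sf f)
{
  v4si ti = (v4si) t;                      /* same-size cast: reinterpret bits */
  v4si fi = (v4si) f;
  v4si ri = (mask & ti) | (~mask & fi);    /* bitwise select in integer type */
  return (v4sf) ri;                        /* reinterpret back to float */
}
```

With m = f1 != f0, the LTO test case above corresponds to select_float (m, (v4sf){-1,-1,-1,-1}, (v4sf){0,0,0,0}), with no value conversion of the float data.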

I'll leave the C frontend pieces of the patch for review by Joseph, but

+static tree
+fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)

is missing a function comment.

+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+         tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+
+  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
+

Use

  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);

instead.  But I think you don't want to use TYPE_PRECISION on
FP types.  Instead you want a signed integer type of the same (mode)
size as the vector element type, thus

  comp_type = build_nonstandard_integer_type
                (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);

+  cond = gimplify_build2 (gsi, code, comp_type, a, b);

the result type of a comparison is boolean_type_node, not comp_type.

+  cond = gimplify_build2 (gsi, code, comp_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
+                    build_int_cst (comp_type, -1),
+                    build_int_cst (comp_type, 0));

writing this as

+  return gimplify_build3 (gsi, COND_EXPR, comp_type,
                     fold_build2 (code, boolean_type_node, a, b),
+                    build_int_cst (comp_type, -1),
+                    build_int_cst (comp_type, 0));

will give the gimplifier a better chance at simplification.
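
The following scalar C sketch shows the per-element lowering that do_compare performs, with the correction above applied. The comparison itself yields a boolean, and only the -1/0 constants carry the signed integer type. It assumes 32-bit IEEE floats, so int32_t has the same width as the element. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lower an element-wise a < b on float vectors, one element at a time.  */
static void
lower_lt_piecewise (const float *a, const float *b, int32_t *res, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      bool cond = a[i] < b[i];   /* boolean, like boolean_type_node */
      res[i] = cond ? -1 : 0;    /* the COND_EXPR in the integer type */
    }
}
```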

+  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))

I think we are expecting the scalar type and the vector mode here
from looking at the single existing caller.  It probably doesn't make
a difference (we only check TYPE_UNSIGNED of it, which should
also work for vector types), but let's be consistent.  Thus,

    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))

+  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
+    t = expand_vector_piecewise (gsi, do_compare, type,
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  else
+    t = gimplify_build2  (gsi, code, type, op0, op1);

the else case looks odd.  Why re-build a stmt that already exists?
Simply return NULL_TREE instead?

+static tree
+expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
+{
...
+      /* Expand vector condition inside of VEC_COND_EXPR.  */
+      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
+                                   TYPE_MODE (TREE_TYPE (cond))))
+       {
...
+         new_rhs = expand_vector_piecewise (gsi, do_compare,
+                                            TREE_TYPE (cond),
+                                            TREE_TYPE (TREE_TYPE (op1)),
+                                            op0, op1, TREE_CODE (cond));

I'm not sure it is beneficial to expand a < b ? v0 : v1 to

tem = { a[0] < b[0] ? -1 : 0, ... }
v0 & tem | v1 & ~tem;

instead of

{ a[0] < b[0] ? v0[0] : v1[0], ... }

even if the bitwise operations could be carried out using vectors.
It's definitely beneficial to do the first if the CPU can create the
bitmask.
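
The two shapes being compared can be sketched in scalar C, using hypothetical helper names and assuming int32_t elements. Both compute the same values. They differ only in which operations would need vector support: the first needs a way to build the -1/0 mask plus AND/OR/NOT, and the second needs a per-element select.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape 1: materialize a -1/0 mask, then blend bitwise.  */
static void
select_via_mask (const int32_t *a, const int32_t *b, const int32_t *v0,
                 const int32_t *v1, int32_t *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int32_t m = a[i] < b[i] ? -1 : 0;
      r[i] = (v0[i] & m) | (v1[i] & ~m);
    }
}

/* Shape 2: select each element directly; no mask is built.  */
static void
select_direct (const int32_t *a, const int32_t *b, const int32_t *v0,
               const int32_t *v1, int32_t *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = a[i] < b[i] ? v0[i] : v1[i];
}
```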

+  /* Run vecower on the expresisons we have introduced.  */
+  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
+    expand_vector_operations_1 (&gsi_tmp);

do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)

+static bool
+is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
+{

This function is lacking a comment.

@@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
...
+      /* Try to get rid from the useless vector comparison
+        x != {0,0,...} which is inserted by the typechecker.  */
+      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)

how and why?  You simply drop that comparison - that doesn't look
correct.  And in fact TREE_OPERAND (cond, 0) will never be a
comparison - that wouldn't be valid gimple.  Please leave this
optimization to SSA based forward propagation (I can help you here
once the patch is in).

+      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
+                                  TYPE_MODE (TREE_TYPE (exp))))
+        {
+         update_stmt (gsi_stmt (*gsi));
+         return;

no need to update the stmt when you do nothing.

+      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
+      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
+      update_stmt (gsi_stmt (*gsi));
+    }

missing return;, just for clarity that you are done here.

You don't do anything for comparisons here, in case they are split
away from the VEC_COND_EXPR by the gimplifier.  But if the
target doesn't support VEC_COND_EXPRs we have to lower them.
I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).

-TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
+TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h

huh, no please ;)  I suppose that's no longer necessary anyway now.

I'll leave the i386.c pieces to the x86 target maintainers to review.
They probably will change once the .md file changes are sorted out.

Thanks,
Richard.

* Re: Vector Comparison patch
  2011-08-25 11:02                                                                                 ` Richard Guenther
@ 2011-08-25 11:49                                                                                   ` Artem Shinkarov
  2011-08-25 12:14                                                                                     ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-25 11:49 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>> we discussed.
>>
>> So I think it is a right time to do something about vcond patterns,
>> which would allow me to get rid of conversions that I need to put all
>> over the code.
>>
>> Also at the moment the patch breaks lto frontend with a simple example:
>> #define vector(elcount, type)  \
>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>
>> int main (int argc, char *argv[]) {
>>    vector (4, float) f0;
>>    vector (4, float) f1;
>>
>>    f0 =  f1 != f0
>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>
>>    return (int)f0[argc];
>> }
>>
>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>
>> I looked into the file, the conversion function is defined as
>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>> know what is the right way to treat the conversions.
>>
>> And I seriously need help with backend patterns.
>
> On the patch.
>
> The documentation needs review by a native English speaker, but here
> are some factual comments:
>
> +In C vector comparison is supported within standard comparison operators:
>
> it should read 'In GNU C' here and everywhere else as this is a GNU
> extension.
>
>  The result of the
> +comparison is a signed integer-type vector where the size of each
> +element must be the same as the size of compared vectors element.
>
> The result type of the comparison is determined by the C frontend,
> it isn't under the control of the user.  What you are implying here are
> restrictions on vector assignments, which are documented elsewhere.
> I'd just say
>
> 'The result of the comparison is a vector of the same width and number
> of elements as the comparison operands with a signed integral element
> type.'
>
> +In addition to the vector comparison C supports conditional expressions
>
> See above.
>
> +For the convenience condition in the vector conditional can be just a
> +vector of signed integer type.
>
> 'of integer type.'  I don't see a reason to disallow unsigned integers,
> they can be equally well compared against zero.

I'll take a final pass over the documentation; it is unchanged from the old patches.

> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h     (revision 177665)
> +++ gcc/targhooks.h     (working copy)
> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>
>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
> +
>  extern bool
>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>                                             const_tree,
>
> spurious whitespace change.

Yes, thanks.

> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        (revision 177665)
> +++ gcc/optabs.c        (working copy)
> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
> ...
> +  else
> +    {
> +      rtx rtx_op0;
> +      rtx vec;
> +
> +      rtx_op0 = expand_normal (op0);
> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
> +      vec = CONST0_RTX (mode);
> +
> +      create_output_operand (&ops[0], target, mode);
> +      create_input_operand (&ops[1], rtx_op1, mode);
> +      create_input_operand (&ops[2], rtx_op2, mode);
> +      create_input_operand (&ops[3], comparison, mode);
> +      create_input_operand (&ops[4], rtx_op0, mode);
> +      create_input_operand (&ops[5], vec, mode);
>
> this still builds the fake(?) != comparison, but as you said you need help
> with the .md part if we want to use a machine specific pattern for this
> case (which we eventually want, for the sake of using XOP vcond).

Yes, I am waiting for it. This is the only way at the moment to make
sure that in
m = a > b;
r = m ? c : d;

m in the vcond is not transformed into m != 0.

> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h        (revision 177665)
> +++ gcc/target.h        (working copy)
> @@ -51,6 +51,7 @@
>  #define GCC_TARGET_H
>
>  #include "insn-modes.h"
> +#include "gimple.h"
>
>  #ifdef ENABLE_CHECKING
>
> spurious change.

Old stuff, fixed.

> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>      floating-point, we can only do some of these simplifications.)  */
>   if (operand_equal_p (arg0, arg1, 0))
>     {
> +      tree arg0_type = TREE_TYPE (arg0);
> +
>       switch (code)
>        {
>        case EQ_EXPR:
> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
> +         if (! FLOAT_TYPE_P (arg0_type)
> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
> ...

Ok.

>
> Likewise.
>
> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>     case UNGE_EXPR:
>     case UNEQ_EXPR:
>     case LTGT_EXPR:
> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
> +       {
> +         enum tree_code code = ops->code;
> +         tree arg0 = ops->op0;
> +         tree arg1 = ops->op1;
>
> move this code to do_store_flag (we really store a flag value).  It should
> also simply do what expand_vec_cond_expr does, probably simply
> calling that with the {-1,...} {0,...} extra args should work.

I started to do that, but the code in do_store_flag is completely
different from what I am doing, and it looks confusing. I just call
expand_vec_cond_expr and that is it. I can write a separate function,
but the code is quite small.

>
> As for the still-required conversions, you should be able to delay those
> from the C frontend (and here) to expand_vec_cond_expr by, after
> expanding op1 and op2, wrapping a subreg with the proper mode around them
> (using convert_mode (GET_MODE (comparison), rtx_op1) should work),
> and then converting the result back to the original mode.
>
> I'll leave the C frontend pieces of the patch for review by Joseph, but

The conversions stay until we fix the backend. Once the backend is
able to digest f0 > f1 ? int0 : int1, all the conversions will go
away.

> +static tree
> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>
> is missing a function comment.

fixed.

> +static tree
> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
> +         tree bitpos, tree bitsize, enum tree_code code)
> +{
> +  tree cond;
> +  tree comp_type;
> +
> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
> +
> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
> +
>
> Use
>
>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>
> instead.  But I think you don't want to use TYPE_PRECISION on
> FP types.  Instead you want a signed integer type of the same (mode)
> size as the vector element type, thus
>
>  comp_type = build_nonstandard_integer_type
>                (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);

Hm, I thought that at this stage we don't want to know anything about
modes. I mean, here I am really building the same integer type as the
operands of the comparison have. But I can use GET_MODE_BITSIZE as well;
I don't think the size of the mode could ever differ from the size of
the type. Or could it?

> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>
> the result type of a comparison is boolean_type_node, not comp_type.
>
> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
> +                    build_int_cst (comp_type, -1),
> +                    build_int_cst (comp_type, 0));
>
> writing this as
>
> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>                     fold_build2 (code, boolean_type_node, a, b),
> +                    build_int_cst (comp_type, -1),
> +                    build_int_cst (comp_type, 0));
>
> will give the gimplifier a better chance at simplification.
>
> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>
> I think we are expecting the scalar type and the vector mode here
> from looking at the single existing caller.  It probably doesn't make
> a difference (we only check TYPE_UNSIGNED of it, which should
> also work for vector types), but let's be consistent.  Thus,

Ok.

>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>
> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
> +    t = expand_vector_piecewise (gsi, do_compare, type,
> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
> +  else
> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>
> the else case looks odd.  Why re-build a stmt that already exists?
> Simply return NULL_TREE instead?

I can adjust it. The reason it is written that way is that
expand_vector_operations_1 uses the result of the function to
update the rhs.

> +static tree
> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
> +{
> ...
> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
> +                                   TYPE_MODE (TREE_TYPE (cond))))
> +       {
> ...
> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
> +                                            TREE_TYPE (cond),
> +                                            TREE_TYPE (TREE_TYPE (op1)),
> +                                            op0, op1, TREE_CODE (cond));
>
> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>
> tem = { a[0] < b[0] ? -1 : 0, ... }
> v0 & tem | v1 & ~tem;
>
> instead of
>
> { a[0] < b[0] ? v0[0] : v1[0], ... }
>
> even if the bitwise operations could be carried out using vectors.
> It's definitely beneficial to do the first if the CPU can create the
> bitmask.
>

o_O

I thought you always wanted to do (m & v0) | (~m & v1).
Do you want to have two cases of the expansion then: one when the
mask is available and one when it is not? But it is really unlikely
that we can get the mask but cannot get vcond, because the condition
is actually a vcond. So once again: do we always expand to
{a[0] > b[0] ? v[0] : c[0], ...}?

> +  /* Run vecower on the expresisons we have introduced.  */
> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
> +    expand_vector_operations_1 (&gsi_tmp);
>
> do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>
> +static bool
> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
> +{
>
> This function is lacking a comment.
>
> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
> ...
> +      /* Try to get rid from the useless vector comparison
> +        x != {0,0,...} which is inserted by the typechecker.  */
> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>
> how and why?  You simply drop that comparison - that doesn't look
> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
> comparison - that wouldn't be valid gimple.  Please leave this
> optimization to SSA based forward propagation (I can help you here
> once the patch is in).

No-no-no. This is the second part of avoiding
m = a > b;
r = m ? v0 : v1;

to prevent m from being expanded to m != {0}.

I do not _simply_ drop the comparison. I drop it only if
is_vector_comparison returned true, which means we can never end up
dropping a comparison that the user actually wrote. What I really want
to achieve here is to drop the comparison that the frontend inserts
every time it sees an expression there.

As I said earlier, tree forward propagation kicks in only with -On,
and I would really like to make sure that I can get rid of the useless
!= {0} at any optimization level.
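
The following GNU C sketch shows why dropping m != {0} is safe only when m is known to be a comparison result. It assumes a compiler that supports GNU C vector comparisons, and ne_zero is a hypothetical helper. For a mask whose elements are already -1 or 0, comparing against zero returns the mask unchanged. For an arbitrary integer vector it does not, so the comparison must stay.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* m != {0,...}: the comparison the frontend inserts when a vector
   is used as a condition.  */
static v4si
ne_zero (v4si m)
{
  const v4si zero = {0, 0, 0, 0};
  return m != zero;
}
```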

>
> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
> +                                  TYPE_MODE (TREE_TYPE (exp))))
> +        {
> +         update_stmt (gsi_stmt (*gsi));
> +         return;
>
> no need to update the stmt when you do nothing.
>
> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
> +      update_stmt (gsi_stmt (*gsi));
> +    }
>
> missing return;, just for clarity that you are done here.

Ok.

> You don't do anything for comparisons here, in case they are split
> away from the VEC_COND_EXPR by the gimplifier.  But if the
> target doesn't support VEC_COND_EXPRs we have to lower them.
> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).
>

expand_vector_operations_1 takes care of any vector comparison,
treating it as a binary operation. See expand_vector_operation and
do_compare for more details.

> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>
> huh, no please ;)  I suppose that's no longer necessary anyway now.
>

Yeah, fixed. :)

> I'll leave the i386.c pieces to the x86 target maintainers to review.
> They probably will change once the .md file changes are sorted out.

If they are ever going to be sorted out...

> Thanks,
> Richard.
>


Thanks,
Artem.

* Re: Vector Comparison patch
  2011-08-25 11:49                                                                                   ` Artem Shinkarov
@ 2011-08-25 12:14                                                                                     ` Richard Guenther
  2011-08-25 13:29                                                                                       ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-25 12:14 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>> we discussed.
>>>
>>> So I think it is a right time to do something about vcond patterns,
>>> which would allow me to get rid of conversions that I need to put all
>>> over the code.
>>>
>>> Also at the moment the patch breaks lto frontend with a simple example:
>>> #define vector(elcount, type)  \
>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>
>>> int main (int argc, char *argv[]) {
>>>    vector (4, float) f0;
>>>    vector (4, float) f1;
>>>
>>>    f0 =  f1 != f0
>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>
>>>    return (int)f0[argc];
>>> }
>>>
>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>
>>> I looked into the file, the conversion function is defined as
>>> gcc_unreachable (). I am not very familiar with lto, so I don't really
>>> know what is the right way to treat the conversions.
>>>
>>> And I seriously need help with backend patterns.
>>
>> On the patch.
>>
>> The documentation needs review by a native English speaker, but here
>> are some factual comments:
>>
>> +In C vector comparison is supported within standard comparison operators:
>>
>> it should read 'In GNU C' here and everywhere else as this is a GNU
>> extension.
>>
>>  The result of the
>> +comparison is a signed integer-type vector where the size of each
>> +element must be the same as the size of compared vectors element.
>>
>> The result type of the comparison is determined by the C frontend,
>> it isn't under the control of the user.  What you are implying here are
>> restrictions on vector assignments, which are documented elsewhere.
>> I'd just say
>>
>> 'The result of the comparison is a vector of the same width and number
>> of elements as the comparison operands with a signed integral element
>> type.'
>>
>> +In addition to the vector comparison C supports conditional expressions
>>
>> See above.
>>
>> +For the convenience condition in the vector conditional can be just a
>> +vector of signed integer type.
>>
>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>> they can be equally well compared against zero.
>
> I'll take a final pass over the documentation; it is unchanged from the old patches.
>
>> Index: gcc/targhooks.h
>> ===================================================================
>> --- gcc/targhooks.h     (revision 177665)
>> +++ gcc/targhooks.h     (working copy)
>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>
>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>> +
>>  extern bool
>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>                                             const_tree,
>>
>> spurious whitespace change.
>
> Yes, thanks.
>
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        (revision 177665)
>> +++ gcc/optabs.c        (working copy)
>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>> ...
>> +  else
>> +    {
>> +      rtx rtx_op0;
>> +      rtx vec;
>> +
>> +      rtx_op0 = expand_normal (op0);
>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>> +      vec = CONST0_RTX (mode);
>> +
>> +      create_output_operand (&ops[0], target, mode);
>> +      create_input_operand (&ops[1], rtx_op1, mode);
>> +      create_input_operand (&ops[2], rtx_op2, mode);
>> +      create_input_operand (&ops[3], comparison, mode);
>> +      create_input_operand (&ops[4], rtx_op0, mode);
>> +      create_input_operand (&ops[5], vec, mode);
>>
>> this still builds the fake(?) != comparison, but as you said you need help
>> with the .md part if we want to use a machine specific pattern for this
>> case (which we eventually want, for the sake of using XOP vcond).
>
> Yes, I am waiting for it. This is the only way at the moment to make
> sure that in
> m = a > b;
> r = m ? c : d;
>
> m in the vcond is not transformed into m != 0.
>
>> Index: gcc/target.h
>> ===================================================================
>> --- gcc/target.h        (revision 177665)
>> +++ gcc/target.h        (working copy)
>> @@ -51,6 +51,7 @@
>>  #define GCC_TARGET_H
>>
>>  #include "insn-modes.h"
>> +#include "gimple.h"
>>
>>  #ifdef ENABLE_CHECKING
>>
>> spurious change.
>
> Old stuff, fixed.
>
>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>      floating-point, we can only do some of these simplifications.)  */
>>   if (operand_equal_p (arg0, arg1, 0))
>>     {
>> +      tree arg0_type = TREE_TYPE (arg0);
>> +
>>       switch (code)
>>        {
>>        case EQ_EXPR:
>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>> +         if (! FLOAT_TYPE_P (arg0_type)
>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>> ...
>
> Ok.
>
>>
>> Likewise.
>>
>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>     case UNGE_EXPR:
>>     case UNEQ_EXPR:
>>     case LTGT_EXPR:
>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>> +       {
>> +         enum tree_code code = ops->code;
>> +         tree arg0 = ops->op0;
>> +         tree arg1 = ops->op1;
>>
>> move this code to do_store_flag (we really store a flag value).  It should
>> also simply do what expand_vec_cond_expr does, probably simply
>> calling that with the {-1,...} {0,...} extra args should work.
>
> I started to do that, but the code in do_store_flag is completely
> different from what I am doing, and it looks confusing. I just call
> expand_vec_cond_expr and that is it. I can write a separate function,
> but the code is quite small.

Hm?  I see in your patch

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 177665)
+++ gcc/expr.c  (working copy)
@@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+       {
+         enum tree_code code = ops->code;
+         tree arg0 = ops->op0;
+         tree arg1 = ops->op1;
+         tree arg_type = TREE_TYPE (arg0);
+         tree el_type = TREE_TYPE (arg_type);
+         tree t, ifexp, if_true, if_false;
+
+         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
+
+
+         ifexp = build2 (code, type, arg0, arg1);
+         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+
+         if (arg_type != type)
+           {
+             if_true = convert (arg_type, if_true);
+             if_false = convert (arg_type, if_false);
+             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
+             t = convert (type, t);
+           }
+         else
+           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+
+         return expand_expr (t,
+                             modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
+                             tmode != VOIDmode ? tmode : mode,
+                             modifier);
+       }

that's not exactly "calling expand_vec_cond_expr".

>>
>> As for the still-required conversions, you should be able to delay those
>> from the C frontend (and here) to expand_vec_cond_expr by, after
>> expanding op1 and op2, wrapping a subreg with the proper mode around them
>> (using convert_mode (GET_MODE (comparison), rtx_op1) should work),
>> and then converting the result back to the original mode.
>>
>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>
> The conversions stay until we fix the backend. Once the backend is
> able to digest f0 > f1 ? int0 : int1, all the conversions will go
> away.
>
>> +static tree
>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>
>> is missing a function comment.
>
> fixed.
>
>> +static tree
>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>> +         tree bitpos, tree bitsize, enum tree_code code)
>> +{
>> +  tree cond;
>> +  tree comp_type;
>> +
>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>> +
>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>> +
>>
>> Use
>>
>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>
>> instead.  But I think you don't want to use TYPE_PRECISION on
>> FP types.  Instead you want a signed integer type of the same (mode)
>> size as the vector element type, thus
>>
>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>
> Hm, I thought that at this stage we don't want to know anything about
> modes. I mean here I am really building the same integer type as the
> operands of the comparison have. But I can use MODE_BITSIZE as well, I
> don't think that it could happen that the size of the mode is
> different from the size of the type. Or could it?

The comparison could be on floating-point types where TYPE_PRECISION
can be, for example, 80 for x87 doubles.  You want an integer type
of the same width, so yes, GET_MODE_BITSIZE is the correct thing
to use here.

>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>
>> the result type of a comparison is boolean_type_node, not comp_type.
>>
>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>> +                    build_int_cst (comp_type, -1),
>> +                    build_int_cst (comp_type, 0));
>>
>> writing this as
>>
>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>                     fold_build2 (code, boolean_type_node, a, b),
>> +                    build_int_cst (comp_type, -1),
>> +                    build_int_cst (comp_type, 0));
>>
>> will give the gimplifier a better chance at simplification.
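The per-element lowering under discussion compares each extracted pair of elements to get a boolean. That boolean then selects -1 or 0 in a signed integer type with the same width as the element. In plain C, for a target with no vector compare at all, this can be sketched as follows. It is a minimal sketch that assumes a 4-element float vector. `lower_lt_piecewise` is a hypothetical name and is not GCC's `do_compare`.

```c
#include <stdint.h>

enum { NELTS = 4 };

/* Piecewise "<" on a 4 x float vector: each comparison produces a
   boolean, which selects -1 or 0 in a signed integer type of the same
   width as the float element.  */
static void
lower_lt_piecewise (const float a[NELTS], const float b[NELTS],
                    int32_t res[NELTS])
{
  _Static_assert (sizeof (int32_t) == sizeof (float),
                  "mask element must be as wide as the vector element");
  for (int i = 0; i < NELTS; i++)
    /* Source-level analogue of COND_EXPR <a[i] < b[i], -1, 0>.  */
    res[i] = (a[i] < b[i]) ? -1 : 0;
}
```

The mask element is sized by the element's storage width (32 bits for float), which is the role of the GET_MODE_BITSIZE argument in the review. TYPE_PRECISION can be smaller than the storage size: on x86, for example, the 80-bit extended type occupies 12 or 16 bytes.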
>>
>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>
>> I think we are expecting the scalar type and the vector mode here
>> from looking at the single existing caller.  It probably doesn't make
>> a difference (we only check TYPE_UNSIGNED of it, which should
>> also work for vector types), but let's be consistent.  Thus,
>
> Ok.
>
>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>
>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>> +  else
>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>
>> the else case looks odd.  Why re-build a stmt that already exists?
>> Simply return NULL_TREE instead?
>
> I can adjust. The reason it is written that way is that
> expand_vector_operations_1 is using the result of the function to
> update rhs.

Ok, so it should check whether there was any lowering done then.
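Richard is asking for a convention where the lowering helper returns NULL_TREE when it did nothing. The caller then rewrites the statement and calls update_stmt only if a replacement came back. Below is a plain-C sketch of that pattern. Every name in it (`struct stmt`, `lower_vector_compare`, `lower_stmt`, `target_supports_vector_compare`) is a hypothetical stand-in, not GCC's gimple API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a statement; NOT GCC's gimple type.  */
struct stmt
{
  const char *rhs;   /* textual right-hand side */
  int updates;       /* counts calls to the update_stmt stand-in */
};

/* Hypothetical target capability flag.  */
static int target_supports_vector_compare = 1;

/* Return a replacement rhs, or NULL when the statement can stay as is.  */
static const char *
lower_vector_compare (const struct stmt *s)
{
  (void) s;
  if (target_supports_vector_compare)
    return NULL;                          /* nothing lowered */
  return "{ a[0] < b[0] ? -1 : 0, ... }"; /* piecewise replacement */
}

/* Rewrite and "update" the statement only if lowering actually happened.  */
static void
lower_stmt (struct stmt *s)
{
  const char *new_rhs = lower_vector_compare (s);
  if (new_rhs == NULL)
    return;
  s->rhs = new_rhs;
  s->updates++;
}
```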

>> +static tree
>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>> +{
>> ...
>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>> +       {
>> ...
>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>> +                                            TREE_TYPE (cond),
>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>> +                                            op0, op1, TREE_CODE (cond));
>>
>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>
>> tem = { a[0] < b[0] ? -1 : 0, ... }
>> v0 & tem | v1 & ~tem;
>>
>> instead of
>>
>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>
>> even if the bitwise operations could be carried out using vectors.
>> It's definitely beneficial to do the first if the CPU can create the
>> bitmask.
>>
>
> o_O
>
> I thought you always wanted to do (m & v0) | (~m & v1).
> Do you want to have two cases of the expansion then -- when we have
> the mask available and when we don't? But it is really unlikely that we
> can get the mask but not the vcond, because the condition is actually a
> vcond. So once again -- do we always expand to {a[0] > b[0] ? v[0] :
> c[0], ...}?

Hm, yeah.  I suppose with the current setup it's hard to only
get the mask but not the full vcond ;)  So it probably makes
sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
fallback.  Sorry for the confusion ;)
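Both fallback shapes compared above produce the same values. The thread picks the per-element select as the fallback because a target that can build the mask can normally do the full vcond as well. Here is a minimal plain-C sketch, assuming 4 x int32_t vectors stored as arrays. The function names are hypothetical.

```c
#include <stdint.h>

enum { N = 4 };

/* Fallback chosen in the thread: r[i] = a[i] < b[i] ? v0[i] : v1[i].  */
static void
vcond_elementwise (const int32_t a[N], const int32_t b[N],
                   const int32_t v0[N], const int32_t v1[N], int32_t r[N])
{
  for (int i = 0; i < N; i++)
    r[i] = a[i] < b[i] ? v0[i] : v1[i];
}

/* Alternative: build a -1/0 mask first, then blend bitwise.  */
static void
vcond_mask_blend (const int32_t a[N], const int32_t b[N],
                  const int32_t v0[N], const int32_t v1[N], int32_t r[N])
{
  for (int i = 0; i < N; i++)
    {
      int32_t m = a[i] < b[i] ? -1 : 0;
      r[i] = (v0[i] & m) | (v1[i] & ~m);
    }
}
```

The mask form is only worth it when the mask and the bitwise operations can be done with vector instructions. When everything is scalar, it just adds work for each element.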

>> +  /* Run veclower on the expressions we have introduced.  */
>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>> +    expand_vector_operations_1 (&gsi_tmp);
>>
>> do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>>
>> +static bool
>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>> +{
>>
>> This function is lacking a comment.
>>
>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>> ...
>> +      /* Try to get rid of the useless vector comparison
>> +        x != {0,0,...} which is inserted by the typechecker.  */
>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>
>> how and why?  You simply drop that comparison - that doesn't look
>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>> comparison - that wouldn't be valid gimple.  Please leave this
>> optimization to SSA based forward propagation (I can help you here
>> once the patch is in).
>
> No-no-no. This is the second part of avoiding
> m = a > b;
> r = m ? v0 : v1;
>
> to prevent m from being expanded to m != {0}.
>
> I do not _simply_ drop the comparison. I drop it only if
> is_vector_comparison returned true. This means that we can never get
> into a situation where we actually drop a comparison inserted
> by the user. What I really want to achieve here is to drop the
> comparison that the frontend inserts every time it sees an
> expression there.
>
> As I said earlier, tree forward propagation kicks in only at -On, and
> I would really like to make sure that I can get rid of useless != {0}
> at any level.

Please don't.  If the language extension forces a != 0 then it should
appear at -O0.  The code is fishy anyway in the way it walks stmts
in is_vector_comparison.  At least I don't like to see this optimization
done here for the sake of -O0 in this initial patch - you could try
arguing about it as a followup improvement (well, probably with not
much luck).  -O0 is about compile-speed and debugging, doing
data-flow by walking stmts backward is slow.
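The disputed `!= {0,...}` test is redundant because a comparison mask already holds only -1 or 0 in each element. Comparing it against zero therefore gives back the same mask. The GNU C sketch below demonstrates that identity; the helper names are hypothetical. Per the review, removing the extra test belongs in SSA-based forward propagation, not in vector lowering at -O0.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Mask produced by a vector comparison: each element is -1 or 0.  */
static v4si
mask_gt (v4si a, v4si b)
{
  return a > b;
}

/* The extra test the front end adds when a mask is used as a condition;
   for a -1/0 mask it returns the mask unchanged.  */
static v4si
mask_ne_zero (v4si m)
{
  const v4si zero = { 0, 0, 0, 0 };
  return m != zero;
}
```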

>>
>> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
>> +                                  TYPE_MODE (TREE_TYPE (exp))))
>> +        {
>> +         update_stmt (gsi_stmt (*gsi));
>> +         return;
>>
>> no need to update the stmt when you do nothing.
>>
>> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
>> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
>> +      update_stmt (gsi_stmt (*gsi));
>> +    }
>>
>> missing return;, just for clarity that you are done here.
>
> Ok.
>
>> You don't do anything for comparisons here, in case they are split
>> away from the VEC_COND_EXPR by the gimplifier.  But if the
>> target doesn't support VEC_COND_EXPRs we have to lower them.
>> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).
>>
>
> expand_vector_operations_1 takes care of any vector comparison,
> considering it as a binary operation. See expand_vector_operation and
> do_compare for more details.

Ah, ok, I missed that piece.

>> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>
>> huh, no please ;)  I suppose that's no longer necessary anyway now.
>>
>
> Yeah, fixed. :)
>
>> I'll leave the i386.c pieces to the x86 target maintainers to review.
>> They probably will change once the .md file changes are sorted out.
>
> If they are ever going to be sorted out...

Well, we can move the conversion stuff to the point of expansion
using convert_move.  That'll keep the middle-end and the C frontend
clean and move the "hack" towards the backends.

Richard.

>> Thanks,
>> Richard.
>>
>
>
> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-08-25 12:14 ` Richard Guenther
@ 2011-08-25 13:29 ` Artem Shinkarov
  2011-08-25 13:30 ` Richard Guenther
From: Artem Shinkarov @ 2011-08-25 13:29 UTC
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>> we discussed.
>>>>
>>>> So I think it is the right time to do something about vcond patterns,
>>>> which would allow me to get rid of conversions that I need to put all
>>>> over the code.
>>>>
>>>> Also at the moment the patch breaks the LTO frontend with a simple example:
>>>> #define vector(elcount, type)  \
>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>
>>>> int main (int argc, char *argv[]) {
>>>>    vector (4, float) f0;
>>>>    vector (4, float) f1;
>>>>
>>>>    f0 =  f1 != f0
>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>
>>>>    return (int)f0[argc];
>>>> }
>>>>
>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>
>>>> I looked into the file; the conversion function is defined as
>>>> gcc_unreachable (). I am not very familiar with LTO, so I don't really
>>>> know what the right way to treat the conversions is.
>>>>
>>>> And I seriously need help with backend patterns.
>>>
>>> On the patch.
>>>
>>> The documentation needs review by a native English speaker, but here
>>> are some factual comments:
>>>
>>> +In C vector comparison is supported within standard comparison operators:
>>>
>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>> extension.
>>>
>>>  The result of the
>>> +comparison is a signed integer-type vector where the size of each
>>> +element must be the same as the size of compared vectors element.
>>>
>>> The result type of the comparison is determined by the C frontend,
>>> it isn't under control of the user.  What you are implying here is
>>> restrictions on vector assignments, which are documented elsewhere.
>>> I'd just say
>>>
>>> 'The result of the comparison is a vector of the same width and number
>>> of elements as the comparison operands with a signed integral element
>>> type.'
>>>
>>> +In addition to the vector comparison C supports conditional expressions
>>>
>>> See above.
>>>
>>> +For the convenience condition in the vector conditional can be just a
>>> +vector of signed integer type.
>>>
>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>> they can be equally well compared against zero.
>>
>> I'll have a final go on the documentation, it is untouched from the old patches.
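GCC's vector extension lets you check the behaviour the documentation should describe. A comparison of two float vectors gives a vector with the same number of elements and the same element width, but with a signed integer element type. An unsigned integer vector is just as usable as a condition once you compare it against zero. This is a minimal sketch; the typedef and function names are hypothetical.

```c
typedef float        v4sf __attribute__ ((vector_size (16)));
typedef int          v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Float comparison: the result has 4 elements of 32 bits each, with a
   signed integer element type (-1 for true, 0 for false).  */
static v4si
float_lt (v4sf x, v4sf y)
{
  return x < y;
}

/* An unsigned vector becomes a condition mask by comparing it with zero.  */
static v4si
nonzero_mask (v4su c)
{
  const v4su zero = { 0, 0, 0, 0 };
  return c != zero;
}
```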
>>
>>> Index: gcc/targhooks.h
>>> ===================================================================
>>> --- gcc/targhooks.h     (revision 177665)
>>> +++ gcc/targhooks.h     (working copy)
>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>
>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>> +
>>>  extern bool
>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>                                             const_tree,
>>>
>>> spurious whitespace change.
>>
>> Yes, thanks.
>>
>>> Index: gcc/optabs.c
>>> ===================================================================
>>> --- gcc/optabs.c        (revision 177665)
>>> +++ gcc/optabs.c        (working copy)
>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>> ...
>>> +  else
>>> +    {
>>> +      rtx rtx_op0;
>>> +      rtx vec;
>>> +
>>> +      rtx_op0 = expand_normal (op0);
>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>> +      vec = CONST0_RTX (mode);
>>> +
>>> +      create_output_operand (&ops[0], target, mode);
>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>> +      create_input_operand (&ops[3], comparison, mode);
>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>> +      create_input_operand (&ops[5], vec, mode);
>>>
>>> this still builds the fake(?) != comparison, but as you said you need help
>>> with the .md part if we want to use a machine specific pattern for this
>>> case (which we eventually want, for the sake of using XOP vcond).
>>
>> Yes, I am waiting for it. This is the only way at the moment to make
>> sure that in
>> m = a > b;
>> r = m ? c : d;
>>
>> m in the vcond is not transformed into m != 0.
>>
>>> Index: gcc/target.h
>>> ===================================================================
>>> --- gcc/target.h        (revision 177665)
>>> +++ gcc/target.h        (working copy)
>>> @@ -51,6 +51,7 @@
>>>  #define GCC_TARGET_H
>>>
>>>  #include "insn-modes.h"
>>> +#include "gimple.h"
>>>
>>>  #ifdef ENABLE_CHECKING
>>>
>>> spurious change.
>>
>> Old stuff, fixed.
>>
>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>      floating-point, we can only do some of these simplifications.)  */
>>>   if (operand_equal_p (arg0, arg1, 0))
>>>     {
>>> +      tree arg0_type = TREE_TYPE (arg0);
>>> +
>>>       switch (code)
>>>        {
>>>        case EQ_EXPR:
>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>> ...
>>
>> Ok.
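With floating-point vectors, `x == x` may be folded to all-true only when NaNs need not be honored. That is why the fold_comparison hunk keeps the FLOAT_TYPE_P / HONOR_NANS test. The GNU C sketch below shows the effect lane by lane. It assumes default IEEE semantics (no -ffast-math), and `self_eq` is a hypothetical helper.

```c
#include <math.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Element-wise x == x.  A NaN element compares unequal to itself, so its
   lane is 0 rather than -1.  */
static v4si
self_eq (v4sf x)
{
  return x == x;
}
```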
>>
>>>
>>> Likewise.
>>>
>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>     case UNGE_EXPR:
>>>     case UNEQ_EXPR:
>>>     case LTGT_EXPR:
>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>> +       {
>>> +         enum tree_code code = ops->code;
>>> +         tree arg0 = ops->op0;
>>> +         tree arg1 = ops->op1;
>>>
>>> move this code to do_store_flag (we really store a flag value).  It should
>>> also simply do what expand_vec_cond_expr does, probably simply
>>> calling that with the {-1,...} {0,...} extra args should work.
>>
>> I started to do that, but the code in do_store_flag is completely
>> different from what I am doing, and it looks confusing. I just call
>> expand_vec_cond_expr and that is it. I can write a separate function,
>> but the code is quite small.
>
> Hm?  I see in your patch
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 177665)
> +++ gcc/expr.c  (working copy)
> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>     case UNGE_EXPR:
>     case UNEQ_EXPR:
>     case LTGT_EXPR:
> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
> +       {
> +         enum tree_code code = ops->code;
> +         tree arg0 = ops->op0;
> +         tree arg1 = ops->op1;
> +         tree arg_type = TREE_TYPE (arg0);
> +         tree el_type = TREE_TYPE (arg_type);
> +         tree t, ifexp, if_true, if_false;
> +
> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
> +
> +
> +         ifexp = build2 (code, type, arg0, arg1);
> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
> +
> +         if (arg_type != type)
> +           {
> +             if_true = convert (arg_type, if_true);
> +             if_false = convert (arg_type, if_false);
> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
> +             t = convert (type, t);
> +           }
> +         else
> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
> +
> +         return expand_expr (t,
> +                             modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
> +                             tmode != VOIDmode ? tmode : mode,
> +                             modifier);
> +       }
>
> that's not exactly "calling expand_vec_cond_expr".

Well, actually it is. Keep in mind that with a clean backend the
conversions would go away. But I'll make a function.

>>>
>>> As for the still required conversions, you should be able to delay those
>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>> (using convert_mode (GET_MODE (comparison), rtx_op1) should work),
>>> and then convert the result back to the original mode.
>>>
>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>
>> The conversions are there until we fix the backend. Once the backend is
>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>> away.
>>
>>> +static tree
>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>
>>> is missing a function comment.
>>
>> fixed.
>>
>>> +static tree
>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>> +{
>>> +  tree cond;
>>> +  tree comp_type;
>>> +
>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>> +
>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>> +
>>>
>>> Use
>>>
>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>
>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>> FP types.  Instead you want a signed integer type of the same (mode)
>>> size as the vector element type, thus
>>>
>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>>
>> Hm, I thought that at this stage we don't want to know anything about
>> modes. I mean here I am really building the same integer type as the
>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>> don't think that it could happen that the size of the mode is
>> different from the size of the type. Or could it?
>
> The comparison could be on floating-point types where TYPE_PRECISION
> can be, for example, 80 for x87 doubles.  You want an integer type
> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
> to use here.

Ok.

>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>
>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>
>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>> +                    build_int_cst (comp_type, -1),
>>> +                    build_int_cst (comp_type, 0));
>>>
>>> writing this as
>>>
>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>                     fold_build2 (code, boolean_type_node, a, b),
>>> +                    build_int_cst (comp_type, -1),
>>> +                    build_int_cst (comp_type, 0));
>>>
>>> will give the gimplifier a better chance at simplification.
>>>
>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>
>>> I think we are expecting the scalar type and the vector mode here
>>> from looking at the single existing caller.  It probably doesn't make
>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>> also work for vector types), but let's be consistent.  Thus,
>>
>> Ok.
>>
>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>
>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>> +  else
>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>
>>> the else case looks odd.  Why re-build a stmt that already exists?
>>> Simply return NULL_TREE instead?
>>
>> I can adjust. The reason it is written that way is that
>> expand_vector_operations_1 is using the result of the function to
>> update rhs.
>
> Ok, so it should check whether there was any lowering done then.
>
>>> +static tree
>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>> +{
>>> ...
>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>> +       {
>>> ...
>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>> +                                            TREE_TYPE (cond),
>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>> +                                            op0, op1, TREE_CODE (cond));
>>>
>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>
>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>> v0 & tem | v1 & ~tem;
>>>
>>> instead of
>>>
>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>
>>> even if the bitwise operations could be carried out using vectors.
>>> It's definitely beneficial to do the first if the CPU can create the
>>> bitmask.
>>>
>>
>> o_O
>>
>> I thought you always wanted to do (m & v0) | (~m & v1).
>> Do you want to have two cases of the expansion then -- when we have
>> the mask available and when we don't? But it is really unlikely that we
>> can get the mask but not the vcond, because the condition is actually a
>> vcond. So once again -- do we always expand to {a[0] > b[0] ? v[0] :
>> c[0], ...}?
>
> Hm, yeah.  I suppose with the current setup it's hard to only
> get the mask but not the full vcond ;)  So it probably makes
> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
> fallback.  Sorry for the confusion ;)

Ok.

>>> +  /* Run veclower on the expressions we have introduced.  */
>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>
>>> do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>>>
>>> +static bool
>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>> +{
>>>
>>> This function is lacking a comment.
>>>
>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>> ...
>>> +      /* Try to get rid of the useless vector comparison
>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>
>>> how and why?  You simply drop that comparison - that doesn't look
>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>> comparison - that wouldn't be valid gimple.  Please leave this
>>> optimization to SSA based forward propagation (I can help you here
>>> once the patch is in).
>>
>> No-no-no. This is the second part of avoiding
>> m = a > b;
>> r = m ? v0 : v1;
>>
>> to prevent m from being expanded to m != {0}.
>>
>> I do not _simply_ drop the comparison. I drop it only if
>> is_vector_comparison returned true. This means that we can never get
>> into a situation where we actually drop a comparison inserted
>> by the user. What I really want to achieve here is to drop the
>> comparison that the frontend inserts every time it sees an
>> expression there.
>>
>> As I said earlier, tree forward propagation kicks in only at -On, and
>> I would really like to make sure that I can get rid of useless != {0}
>> at any level.

> Please don't.  If the language extension forces a != 0 then it should
> appear at -O0.  The code is fishy anyway in the way it walks stmts
> in is_vector_comparison.  At least I don't like to see this optimization
> done here for the sake of -O0 in this initial patch - you could try
> arguing about it as a followup improvement (well, probably with not
> much luck).  -O0 is about compile-speed and debugging, doing
> data-flow by walking stmts backward is slow.

Ok, then I seriously don't see any motivation to support
VEC_COND_EXPR. The following code:

m = a > b;
r = (m & v0) | (~m & v1);

gives me much more flexibility and control. What is VEC_COND_EXPR
good for? Syntactic sugar?

How about throwing away all the VEC_COND_EXPR parts and supporting only
comparisons (implicitly expressed using vconds)? If we agree on
implicit conversions for real types, then this functionality
perfectly satisfies my needs.

I don't see any interest from the backend people and I cannot wait
forever, so why don't we start with a simple thing?
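The alternative proposed here keeps only vector comparisons and expresses the selection with bitwise operations. It can be sketched with GCC's vector extension, assuming 16-byte float and int vector types; `select_gt` is a hypothetical name. A float comparison already yields an integer mask, so when the selected values are integers, as in this sketch, no float/int conversion is needed.

```c
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* r[i] = a[i] > b[i] ? v0[i] : v1[i], with no conditional expression:
   the comparison yields a -1/0 mask that drives a bitwise blend.  */
static v4si
select_gt (v4sf a, v4sf b, v4si v0, v4si v1)
{
  v4si m = a > b;
  return (m & v0) | (~m & v1);
}
```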

Artem.

>>>
>>> +      if (expand_vec_cond_expr_p (TREE_TYPE (exp),
>>> +                                  TYPE_MODE (TREE_TYPE (exp))))
>>> +        {
>>> +         update_stmt (gsi_stmt (*gsi));
>>> +         return;
>>>
>>> no need to update the stmt when you do nothing.
>>>
>>> +      new_rhs = expand_vec_cond_expr_piecewise (gsi, exp);
>>> +      gimple_assign_set_rhs_from_tree (gsi, new_rhs);
>>> +      update_stmt (gsi_stmt (*gsi));
>>> +    }
>>>
>>> missing return;, just for clarity that you are done here.
>>
>> Ok.
>>
>>> You don't do anything for comparisons here, in case they are split
>>> away from the VEC_COND_EXPR by the gimplifier.  But if the
>>> target doesn't support VEC_COND_EXPRs we have to lower them.
>>> I suggest checking your testcases on i?86-linux (or with -m32 -march=i486).
>>>
>>
>> expand_vector_operations_1 takes care of any vector comparison,
>> considering it as a binary operation. See expand_vector_operation and
>> do_compare for more details.
>
> Ah, ok, I missed that piece.
>
>>> -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>> +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h
>>>
>>> huh, no please ;)  I suppose that's no longer necessary anyway now.
>>>
>>
>> Yeah, fixed. :)
>>
>>> I'll leave the i386.c pieces to the x86 target maintainers to review.
>>> They probably will change once the .md file changes are sorted out.
>>
>> If they are ever going to be sorted out...
>
> Well, we can move the conversion stuff to the point of expansion
> using convert_move.  That'll keep the middle-end and the C frontend
> clean and move the "hack" towards the backends.
>
> Richard.
>
>>> Thanks,
>>> Richard.
>>>
>>
>>
>> Thanks,
>> Artem.
>>
>


* Re: Vector Comparison patch
  2011-08-25 13:29 ` Artem Shinkarov
@ 2011-08-25 13:30 ` Richard Guenther
  2011-08-25 13:31 ` Artem Shinkarov
From: Richard Guenther @ 2011-08-25 13:30 UTC
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>>> we discussed.
>>>>>
>>>>> So I think it is the right time to do something about vcond patterns,
>>>>> which would allow me to get rid of conversions that I need to put all
>>>>> over the code.
>>>>>
>>>>> Also at the moment the patch breaks the LTO frontend with a simple example:
>>>>> #define vector(elcount, type)  \
>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>
>>>>> int main (int argc, char *argv[]) {
>>>>>    vector (4, float) f0;
>>>>>    vector (4, float) f1;
>>>>>
>>>>>    f0 =  f1 != f0
>>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>>
>>>>>    return (int)f0[argc];
>>>>> }
>>>>>
>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>>
>>>>> I looked into the file; the conversion function is defined as
>>>>> gcc_unreachable (). I am not very familiar with LTO, so I don't really
>>>>> know what the right way to treat the conversions is.
>>>>>
>>>>> And I seriously need help with backend patterns.
>>>>
>>>> On the patch.
>>>>
>>>> The documentation needs review by a native English speaker, but here
>>>> are some factual comments:
>>>>
>>>> +In C vector comparison is supported within standard comparison operators:
>>>>
>>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>>> extension.
>>>>
>>>>  The result of the
>>>> +comparison is a signed integer-type vector where the size of each
>>>> +element must be the same as the size of compared vectors element.
>>>>
>>>> The result type of the comparison is determined by the C frontend,
>>>> it isn't under control of the user.  What you are implying here is
>>>> restrictions on vector assignments, which are documented elsewhere.
>>>> I'd just say
>>>>
>>>> 'The result of the comparison is a vector of the same width and number
>>>> of elements as the comparison operands with a signed integral element
>>>> type.'
>>>>
>>>> +In addition to the vector comparison C supports conditional expressions
>>>>
>>>> See above.
>>>>
>>>> +For the convenience condition in the vector conditional can be just a
>>>> +vector of signed integer type.
>>>>
>>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>>> they can be equally well compared against zero.
>>>
>>> I'll have a final go on the documentation, it is untouched from the old patches.
>>>
>>>> Index: gcc/targhooks.h
>>>> ===================================================================
>>>> --- gcc/targhooks.h     (revision 177665)
>>>> +++ gcc/targhooks.h     (working copy)
>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>
>>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>>> +
>>>>  extern bool
>>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>                                             const_tree,
>>>>
>>>> spurious whitespace change.
>>>
>>> Yes, thanks.
>>>
>>>> Index: gcc/optabs.c
>>>> ===================================================================
>>>> --- gcc/optabs.c        (revision 177665)
>>>> +++ gcc/optabs.c        (working copy)
>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>> ...
>>>> +  else
>>>> +    {
>>>> +      rtx rtx_op0;
>>>> +      rtx vec;
>>>> +
>>>> +      rtx_op0 = expand_normal (op0);
>>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>> +      vec = CONST0_RTX (mode);
>>>> +
>>>> +      create_output_operand (&ops[0], target, mode);
>>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>>> +      create_input_operand (&ops[3], comparison, mode);
>>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>>> +      create_input_operand (&ops[5], vec, mode);
>>>>
>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>> with the .md part if we want to use a machine specific pattern for this
>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>
>>> Yes, I am waiting for it. This is the only way at the moment to make
>>> sure that in
>>> m = a > b;
>>> r = m ? c : d;
>>>
>>> m in the vcond is not transformed into m != 0.
>>>
>>>> Index: gcc/target.h
>>>> ===================================================================
>>>> --- gcc/target.h        (revision 177665)
>>>> +++ gcc/target.h        (working copy)
>>>> @@ -51,6 +51,7 @@
>>>>  #define GCC_TARGET_H
>>>>
>>>>  #include "insn-modes.h"
>>>> +#include "gimple.h"
>>>>
>>>>  #ifdef ENABLE_CHECKING
>>>>
>>>> spurious change.
>>>
>>> Old stuff, fixed.
>>>
>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>     {
>>>> +      tree arg0_type = TREE_TYPE (arg0);
>>>> +
>>>>       switch (code)
>>>>        {
>>>>        case EQ_EXPR:
>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>> ...
>>>
>>> Ok.
>>>
>>>>
>>>> Likewise.
>>>>
>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>     case UNGE_EXPR:
>>>>     case UNEQ_EXPR:
>>>>     case LTGT_EXPR:
>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>> +       {
>>>> +         enum tree_code code = ops->code;
>>>> +         tree arg0 = ops->op0;
>>>> +         tree arg1 = ops->op1;
>>>>
>>>> move this code to do_store_flag (we really store a flag value).  It should
>>>> also simply do what expand_vec_cond_expr does, probably simply
>>>> calling that with the {-1,...} {0,...} extra args should work.
>>>
>>> I started to do that, but the code in do_store_flag is completely
>>> different from what I am doing, and it looks confusing. I just call
>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>> but the code is quite small.
>>
>> Hm?  I see in your patch
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c  (revision 177665)
>> +++ gcc/expr.c  (working copy)
>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>     case UNGE_EXPR:
>>     case UNEQ_EXPR:
>>     case LTGT_EXPR:
>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>> +       {
>> +         enum tree_code code = ops->code;
>> +         tree arg0 = ops->op0;
>> +         tree arg1 = ops->op1;
>> +         tree arg_type = TREE_TYPE (arg0);
>> +         tree el_type = TREE_TYPE (arg_type);
>> +         tree t, ifexp, if_true, if_false;
>> +
>> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
>> +
>> +
>> +         ifexp = build2 (code, type, arg0, arg1);
>> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>> +
>> +         if (arg_type != type)
>> +           {
>> +             if_true = convert (arg_type, if_true);
>> +             if_false = convert (arg_type, if_false);
>> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>> +             t = convert (type, t);
>> +           }
>> +         else
>> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>> +
>> +         return expand_expr (t,
>> +                             modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
>> +                             tmode != VOIDmode ? tmode : mode,
>> +                             modifier);
>> +       }
>>
>> that's not exactly "calling expand_vec_cond_expr".
>
> Well, actually it is. Keep in mind that a clean backend would imply
> removing the conversions. But I'll make a function.

Why does

  return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
                               build_vector_from_val (type, build_int_cst (el_type, -1)),
                               build_vector_from_val (type, build_int_cst (el_type, 0)));

not work?  If you push the conversions to expand_vec_cond_expr
by doing them on RTL you simplify things here and remove the requirement
of doing them in the C frontend for VEC_COND_EXPR as well.

>>>>
>>>> As for the still required conversions, you should be able to delay those
>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>>>> and then convert the result back to the original mode.
>>>>
>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>
>>> Conversions are there until we fix the backend. Once the backend is
>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>> away.
>>>
>>>> +static tree
>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>
>>>> is missing a function comment.
>>>
>>> fixed.
>>>
>>>> +static tree
>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>>> +{
>>>> +  tree cond;
>>>> +  tree comp_type;
>>>> +
>>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>> +
>>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>> +
>>>>
>>>> Use
>>>>
>>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>
>>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>>> FP types.  Instead you want a signed integer type of the same (mode)
>>>> size as the vector element type, thus
>>>>
>>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>>>
>>> Hm, I thought that at this stage we don't want to know anything about
>>> modes. I mean here I am really building the same integer type as the
>>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>>> don't think that it could happen that the size of the mode is
>>> different from the size of the type. Or could it?
>>
>> The comparison could be on floating-point types where TYPE_PRECISION
>> can be, for example, 80 for x87 doubles.  You want an integer type
>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>> to use here.
>
> Ok.
>
>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>
>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>
>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>> +                    build_int_cst (comp_type, -1),
>>>> +                    build_int_cst (comp_type, 0));
>>>>
>>>> writing this as
>>>>
>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>>                     fold_build2 (code, boolean_type_node, a, b),
>>>> +                    build_int_cst (comp_type, -1),
>>>> +                    build_int_cst (comp_type, 0));
>>>>
>>>> will give the gimplifier a better chance at simplification.
>>>>
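The two review points on do_compare can be modeled per lane in plain C: the element width is taken from the storage (mode) size, and a boolean-typed comparison feeds the COND_EXPR. This is a minimal sketch that assumes 32-bit IEEE float elements. `do_compare_lt_lane` and `vec_lt_piecewise` are hypothetical names, not the patch's functions.

```c
#include <stdint.h>

/* Assumption: the mask lane must be as wide as the element's storage,
   mirroring GET_MODE_BITSIZE (TYPE_MODE (inner_type)) rather than
   TYPE_PRECISION.  */
_Static_assert (sizeof (float) == sizeof (int32_t),
                "mask lane must match the float lane's storage size");

/* Hypothetical scalar analogue of do_compare for one float lane (a < b).
   The comparison yields a boolean, which then only selects -1 or 0 in
   the signed integer comp_type.  */
static int32_t do_compare_lt_lane (float a, float b)
{
  _Bool cond = a < b;        /* like fold_build2 (code, boolean_type_node, a, b) */
  return cond ? -1 : 0;      /* like COND_EXPR <cond, -1, 0> in comp_type */
}

/* Hypothetical piecewise lowering of a 4-lane float comparison,
   one lane at a time, as expand_vector_piecewise does with do_compare.  */
static void vec_lt_piecewise (const float a[4], const float b[4],
                              int32_t r[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = do_compare_lt_lane (a[i], b[i]);
}
```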
>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>
>>>> I think we are expecting the scalar type and the vector mode here
>>>> from looking at the single existing caller.  It probably doesn't make
>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>> also work for vector types), but let's be consistent.  Thus,
>>>
>>> Ok.
>>>
>>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>
>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>> +  else
>>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>>
>>>> the else case looks odd.  Why re-build a stmt that already exists?
>>>> Simply return NULL_TREE instead?
>>>
>>> I can adjust. The reason it is written that way is that
>>> expand_vector_operations_1 is using the result of the function to
>>> update rhs.
>>
>> Ok, so it should check whether there was any lowering done then.
>>
>>>> +static tree
>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>> +{
>>>> ...
>>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>>> +       {
>>>> ...
>>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>> +                                            TREE_TYPE (cond),
>>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>>> +                                            op0, op1, TREE_CODE (cond));
>>>>
>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>
>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>> v0 & tem | v1 & ~tem;
>>>>
>>>> instead of
>>>>
>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>
>>>> even if the bitwise operations could be carried out using vectors.
>>>> It's definitely beneficial to do the first if the CPU can create the
>>>> bitmask.
>>>>
>>>
>>> o_O
>>>
>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>> Do you want to have two cases of the expansion then -- when we have
>>> mask available and when we don't? But it is really unlikely that we
>>> can get the mask but cannot get the vcond, because the condition is actually
>>> a vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
>>> c[0], ...}?
>>
>> Hm, yeah.  I suppose with the current setup it's hard to only
>> get the mask but not the full vcond ;)  So it probably makes
>> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
>> fallback.  Sorry for the confusion ;)
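The scalar C sketch below compares the fallback agreed on here with the mask-and-blend form it replaces. It assumes 4 x int32 lanes, and both function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4 /* assumption: 4 x int32 lanes, modeled as arrays */

/* Hypothetical fallback lowering of VEC_COND_EXPR <a > b, v, c>:
   a per-element select with no intermediate mask vector.  */
static void vcond_piecewise (const int32_t a[LANES], const int32_t b[LANES],
                             const int32_t v[LANES], const int32_t c[LANES],
                             int32_t r[LANES])
{
  for (int i = 0; i < LANES; i++)
    r[i] = a[i] > b[i] ? v[i] : c[i];
}

/* Hypothetical alternative: build tem = {a[i] > b[i] ? -1 : 0, ...}
   first, then blend with (v & tem) | (c & ~tem).  */
static void vcond_mask_blend (const int32_t a[LANES], const int32_t b[LANES],
                              const int32_t v[LANES], const int32_t c[LANES],
                              int32_t r[LANES])
{
  int32_t tem[LANES];
  for (int i = 0; i < LANES; i++)
    tem[i] = a[i] > b[i] ? -1 : 0;
  for (int i = 0; i < LANES; i++)
    r[i] = (v[i] & tem[i]) | (c[i] & ~tem[i]);
}
```

Both forms give the same result. The per-element form simply never materializes `tem`, which only pays off when the target can build the mask cheaply.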
>
> Ok.
>
>>>> +  /* Run veclower on the expressions we have introduced.  */
>>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>>
>>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>>>>
>>>> +static bool
>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>> +{
>>>>
>>>> This function is lacking a comment.
>>>>
>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>> ...
>>>> +      /* Try to get rid of the useless vector comparison
>>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>
>>>> how and why?  You simply drop that comparison - that doesn't look
>>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>>> comparison - that wouldn't be valid gimple.  Please leave this
>>>> optimization to SSA based forward propagation (I can help you here
>>>> once the patch is in).
>>>
>>> No-no-no. This is the second part of avoiding
>>> m = a > b;
>>> r = m ? v0 : v1;
>>>
>>> to prevent m expansion to m != {0}.
>>>
>>> I do not _simply_ drop the comparison. I drop it only if
>>> is_vector_comparison returned true. It means that we can never get
>>> into the situation that we are dropping actually a comparison inserted
>>> by the user. But what I really want to achieve here is to drop the
>>> comparison that the frontend inserts every time when it sees an
>>> expression there.
>>>
>>> As I said earlier, tree forward propagation kicks in only at -On, and
>>> I would really like to make sure that I can get rid of the useless != {0}
>>> at any level.
>
>> Please don't.  If the language extension forces a != 0 then it should
>> appear at -O0.  The code is fishy anyway in the way it walks stmts
>> in is_vector_comparison.  At least I don't like to see this optimization
>> done here for the sake of -O0 in this initial patch - you could try
>> arguing about it as a followup improvement (well, probably with not
>> much luck).  -O0 is about compile-speed and debugging, doing
>> data-flow by walking stmts backward is slow.
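Whether the frontend's `m != {0,...}` can safely be dropped depends on m being a canonical -1/0 mask. The scalar sketch below uses hypothetical helper names and assumes 32-bit lanes. The two select forms agree for canonical masks but differ for an arbitrary nonzero lane value such as 2.

```c
#include <stdint.h>

/* Hypothetical model of the semantics "m != {0,...}" preserves:
   any nonzero lane selects v0.  */
static int32_t select_ne0 (int32_t m, int32_t v0, int32_t v1)
{
  return m != 0 ? v0 : v1;
}

/* Hypothetical model of using m directly as a bit mask, which is what
   dropping the comparison amounts to on a mask-based blend.  */
static int32_t select_bitwise (int32_t m, int32_t v0, int32_t v1)
{
  return (m & v0) | (~m & v1);
}
```

Dropping the comparison is therefore valid only when m is known to come from a comparison. That is the property is_vector_comparison tries to establish by walking statements.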
>
> Ok, then I seriously don't see any motivation to support the
> VEC_COND_EXPR. The following code:
>
> m = a > b;
> r = (m & v0) | (~m & v1)
>
> gives me much more flexibility and control. What is VEC_COND_EXPR
> good for? Syntactic sugar?
>
> How about throwing away all the VEC_COND_EXPR parts supporting only
> conditions (implicitly expressed using vconds)? If we would agree on
> implicit conversions for real types, then this is a functionality that
> perfectly satisfies my needs.
>
> I don't see any interest from the backend people and I cannot wait
> forever, so why don't we start with a simple thing?
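Artem's proposed formulation can be written directly with GCC's vector extensions. This is a minimal sketch. It assumes a compiler with vector comparison support (GCC 4.7 or later, to my understanding, or Clang) and a 32-bit int. The typedefs and `select_gt` are illustrative names.

```c
/* Assumption: int is 32 bits, so both vectors have 4 lanes.  */
typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* m = a > b;  r = (m & v0) | (~m & v1);  */
static v4si select_gt (v4sf a, v4sf b, v4si v0, v4si v1)
{
  v4si m = a > b;                  /* -1 where a[i] > b[i], else 0 */
  return (m & v0) | (~m & v1);     /* bitwise blend driven by the mask */
}
```

With float v0 and v1, the bitwise operations would need same-size vector casts such as `(v4si) v0`, which GNU C treats as bit reinterpretations. That is where the question of implicit conversions for real types comes in.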

But the simple thing is already what the backend supports.

Richard.


* Re: Vector Comparison patch
  2011-08-25 13:30                                                                                         ` Richard Guenther
@ 2011-08-25 13:31                                                                                           ` Artem Shinkarov
  2011-08-25 14:49                                                                                             ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-25 13:31 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 2:00 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>> Here is a cleaned-up patch without the hook. Mostly it works in a way
>>>>>> we discussed.
>>>>>>
>>>>>> So I think it is the right time to do something about vcond patterns,
>>>>>> which would allow me to get rid of the conversions that I need to put all
>>>>>> over the code.
>>>>>>
>>>>>> Also, at the moment the patch breaks the LTO frontend with a simple example:
>>>>>> #define vector(elcount, type)  \
>>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>
>>>>>> int main (int argc, char *argv[]) {
>>>>>>    vector (4, float) f0;
>>>>>>    vector (4, float) f1;
>>>>>>
>>>>>>    f0 =  f1 != f0
>>>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>>>
>>>>>>    return (int)f0[argc];
>>>>>> }
>>>>>>
>>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>>>
>>>>>> I looked into the file; the conversion function is defined as
>>>>>> gcc_unreachable (). I am not very familiar with LTO, so I don't really
>>>>>> know what the right way to treat the conversions is.
>>>>>>
>>>>>> And I seriously need help with backend patterns.
>>>>>
>>>>> On the patch.
>>>>>
>>>>> The documentation needs review by a native English speaker, but here
>>>>> are some factual comments:
>>>>>
>>>>> +In C vector comparison is supported within standard comparison operators:
>>>>>
>>>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>>>> extension.
>>>>>
>>>>>  The result of the
>>>>> +comparison is a signed integer-type vector where the size of each
>>>>> +element must be the same as the size of compared vectors element.
>>>>>
>>>>> The result type of the comparison is determined by the C frontend,
>>>>> it isn't under control of the user.  What you are implying here is
>>>>> restrictions on vector assignments, which are documented elsewhere.
>>>>> I'd just say
>>>>>
>>>>> 'The result of the comparison is a vector of the same width and number
>>>>> of elements as the comparison operands with a signed integral element
>>>>> type.'
>>>>>
>>>>> +In addition to the vector comparison C supports conditional expressions
>>>>>
>>>>> See above.
>>>>>
>>>>> +For the convenience condition in the vector conditional can be just a
>>>>> +vector of signed integer type.
>>>>>
>>>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>>>> they can be equally well compared against zero.
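Richard's proposed wording can be checked against GCC's vector extensions. Comparing two 4 x float vectors yields a 16-byte vector of signed int lanes. An unsigned vector compared with zero also yields a signed -1/0 mask. This sketch assumes GCC or Clang vector extensions and a 32-bit int, and the helper names are hypothetical.

```c
/* Assumption: int and float are both 32 bits wide.  */
typedef float        v4sf __attribute__ ((vector_size (16)));
typedef int          v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Float comparison: the result has the operands' width and lane count,
   with a signed integral element type.  */
static v4si gt_mask (v4sf a, v4sf b)
{
  _Static_assert (sizeof (a > b) == sizeof (v4sf),
                  "comparison result has the operands' width");
  return a > b;
}

/* Unsigned condition: any nonzero lane compares true against zero.  */
static v4si nonzero_mask (v4su u)
{
  v4su zero = {0, 0, 0, 0};
  return u != zero;              /* unsigned operands, signed -1/0 result */
}
```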
>>>>
>>>> I'll have a final go at the documentation; it is untouched since the old patches.
>>>>
>>>>> Index: gcc/targhooks.h
>>>>> ===================================================================
>>>>> --- gcc/targhooks.h     (revision 177665)
>>>>> +++ gcc/targhooks.h     (working copy)
>>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>>
>>>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>>>> +
>>>>>  extern bool
>>>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>>                                             const_tree,
>>>>>
>>>>> spurious whitespace change.
>>>>
>>>> Yes, thanks.
>>>>
>>>>> Index: gcc/optabs.c
>>>>> ===================================================================
>>>>> --- gcc/optabs.c        (revision 177665)
>>>>> +++ gcc/optabs.c        (working copy)
>>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>>> ...
>>>>> +  else
>>>>> +    {
>>>>> +      rtx rtx_op0;
>>>>> +      rtx vec;
>>>>> +
>>>>> +      rtx_op0 = expand_normal (op0);
>>>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>>> +      vec = CONST0_RTX (mode);
>>>>> +
>>>>> +      create_output_operand (&ops[0], target, mode);
>>>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>>>> +      create_input_operand (&ops[3], comparison, mode);
>>>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>>>> +      create_input_operand (&ops[5], vec, mode);
>>>>>
>>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>>> with the .md part if we want to use a machine specific pattern for this
>>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>>
>>>> Yes, I am waiting for it. This is the only way at the moment to make
>>>> sure that in
>>>> m = a > b;
>>>> r = m ? c : d;
>>>>
>>>> m in the vcond is not transformed into the m != 0.
>>>>
>>>>> Index: gcc/target.h
>>>>> ===================================================================
>>>>> --- gcc/target.h        (revision 177665)
>>>>> +++ gcc/target.h        (working copy)
>>>>> @@ -51,6 +51,7 @@
>>>>>  #define GCC_TARGET_H
>>>>>
>>>>>  #include "insn-modes.h"
>>>>> +#include "gimple.h"
>>>>>
>>>>>  #ifdef ENABLE_CHECKING
>>>>>
>>>>> spurious change.
>>>>
>>>> Old stuff, fixed.
>>>>
>>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>>     {
>>>>> +      tree arg0_type = TREE_TYPE (arg0);
>>>>> +
>>>>>       switch (code)
>>>>>        {
>>>>>        case EQ_EXPR:
>>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>>> ...
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> Likewise.
>>>>>
>>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>>     case UNGE_EXPR:
>>>>>     case UNEQ_EXPR:
>>>>>     case LTGT_EXPR:
>>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>>> +       {
>>>>> +         enum tree_code code = ops->code;
>>>>> +         tree arg0 = ops->op0;
>>>>> +         tree arg1 = ops->op1;
>>>>>
>>>>> move this code to do_store_flag (we really store a flag value).  It should
>>>>> also simply do what expand_vec_cond_expr does, probably simply
>>>>> calling that with the {-1,...} {0,...} extra args should work.
>>>>
>>>> I started to do that, but the code in do_store_flag is completely
>>>> different from what I am doing, and it looks confusing. I just call
>>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>>> but the code is quite small.
>>>
>>> Hm?  I see in your patch
>>>
>>> Index: gcc/expr.c
>>> ===================================================================
>>> --- gcc/expr.c  (revision 177665)
>>> +++ gcc/expr.c  (working copy)
>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>     case UNGE_EXPR:
>>>     case UNEQ_EXPR:
>>>     case LTGT_EXPR:
>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>> +       {
>>> +         enum tree_code code = ops->code;
>>> +         tree arg0 = ops->op0;
>>> +         tree arg1 = ops->op1;
>>> +         tree arg_type = TREE_TYPE (arg0);
>>> +         tree el_type = TREE_TYPE (arg_type);
>>> +         tree t, ifexp, if_true, if_false;
>>> +
>>> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
>>> +
>>> +
>>> +         ifexp = build2 (code, type, arg0, arg1);
>>> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>>> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>>> +
>>> +         if (arg_type != type)
>>> +           {
>>> +             if_true = convert (arg_type, if_true);
>>> +             if_false = convert (arg_type, if_false);
>>> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>>> +             t = convert (type, t);
>>> +           }
>>> +         else
>>> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>>> +
>>> +         return expand_expr (t,
>>> +                             modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
>>> +                             tmode != VOIDmode ? tmode : mode,
>>> +                             modifier);
>>> +       }
>>>
>>> that's not exactly "calling expand_vec_cond_expr".
>>
>> Well, actually it is. Keep in mind that a clean backend would imply
>> removing the conversions. But I'll make a function.
>
> Why does
>
>  return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
>                               build_vector_from_val (type, build_int_cst (el_type, -1)),
>                               build_vector_from_val (type, build_int_cst (el_type, 0)));
>
> not work?  If you push the conversions to expand_vec_cond_expr
> by doing them on RTL you simplify things here and remove the requirement
> of doing them in the C frontend for VEC_COND_EXPR as well.

It does not work because vcond <a > b, c, d> requires a, b, c and d to have
the same type. Here we are dealing only with comparisons, so in the
case of floats we have vcond <f0 > f1, {-1,...}, {0,...}>, which we
have to transform into
(vsi)(vcond <f0 > f1, (vsf){-1,...}, (vsf){0,...}>).

OK, so is it OK to make this conversion here for the real types?
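The transformation described above only works if the -1 and 0 constants are moved into the float domain by reinterpreting their bits, not by value conversion. A value conversion of -1 gives -1.0f, whose bits are 0xBF800000 rather than all ones. Below is a one-lane sketch in C. It assumes 32-bit IEEE float and that the (vsf)/(vsi) casts are bit reinterpretations, as GNU C same-size vector casts are. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as int32_t (not a value conversion).  */
static int32_t float_bits (float f)
{
  int32_t i;
  memcpy (&i, &f, sizeof i);
  return i;
}

/* Reinterpret the bits of an int32_t as a float.  */
static float bits_to_float (int32_t i)
{
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

/* One lane of (vsi) (vcond <f0 > f1, (vsf){-1,...}, (vsf){0,...}>):
   the integer constants are bit-cast into float lanes, the select
   happens in the float domain, and the result is bit-cast back.  */
static int32_t cmp_gt_lane_via_float (float f0, float f1)
{
  float if_true = bits_to_float (-1);  /* all bits set: a quiet NaN pattern */
  float if_false = bits_to_float (0);  /* +0.0f */
  float sel = f0 > f1 ? if_true : if_false;
  return float_bits (sel);
}
```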

>>>>>
>>>>> As for the still required conversions, you should be able to delay those
>>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>>> expanding op1 and op2, wrapping a subreg around it with a proper mode
>>>>> (using convert_mode (GET_MODE (comparison), rtx_op1)) should work),
>>>>> and then convert the result back to the original mode.
>>>>>
>>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>>
>>>> Conversions are there until we fix the backend. Once the backend is
>>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>>> away.
>>>>
>>>>> +static tree
>>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>>
>>>>> is missing a function comment.
>>>>
>>>> fixed.
>>>>
>>>>> +static tree
>>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>>>> +{
>>>>> +  tree cond;
>>>>> +  tree comp_type;
>>>>> +
>>>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>>> +
>>>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>>> +
>>>>>
>>>>> Use
>>>>>
>>>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>>
>>>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>>>> FP types.  Instead you want a signed integer type of the same (mode)
>>>>> size as the vector element type, thus
>>>>>
>>>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>>>>
>>>> Hm, I thought that at this stage we don't want to know anything about
>>>> modes. I mean here I am really building the same integer type as the
>>>> operands of the comparison have. But I can use MODE_BITSIZE as well, I
>>>> don't think that it could happen that the size of the mode is
>>>> different from the size of the type. Or could it?
>>>
>>> The comparison could be on floating-point types where TYPE_PRECISION
>>> can be, for example, 80 for x87 doubles.  You want an integer type
>>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>>> to use here.
>>
>> Ok.
>>
>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>>
>>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>>
>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>>> +                    build_int_cst (comp_type, -1),
>>>>> +                    build_int_cst (comp_type, 0));
>>>>>
>>>>> writing this as
>>>>>
>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>>>                     fold_build2 (code, boolean_type_node, a, b),
>>>>> +                    build_int_cst (comp_type, -1),
>>>>> +                    build_int_cst (comp_type, 0));
>>>>>
>>>>> will give the gimplifier a better chance at simplification.
>>>>>
>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>>
>>>>> I think we are expecting the scalar type and the vector mode here
>>>>> from looking at the single existing caller.  It probably doesn't make
>>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>>> also work for vector types), but let's be consistent.  Thus,
>>>>
>>>> Ok.
>>>>
>>>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>>
>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>>> +  else
>>>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>>>
>>>>> the else case looks odd.  Why re-build a stmt that already exists?
>>>>> Simply return NULL_TREE instead?
>>>>
>>>> I can adjust. The reason it is written that way is that
>>>> expand_vector_operations_1 is using the result of the function to
>>>> update rhs.
>>>
>>> Ok, so it should check whether there was any lowering done then.
>>>
>>>>> +static tree
>>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>>> +{
>>>>> ...
>>>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>>>> +       {
>>>>> ...
>>>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>>> +                                            TREE_TYPE (cond),
>>>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>>>> +                                            op0, op1, TREE_CODE (cond));
>>>>>
>>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>>
>>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>>> v0 & tem | v1 & ~tem;
>>>>>
>>>>> instead of
>>>>>
>>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>>
>>>>> even if the bitwise operations could be carried out using vectors.
>>>>> It's definitely beneficial to do the first if the CPU can create the
>>>>> bitmask.
>>>>>
>>>>
>>>> o_O
>>>>
>>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>>> Do you want to have two cases of the expansion then -- when we have
>>>> mask available and when we don't? But it is really unlikely that we
>>>> can get the mask but cannot get the vcond, because the condition is actually
>>>> a vcond. So once again -- do we always expand to {a[0] > b[0]  ? v[0] :
>>>> c[0], ...}?
>>>
>>> Hm, yeah.  I suppose with the current setup it's hard to only
>>> get the mask but not the full vcond ;)  So it probably makes
>>> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
>>> fallback.  Sorry for the confusion ;)
>>
>> Ok.
>>
>>>>> +  /* Run veclower on the expressions we have introduced.  */
>>>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>>>
>>>>> do not use gsi.ptr directly, use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>>>>>
>>>>> +static bool
>>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>>> +{
>>>>>
>>>>> This function is lacking a comment.
>>>>>
>>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>>> ...
>>>>> +      /* Try to get rid of the useless vector comparison
>>>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>>
>>>>> how and why?  You simply drop that comparison - that doesn't look
>>>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>>>> comparison - that wouldn't be valid gimple.  Please leave this
>>>>> optimization to SSA based forward propagation (I can help you here
>>>>> once the patch is in).
>>>>
>>>> No-no-no. This is the second part of avoiding
>>>> m = a > b;
>>>> r = m ? v0 : v1;
>>>>
>>>> to prevent m expansion to m != {0}.
>>>>
>>>> I do not _simply_ drop the comparison. I drop it only if
>>>> is_vector_comparison returned true. It means that we can never get
>>>> into the situation that we are dropping actually a comparison inserted
>>>> by the user. But what I really want to achieve here is to drop the
>>>> comparison that the frontend inserts every time when it sees an
>>>> expression there.
>>>>
>>>> As I said earlier, tree forward propagation kicks in only at -On, and
>>>> I would really like to make sure that I can get rid of the useless != {0}
>>>> at any level.
>>
>>> Please don't.  If the language extension forces a != 0 then it should
>>> appear at -O0.  The code is fishy anyway in the way it walks stmts
>>> in is_vector_comparison.  At least I don't like to see this optimization
>>> done here for the sake of -O0 in this initial patch - you could try
>>> arguing about it as a followup improvement (well, probably with not
>>> much luck).  -O0 is about compile-speed and debugging, doing
>>> data-flow by walking stmts backward is slow.
>>
>> Ok, then I seriously don't see any motivation to support the
>> VEC_COND_EXPR. The following code:
>>
>> m = a > b;
>> r = (m & v0) | (~m & v1)
>>
>> gives me much more flexibility and control. What is VEC_COND_EXPR
>> good for? Syntactic sugar?
>>
>> How about throwing away all the VEC_COND_EXPR parts supporting only
>> conditions (implicitly expressed using vconds)? If we would agree on
>> implicit conversions for real types, then this is a functionality that
>> perfectly satisfies my needs.
>>
>> I don't see any interest from the backend people and I cannot wait
>> forever, so why don't we start with a simple thing?
>
> But the simple thing is already what the backend supports.
>
> Richard.
>

Well, it is not about "what" but "how" -- that is what we have been
discussing for three weeks already.

OK, so the question now is whether it is fine to have conversions
inside expand_expr_real_2. If we agree that it is, then I can
adjust the patch.


Artem.


* Re: Vector Comparison patch
  2011-08-25 13:31 ` Artem Shinkarov
@ 2011-08-25 14:49 ` Richard Guenther
  2011-08-27 10:50 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-25 14:49 UTC
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

On Thu, Aug 25, 2011 at 3:15 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 2:00 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 25, 2011 at 2:45 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> On Thu, Aug 25, 2011 at 12:39 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Aug 25, 2011 at 1:07 PM, Artem Shinkarov
>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>> On Thu, Aug 25, 2011 at 11:09 AM, Richard Guenther
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Thu, Aug 25, 2011 at 8:20 AM, Artem Shinkarov
>>>>>> <artyom.shinkaroff@gmail.com> wrote:
>>>>>>> Here is a cleaned-up patch without the hook. Mostly it works in the way
>>>>>>> we discussed.
>>>>>>>
>>>>>>> So I think it is the right time to do something about vcond patterns,
>>>>>>> which would allow me to get rid of conversions that I need to put all
>>>>>>> over the code.
>>>>>>>
>>>>>>> Also, at the moment the patch breaks the LTO frontend with a simple example:
>>>>>>> #define vector(elcount, type)  \
>>>>>>> __attribute__((vector_size((elcount)*sizeof(type)))) type
>>>>>>>
>>>>>>> int main (int argc, char *argv[]) {
>>>>>>>    vector (4, float) f0;
>>>>>>>    vector (4, float) f1;
>>>>>>>
>>>>>>>    f0 =  f1 != f0
>>>>>>>          ? (vector (4, float)){-1,-1,-1,-1} : (vector (4, float)){0,0,0,0};
>>>>>>>
>>>>>>>    return (int)f0[argc];
>>>>>>> }
>>>>>>>
>>>>>>> test-lto.c:8:14: internal compiler error: in convert, at lto/lto-lang.c:1244
>>>>>>>
>>>>>>> I looked into the file: the conversion function there simply calls
>>>>>>> gcc_unreachable (). I am not very familiar with LTO, so I don't really
>>>>>>> know what the right way to treat the conversions is.
>>>>>>>
>>>>>>> And I seriously need help with backend patterns.
>>>>>>
>>>>>> On the patch.
>>>>>>
>>>>>> The documentation needs review by a native English speaker, but here
>>>>>> are some factual comments:
>>>>>>
>>>>>> +In C vector comparison is supported within standard comparison operators:
>>>>>>
>>>>>> it should read 'In GNU C' here and everywhere else as this is a GNU
>>>>>> extension.
>>>>>>
>>>>>>  The result of the
>>>>>> +comparison is a signed integer-type vector where the size of each
>>>>>> +element must be the same as the size of compared vectors element.
>>>>>>
>>>>>> The result type of the comparison is determined by the C frontend,
>>>>>> it isn't under control of the user.  What you are implying here is
>>>>>> restrictions on vector assignments, which are documented elsewhere.
>>>>>> I'd just say
>>>>>>
>>>>>> 'The result of the comparison is a vector of the same width and number
>>>>>> of elements as the comparison operands with a signed integral element
>>>>>> type.'
>>>>>>
>>>>>> +In addition to the vector comparison C supports conditional expressions
>>>>>>
>>>>>> See above.
>>>>>>
>>>>>> +For the convenience condition in the vector conditional can be just a
>>>>>> +vector of signed integer type.
>>>>>>
>>>>>> 'of integer type.'  I don't see a reason to disallow unsigned integers,
>>>>>> they can be equally well compared against zero.
>>>>>
>>>>> I'll have a final go at the documentation; it is unchanged from the old patches.
>>>>>
>>>>>> Index: gcc/targhooks.h
>>>>>> ===================================================================
>>>>>> --- gcc/targhooks.h     (revision 177665)
>>>>>> +++ gcc/targhooks.h     (working copy)
>>>>>> @@ -86,6 +86,7 @@ extern int default_builtin_vectorization
>>>>>>  extern tree default_builtin_reciprocal (unsigned int, bool, bool);
>>>>>>
>>>>>>  extern bool default_builtin_vector_alignment_reachable (const_tree, bool);
>>>>>> +
>>>>>>  extern bool
>>>>>>  default_builtin_support_vector_misalignment (enum machine_mode mode,
>>>>>>                                             const_tree,
>>>>>>
>>>>>> spurious whitespace change.
>>>>>
>>>>> Yes, thanks.
>>>>>
>>>>>> Index: gcc/optabs.c
>>>>>> ===================================================================
>>>>>> --- gcc/optabs.c        (revision 177665)
>>>>>> +++ gcc/optabs.c        (working copy)
>>>>>> @@ -6572,16 +6572,36 @@ expand_vec_cond_expr (tree vec_cond_type
>>>>>> ...
>>>>>> +  else
>>>>>> +    {
>>>>>> +      rtx rtx_op0;
>>>>>> +      rtx vec;
>>>>>> +
>>>>>> +      rtx_op0 = expand_normal (op0);
>>>>>> +      comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX);
>>>>>> +      vec = CONST0_RTX (mode);
>>>>>> +
>>>>>> +      create_output_operand (&ops[0], target, mode);
>>>>>> +      create_input_operand (&ops[1], rtx_op1, mode);
>>>>>> +      create_input_operand (&ops[2], rtx_op2, mode);
>>>>>> +      create_input_operand (&ops[3], comparison, mode);
>>>>>> +      create_input_operand (&ops[4], rtx_op0, mode);
>>>>>> +      create_input_operand (&ops[5], vec, mode);
>>>>>>
>>>>>> this still builds the fake(?) != comparison, but as you said you need help
>>>>>> with the .md part if we want to use a machine specific pattern for this
>>>>>> case (which we eventually want, for the sake of using XOP vcond).
>>>>>
>>>>> Yes, I am waiting for it. This is the only way at the moment to make
>>>>> sure that in
>>>>> m = a > b;
>>>>> r = m ? c : d;
>>>>>
>>>>> m in the vcond is not transformed into the m != 0.
>>>>>
>>>>>> Index: gcc/target.h
>>>>>> ===================================================================
>>>>>> --- gcc/target.h        (revision 177665)
>>>>>> +++ gcc/target.h        (working copy)
>>>>>> @@ -51,6 +51,7 @@
>>>>>>  #define GCC_TARGET_H
>>>>>>
>>>>>>  #include "insn-modes.h"
>>>>>> +#include "gimple.h"
>>>>>>
>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>
>>>>>> spurious change.
>>>>>
>>>>> Old stuff, fixed.
>>>>>
>>>>>> @@ -9073,26 +9082,28 @@ fold_comparison (location_t loc, enum tr
>>>>>>      floating-point, we can only do some of these simplifications.)  */
>>>>>>   if (operand_equal_p (arg0, arg1, 0))
>>>>>>     {
>>>>>> +      tree arg0_type = TREE_TYPE (arg0);
>>>>>> +
>>>>>>       switch (code)
>>>>>>        {
>>>>>>        case EQ_EXPR:
>>>>>> -         if (! FLOAT_TYPE_P (TREE_TYPE (arg0))
>>>>>> -             || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0))))
>>>>>> +         if (! FLOAT_TYPE_P (arg0_type)
>>>>>> +             || ! HONOR_NANS (TYPE_MODE (arg0_type)))
>>>>>> ...
>>>>>
>>>>> Ok.
>>>>>
>>>>>>
>>>>>> Likewise.
>>>>>>
>>>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>>>     case UNGE_EXPR:
>>>>>>     case UNEQ_EXPR:
>>>>>>     case LTGT_EXPR:
>>>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>>>> +       {
>>>>>> +         enum tree_code code = ops->code;
>>>>>> +         tree arg0 = ops->op0;
>>>>>> +         tree arg1 = ops->op1;
>>>>>>
>>>>>> move this code to do_store_flag (we really store a flag value).  It should
>>>>>> also simply do what expand_vec_cond_expr does, probably simply
>>>>>> calling that with the {-1,...} {0,...} extra args should work.
>>>>>
>>>>> I started to do that, but the code in do_store_flag is completely
>>>>> different from what I am doing, and it looks confusing. I just call
>>>>> expand_vec_cond_expr and that is it. I can write a separate function,
>>>>> but the code is quite small.
>>>>
>>>> Hm?  I see in your patch
>>>>
>>>> Index: gcc/expr.c
>>>> ===================================================================
>>>> --- gcc/expr.c  (revision 177665)
>>>> +++ gcc/expr.c  (working copy)
>>>> @@ -8440,6 +8440,37 @@ expand_expr_real_2 (sepops ops, rtx targ
>>>>     case UNGE_EXPR:
>>>>     case UNEQ_EXPR:
>>>>     case LTGT_EXPR:
>>>> +      if (TREE_CODE (ops->type) == VECTOR_TYPE)
>>>> +       {
>>>> +         enum tree_code code = ops->code;
>>>> +         tree arg0 = ops->op0;
>>>> +         tree arg1 = ops->op1;
>>>> +         tree arg_type = TREE_TYPE (arg0);
>>>> +         tree el_type = TREE_TYPE (arg_type);
>>>> +         tree t, ifexp, if_true, if_false;
>>>> +
>>>> +         el_type = lang_hooks.types.type_for_size (TYPE_PRECISION (el_type), 0);
>>>> +
>>>> +
>>>> +         ifexp = build2 (code, type, arg0, arg1);
>>>> +         if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
>>>> +         if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
>>>> +
>>>> +         if (arg_type != type)
>>>> +           {
>>>> +             if_true = convert (arg_type, if_true);
>>>> +             if_false = convert (arg_type, if_false);
>>>> +             t = build3 (VEC_COND_EXPR, arg_type, ifexp, if_true, if_false);
>>>> +             t = convert (type, t);
>>>> +           }
>>>> +         else
>>>> +           t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
>>>> +
>>>> +         return expand_expr (t,
>>>> +                             modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
>>>> +                             tmode != VOIDmode ? tmode : mode,
>>>> +                             modifier);
>>>> +       }
>>>>
>>>> that's not exactly "calling expand_vec_cond_expr".
>>>
>>> Well, actually it is. Keep in mind that clean backend would imply
>>> removing the conversions. But I'll make a function.
>>
>> Why does
>>
>>  return expand_vec_cond_expr (build2 (ops->code, type, ops->op0, ops->op1),
>>                               build_vector_from_val (type, build_int_cst (el_type, -1)),
>>                               build_vector_from_val (type, build_int_cst (el_type, 0)));
>>
>> not work?  If you push the conversions to expand_vec_cond_expr
>> by doing them on RTL you simplify things here and remove the requirement
>> from doing them in the C frontend for VEC_COND_EXPR as well.
>
> It does not work because vcond <a > b, c, d> requires a,b,c,d to have
> the same type. Now here we are dealing only with comparisons, so in
> case of floats we have vcond < f0 > f1, {-1,...}, {0,...}> which we
> have to transform into
> (vsi)(vcond< f0 >f1, (vsf){-1,...}, (vsf){0,...}>).
>
> Ok, so is it ok to make this conversion here for the real types?
>
>>>>>>
>>>>>> As for the still required conversions, you should be able to delay those
>>>>>> from the C frontend (and here) to expand_vec_cond_expr by, after
>>>>>> expanding op1 and op2, wrapping a subreg around them with the proper mode
>>>>>> (using convert_mode (GET_MODE (comparison), rtx_op1) should work),
>>>>>> and then converting the result back to the original mode.
>>>>>>
>>>>>> I'll leave the C frontend pieces of the patch for review by Joseph, but
>>>>>
>>>>> Conversions are there until we fix the backend. Once the backend is
>>>>> able to digest f0 > f1 ? int0 : int1, all the conversions will go
>>>>> away.
>>>>>
>>>>>> +static tree
>>>>>> +fold_build_vec_cond_expr (tree ifexp, tree op1, tree op2)
>>>>>>
>>>>>> is missing a function comment.
>>>>>
>>>>> fixed.
>>>>>
>>>>>> +static tree
>>>>>> +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
>>>>>> +         tree bitpos, tree bitsize, enum tree_code code)
>>>>>> +{
>>>>>> +  tree cond;
>>>>>> +  tree comp_type;
>>>>>> +
>>>>>> +  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
>>>>>> +  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
>>>>>> +
>>>>>> +  comp_type = lang_hooks.types.type_for_size (TYPE_PRECISION (inner_type), 0);
>>>>>> +
>>>>>>
>>>>>> Use
>>>>>>
>>>>>>  comp_type = build_nonstandard_integer_type (TYPE_PRECISION (inner_type), 0);
>>>>>>
>>>>>> instead.  But I think you don't want to use TYPE_PRECISION on
>>>>>> FP types.  Instead you want a signed integer type of the same (mode)
>>>>>> size as the vector element type, thus
>>>>>>
>>>>>>  comp_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
>>>>>
>>>>> Hm, I thought that at this stage we don't want to know anything about
>>>>> modes. I mean here I am really building the same integer type as the
>>>>> operands of the comparison have. But I can use MODE_BITSIZE as well; I
>>>>> don't think it could happen that the size of the mode is
>>>>> different from the size of the type. Or could it?
>>>>
>>>> The comparison could be on floating-point types where TYPE_PRECISION
>>>> can be, for example, 80 for x87 long doubles.  You want an integer type
>>>> of the same width, so yes, GET_MODE_BITSIZE is the correct thing
>>>> to use here.
>>>
>>> Ok.
>>>
>>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>>>
>>>>>> the result type of a comparison is boolean_type_node, not comp_type.
>>>>>>
>>>>>> +  cond = gimplify_build2 (gsi, code, comp_type, a, b);
>>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type, cond,
>>>>>> +                    build_int_cst (comp_type, -1),
>>>>>> +                    build_int_cst (comp_type, 0));
>>>>>>
>>>>>> writing this as
>>>>>>
>>>>>> +  return gimplify_build3 (gsi, COND_EXPR, comp_type,
>>>>>>                     fold_build2 (code, boolean_type_node, a, b),
>>>>>> +                    build_int_cst (comp_type, -1),
>>>>>> +                    build_int_cst (comp_type, 0));
>>>>>>
>>>>>> will give the gimplifier a better chance at simplification.
>>>>>>
>>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>>>
>>>>>> I think we are expecting the scalar type and the vector mode here
>>>>>> from looking at the single existing caller.  It probably doesn't make
>>>>>> a difference (we only check TYPE_UNSIGNED of it, which should
>>>>>> also work for vector types), but let's be consistent.  Thus,
>>>>>
>>>>> Ok.
>>>>>
>>>>>>    if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
>>>>>>
>>>>>> +  if (! expand_vec_cond_expr_p (type, TYPE_MODE (type)))
>>>>>> +    t = expand_vector_piecewise (gsi, do_compare, type,
>>>>>> +                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
>>>>>> +  else
>>>>>> +    t = gimplify_build2  (gsi, code, type, op0, op1);
>>>>>>
>>>>>> the else case looks odd.  Why re-build a stmt that already exists?
>>>>>> Simply return NULL_TREE instead?
>>>>>
>>>>> I can adjust. The reason it is written that way is that
>>>>> expand_vector_operations_1 is using the result of the function to
>>>>> update rhs.
>>>>
>>>> Ok, so it should check whether there was any lowering done then.
>>>>
>>>>>> +static tree
>>>>>> +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp)
>>>>>> +{
>>>>>> ...
>>>>>> +      /* Expand vector condition inside of VEC_COND_EXPR.  */
>>>>>> +      if (! expand_vec_cond_expr_p (TREE_TYPE (cond),
>>>>>> +                                   TYPE_MODE (TREE_TYPE (cond))))
>>>>>> +       {
>>>>>> ...
>>>>>> +         new_rhs = expand_vector_piecewise (gsi, do_compare,
>>>>>> +                                            TREE_TYPE (cond),
>>>>>> +                                            TREE_TYPE (TREE_TYPE (op1)),
>>>>>> +                                            op0, op1, TREE_CODE (cond));
>>>>>>
>>>>>> I'm not sure it is beneficial to expand a < b ? v0 : v1 to
>>>>>>
>>>>>> tem = { a[0] < b[0] ? -1 : 0, ... }
>>>>>> v0 & tem | v1 & ~tem;
>>>>>>
>>>>>> instead of
>>>>>>
>>>>>> { a[0] < b[0] ? v0[0] : v1[0], ... }
>>>>>>
>>>>>> even if the bitwise operations could be carried out using vectors.
>>>>>> It's definitely beneficial to do the first if the CPU can create the
>>>>>> bitmask.
>>>>>>
>>>>>
>>>>> o_O
>>>>>
>>>>> I thought you always wanted to do (m & v0) | (~m & v1).
>>>>> Do you want to have two cases of the expansion then, one when we have
>>>>> the mask available and one when we don't? But it is really unlikely that
>>>>> we can get the mask but cannot get vcond, because the condition is
>>>>> actually a vcond. So once again: do we always expand to {a[0] > b[0]  ? v[0] :
>>>>> c[0], ...}?
>>>>
>>>> Hm, yeah.  I suppose with the current setup it's hard to only
>>>> get the mask but not the full vcond ;)  So it probably makes
>>>> sense to always expand to {a[0] > b[0]  ? v[0] :c[0],...} as
>>>> fallback.  Sorry for the confusion ;)
>>>
>>> Ok.
>>>
>>>>>> +  /* Run vecower on the expresisons we have introduced.  */
>>>>>> +  for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp))
>>>>>> +    expand_vector_operations_1 (&gsi_tmp);
>>>>>>
>>>>>> do not use gsi.ptr directly; use gsi_stmt (gsi_tmp) != gsi_stmt (*gsi)
>>>>>>
>>>>>> +static bool
>>>>>> +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr)
>>>>>> +{
>>>>>>
>>>>>> This function is lacking a comment.
>>>>>>
>>>>>> @@ -450,11 +637,41 @@ expand_vector_operations_1 (gimple_stmt_
>>>>>> ...
>>>>>> +      /* Try to get rid from the useless vector comparison
>>>>>> +        x != {0,0,...} which is inserted by the typechecker.  */
>>>>>> +      if (COMPARISON_CLASS_P (cond) && TREE_CODE (cond) == NE_EXPR)
>>>>>>
>>>>>> how and why?  You simply drop that comparison - that doesn't look
>>>>>> correct.  And in fact TREE_OPERAND (cond, 0) will never be a
>>>>>> comparison - that wouldn't be valid gimple.  Please leave this
>>>>>> optimization to SSA based forward propagation (I can help you here
>>>>>> once the patch is in).
>>>>>
>>>>> No-no-no. This is the second part of avoiding
>>>>> m = a > b;
>>>>> r = m ? v0 : v1;
>>>>>
>>>>> to prevent m expansion to m != {0}.
>>>>>
>>>>> I do not _simply_ drop the comparison. I drop it only if
>>>>> is_vector_comparison returned true. It means that we can never get
>>>>> into the situation where we actually drop a comparison inserted
>>>>> by the user. But what I really want to achieve here is to drop the
>>>>> comparison that the frontend inserts every time when it sees an
>>>>> expression there.
>>>>>
>>>>> As I said earlier, tree forward propagation kicks in only at -On, and
>>>>> I would really like to make sure that I can get rid of useless != {0}
>>>>> at any level.
>>>
>>>> Please don't.  If the language extension forces a != 0 then it should
>>>> appear at -O0.  The code is fishy anyway in the way it walks stmts
>>>> in is_vector_comparison.  At least I don't like to see this optimization
>>>> done here for the sake of -O0 in this initial patch - you could try
>>>> arguing about it as a followup improvement (well, probably with not
>>>> much luck).  -O0 is about compile-speed and debugging, doing
>>>> data-flow by walking stmts backward is slow.
>>>
>>> Ok, then I seriously don't see any motivation to support the
>>> VEC_COND_EXPR. The following code:
>>>
>>> m = a > b;
>>> r = (m & v0) | (~m & v1)
>>>
>>> gives me much more flexibility and control. What is the VEC_COND_EXPR
>>> good for? Syntactic sugar?
>>>
>>> How about throwing away all the VEC_COND_EXPR parts supporting only
>>> conditions (implicitly expressed using vconds)? If we agree on
>>> implicit conversions for real types, then this is functionality that
>>> perfectly satisfies my needs.
>>>
>>> I don't see any interest from the backend people and I cannot wait
>>> forever, so why don't we start with a simple thing?
>>
>> But the simple thing is already what the backend supports.
>>
>> Richard.
>>
>
> Well, it is not "what", it is "how", and that is what we have been
> discussing for three weeks already.
>
> Ok, so the question now is whether it is fine to have conversions
> inside expand_expr_real_2? If we agree that it is ok to do, then I can
> adjust the patch.

Yes, it is ok to have them there, but preferably on RTL and preferably
in expand_vec_cond_expr by using simplify_gen_subreg. Untested patch:

Index: optabs.c
===================================================================
--- optabs.c    (revision 178060)
+++ optabs.c    (working copy)
@@ -6664,16 +6664,20 @@ expand_vec_cond_expr (tree vec_cond_type

   comparison = vector_compare_rtx (op0, unsignedp, icode);
   rtx_op1 = expand_normal (op1);
+  rtx_op1 = simplify_gen_subreg (GET_MODE (comparison), rtx_op1,
+                                GET_MODE (rtx_op1), 0);
   rtx_op2 = expand_normal (op2);
+  rtx_op2 = simplify_gen_subreg (GET_MODE (comparison), rtx_op2,
+                                GET_MODE (rtx_op2), 0);

-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
+  create_output_operand (&ops[0], target, GET_MODE (comparison));
+  create_input_operand (&ops[1], rtx_op1, GET_MODE (comparison));
+  create_input_operand (&ops[2], rtx_op2, GET_MODE (comparison));
   create_fixed_operand (&ops[3], comparison);
   create_fixed_operand (&ops[4], XEXP (comparison, 0));
   create_fixed_operand (&ops[5], XEXP (comparison, 1));
   expand_insn (icode, 6, ops);
-  return ops[0].value;
+  return simplify_gen_subreg (mode, ops[0].value, GET_MODE (comparison), 0);
 }
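The data flow of this change can be modelled in GNU C, where a cast between vector types of the same size reinterprets the bits rather than converting the values, which is roughly what a lowpart simplify_gen_subreg does on RTL. A minimal sketch, assuming 32-bit float and int lanes; vcond_via_float_mode is a hypothetical name, and its loop only stands in for the target's vcond pattern operating in the comparison's (float) mode while the {-1,...} and {0,...} operands and the result stay integer vectors.

```c
#include <stdint.h>

typedef float   v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Models VEC_COND_EXPR <f0 > f1, op1, op2> where op1/op2 and the result
   are integer vectors but the comparison is done on float vectors.  */
static v4si
vcond_via_float_mode (v4sf f0, v4sf f1, v4si op1, v4si op2)
{
  /* Like a subreg into the comparison mode: a same-size vector cast
     keeps the bits, so {-1,...} does not become {-1.0f,...}.  */
  v4sf t = (v4sf) op1;
  v4sf e = (v4sf) op2;
  v4sf r = {0.0f, 0.0f, 0.0f, 0.0f};

  /* Stand-in for the vcond pattern: it only selects lanes and does
     no arithmetic on them.  */
  for (int i = 0; i < 4; i++)
    r[i] = f0[i] > f1[i] ? t[i] : e[i];

  /* Like the final subreg back to the original (integer) mode.  */
  return (v4si) r;
}
```

Because the lanes are only moved, the all-ones pattern survives the trip through the float mode and comes back as -1.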




>
> Artem.
>


* Re: Vector Comparison patch
  2011-08-25 14:49 ` Richard Guenther
@ 2011-08-27 10:50 ` Artem Shinkarov
  2011-08-29 12:46 ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-08-27 10:50 UTC
  To: Richard Guenther
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 3059 bytes --]

Hi

Here is a patch with vector comparison only.
The comparison is expanded using VEC_COND_EXPR; conversions between the
different types inside the VEC_COND_EXPR happen in optabs.c.
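When the target has no suitable vector comparison, the ChangeLog entry for expand_vector_comparison says the comparison is expanded piecewise (do_compare). A minimal C sketch of the per-lane result that fallback is meant to produce for a float vector, assuming 32-bit IEEE floats; compare_gt_piecewise is a hypothetical name, not a GCC function. The integer lane is chosen by storage width, matching the point made earlier in the thread about using GET_MODE_BITSIZE rather than TYPE_PRECISION.

```c
#include <stdint.h>

typedef float   v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* The integer lane must have the same storage width as the float lane
   (the mode size), not merely the same precision.  */
_Static_assert (sizeof (int32_t) == sizeof (float),
                "integer lane width must match float lane width");

/* Piecewise lowering of a > b: each lane is a scalar comparison whose
   result is widened to -1 (true) or 0 (false).  */
static v4si
compare_gt_piecewise (v4sf a, v4sf b)
{
  v4si r = {0, 0, 0, 0};
  for (int i = 0; i < 4; i++)
    r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}
```

A NaN lane compares false and yields 0, the same as an ordered hardware comparison would.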

The comparison generally works. However, the x86 backend does not
recognize float and double vectors with all bits set as
vector_all_ones_operand, which is very bad, but I hope it can be
fixed easily. Here is my humble attempt:

Index: gcc/config/i386/predicates.md
===================================================================
--- gcc/config/i386/predicates.md       (revision 177665)
+++ gcc/config/i386/predicates.md       (working copy)
@@ -763,7 +763,19 @@ (define_predicate "vector_all_ones_opera
       for (i = 0; i < nunits; ++i)
         {
           rtx x = CONST_VECTOR_ELT (op, i);
-          if (x != constm1_rtx)
+         rtx y;
+
+         if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+           {
+             REAL_VALUE_TYPE r;
+             REAL_VALUE_FROM_INT (r, -1, -1, GET_MODE (x));
+             y = CONST_DOUBLE_FROM_REAL_VALUE (r, GET_MODE (x));
+           }
+         else
+           y = constm1_rtx;
+
+         /* if (x != constm1_rtx) */
+         if (!rtx_equal_p (x, y))
             return false;
         }
       return true;

But the problem I have here is that -1 is converted to the value -1.0,
whereas I need the float whose bit pattern is that of the integer -1
(all bits set). Something like:

int p = -1;
void *x = &p;
float r = *((float *)x);

Is there any way to do that in this context? Or maybe there is
another way to support real-typed vectors of -1 as constants?
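The pointer cast in the snippet above conveys the intent, but in ISO C reading an int through a float pointer violates the aliasing rules. A minimal sketch contrasting the two operations, assuming IEEE single precision and a 32-bit int32_t; value_convert, bit_cast_to_float and bit_cast_to_int are hypothetical helpers. The value conversion gives -1.0f (bits 0xBF800000), while copying the object representation with memcpy gives the all-ones pattern, which as a float is a quiet NaN.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (int32_t),
                "float and int32_t must have the same size");

/* Value conversion: the integer -1 becomes the float -1.0f.  */
static float
value_convert (int32_t i)
{
  return (float) i;
}

/* Bit reinterpretation: copy the object representation unchanged.
   Unlike *(float *) &i, memcpy does not break the aliasing rules.  */
static float
bit_cast_to_float (int32_t i)
{
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}

/* The reverse reinterpretation, float bits back to an integer.  */
static int32_t
bit_cast_to_int (float f)
{
  int32_t i;
  memcpy (&i, &f, sizeof i);
  return i;
}
```

Since the all-ones float is a NaN, it compares unequal to itself, yet reinterpreting it back yields exactly -1.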


ChangeLog

2011-08-27 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>

	gcc/
	* optabs.c (vector_compare_rtx): Allow comparison operands
	and vcond operands to have different types.
	(expand_vec_cond_expr): Convert operands in case they do
	not match.
	* fold-const.c (constant_boolean_node): Adjust the meaning
	of boolean for vector types: true = {-1,..}, false = {0,..}.
	(fold_unary_loc): Avoid conversion of vector comparison to
	boolean type.
	* expr.c (expand_expr_real_2): Expand vector comparison by
	building an appropriate VEC_COND_EXPR.
	* c-typeck.c (build_binary_op): Typecheck vector comparisons.
	(c_objc_common_truthvalue_conversion): Adjust.
	* gimplify.c (gimplify_expr): Support vector comparison
	in gimple.
	* tree.def: Adjust comment.
	* tree-vect-generic.c (do_compare): Helper function.
	(expand_vector_comparison): Check if hardware supports
	vector comparison of the given type or expand vector
	piecewise.
	(expand_vector_operation): Treat comparison as binary
	operation of vector type.
	(expand_vector_operations_1): Adjust.
	* tree-cfg.c (verify_gimple_comparison): Adjust.

	gcc/config/i386
	* i386.c (ix86_expand_sse_movcc): Consider a case when
	vcond operators are {-1,..} and {0,..}.

	gcc/doc
	* extend.texi: Adjust.

	gcc/testsuite
	* gcc.c-torture/execute/vector-compare-1.c: New test.
	* gcc.c-torture/execute/vector-compare-2.c: New test.
	* gcc.dg/vector-compare-1.c: New test.
	* gcc.dg/vector-compare-2.c: New test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.


Artem.

[-- Attachment #2: vec-compare.v7.diff --]
[-- Type: text/plain, Size: 23678 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 177665)
+++ gcc/doc/extend.texi	(working copy)
@@ -6553,6 +6553,29 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In GNU C vector comparison is supported within standard comparison
+operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors are not supported.  The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise producing 0 when comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 177665)
+++ gcc/optabs.c	(working copy)
@@ -6502,7 +6502,8 @@ get_rtx_code (enum tree_code tcode, bool
    unsigned operators. Do not generate compare instruction.  */
 
 static rtx
-vector_compare_rtx (tree cond, bool unsignedp, enum insn_code icode)
+vector_compare_rtx (tree cond, bool unsignedp, enum insn_code icode, 
+		    bool legitimize)
 {
   struct expand_operand ops[2];
   enum rtx_code rcode;
@@ -6525,7 +6526,8 @@ vector_compare_rtx (tree cond, bool unsi
 
   create_input_operand (&ops[0], rtx_op0, GET_MODE (rtx_op0));
   create_input_operand (&ops[1], rtx_op1, GET_MODE (rtx_op1));
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  
+  if (legitimize && !maybe_legitimize_operands (icode, 4, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6566,24 +6568,49 @@ expand_vec_cond_expr (tree vec_cond_type
   enum insn_code icode;
   rtx comparison, rtx_op1, rtx_op2;
   enum machine_mode mode = TYPE_MODE (vec_cond_type);
-  bool unsignedp = TYPE_UNSIGNED (vec_cond_type);
+  bool unsignedp; 
+  enum machine_mode comp_mode;
+  tree comp_type;
+  
+  gcc_assert (COMPARISON_CLASS_P (op0));
+  
+  comp_type = TREE_TYPE (TREE_OPERAND (op0, 0));
+  comp_mode = TYPE_MODE (comp_type);
+  unsignedp = TYPE_UNSIGNED (comp_type);
+
+  if (mode != comp_mode)
+    icode = get_vcond_icode (comp_type, comp_mode);
+  else
+    icode = get_vcond_icode (vec_cond_type, mode);
 
-  icode = get_vcond_icode (vec_cond_type, mode);
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode);
+  comparison = vector_compare_rtx (op0, unsignedp, icode, mode == comp_mode);
+  
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
+  
+  if (comp_mode != mode)
+    {
+      rtx_op1 = simplify_gen_subreg (comp_mode, rtx_op1, 
+				     GET_MODE (rtx_op1), 0);
+      rtx_op2 = simplify_gen_subreg (comp_mode, rtx_op2, 
+				     GET_MODE (rtx_op2), 0);
+    }
+
+  create_output_operand (&ops[0], target, comp_mode);
+  create_input_operand (&ops[1], rtx_op1, comp_mode);
+  create_input_operand (&ops[2], rtx_op2, comp_mode);
+  create_input_operand (&ops[3], comparison, mode);
+  create_input_operand (&ops[4], XEXP (comparison, 0), comp_mode);
+  create_input_operand (&ops[5], XEXP (comparison, 1), comp_mode);
   expand_insn (icode, 6, ops);
-  return ops[0].value;
+
+  if (mode != comp_mode)
+    return simplify_gen_subreg (mode, ops[0].value, comp_mode, 0);
+  else
+    return ops[0].value;
 }
 
 \f
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 177665)
+++ gcc/fold-const.c	(working copy)
@@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t
 }
 \f
 /* Return a node which has the indicated constant VALUE (either 0 or
-   1), and is of the indicated TYPE.  */
+   1 for scalars and is either {-1,-1,..} or {0,0,...} for vectors), 
+   and is of the indicated TYPE.  */
 
 tree
 constant_boolean_node (int value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -7667,6 +7676,16 @@ fold_unary_loc (location_t loc, enum tre
 	    return build2_loc (loc, TREE_CODE (op0), type,
 			       TREE_OPERAND (op0, 0),
 			       TREE_OPERAND (op0, 1));
+	  else if (TREE_CODE (type) == VECTOR_TYPE)
+	    {
+	      tree el_type = TREE_TYPE (type);
+	      tree op_el_type = TREE_TYPE (TREE_TYPE (op0));
+
+	      if (el_type == op_el_type)
+		return op0;
+	      else
+		return build1_loc (loc, VIEW_CONVERT_EXPR, type, op0);
+	    }
 	  else if (!INTEGRAL_TYPE_P (type))
 	    return build3_loc (loc, COND_EXPR, type, op0,
 			       fold_convert (type, boolean_true_node),
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt); \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */   
+
+/* Test that C_MAYBE_CONST_EXPRs are folded correctly when
+   creating a VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+extern int p, q, z;
+extern vec foo (int);
+
+vec 
+foo (int x)
+{
+  return  foo (p ? q :z) > a;
+}
+
+vec 
+bar (int x)
+{
+  return  b > foo (p ? q :z);
+}
+
+
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 177665)
+++ gcc/expr.c	(working copy)
@@ -8440,6 +8440,29 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+	{
+	  enum tree_code code = ops->code;
+	  tree arg0 = ops->op0;
+	  tree arg1 = ops->op1;
+	  tree el_type = TREE_TYPE (TREE_TYPE (arg0));
+	  tree t, ifexp, if_true, if_false;
+	  
+	  el_type = build_nonstandard_integer_type 
+			(GET_MODE_BITSIZE (TYPE_MODE (el_type)), 0);
+
+	  ifexp = build2 (code, type, arg0, arg1);
+	  if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+	  if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+	  
+	  t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+            
+	  return expand_expr (t,
+			      modifier != EXPAND_STACK_PARM ? target : NULL_RTX, 
+			      tmode != VOIDmode ? tmode : mode, 
+			      modifier);
+	}
+
       temp = do_store_flag (ops,
 			    modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
 			    tmode != VOIDmode ? tmode : mode);
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 177665)
+++ gcc/c-typeck.c	(working copy)
@@ -9906,6 +9906,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10018,6 +10041,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10425,6 +10471,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 177665)
+++ gcc/gimplify.c	(working copy)
@@ -7348,6 +7348,11 @@ gimplify_expr (tree *expr_p, gimple_seq
 		{
 		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
 
+		  /* Vector comparisons are valid GIMPLE expressions
+		     which can be lowered later.  */
+		  if (TREE_CODE (type) == VECTOR_TYPE)
+		    goto expr_2;
+
 		  if (!AGGREGATE_TYPE_P (type))
 		    {
 		      tree org_type = TREE_TYPE (*expr_p);
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	(revision 177665)
+++ gcc/tree.def	(working copy)
@@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
    The others are allowed only for integer (or pointer or enumeral)
    or real types.
    In all cases the operands will have the same type,
-   and the value is always the type used by the language for booleans.  */
+   and the value is either the type used by the language for booleans
+   or an integer vector type of the same size and with the same number
+   of elements as the comparison operands.  True for a vector of
+   comparison results has all bits set while false is equal to zero.  */
 DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
 DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
 DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 177665)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -35,6 +35,10 @@ along with GCC; see the file COPYING3.
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +129,31 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0.
+
+   INNER_TYPE is the type of the elements of A and B.
+
+   The returned expression has a signed integer type whose size
+   equals the size of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = build_nonstandard_integer_type 
+		      (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+
+  return gimplify_build3 (gsi, COND_EXPR, comp_type,
+			  fold_build2 (code, boolean_type_node, a, b),
+			  build_int_cst (comp_type, -1),
+			  build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +362,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Expand the vector comparison OP0 CODE OP1.  If the optab query shows
+   that VEC_COND_EXPR <OP0 CODE OP1, {-1,...}, {0,...}> cannot be expanded
+   by the target, lower the comparison piecewise; otherwise return
+   NULL_TREE and leave the comparison for later expansion.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  else
+    t = NULL_TREE;
+
+  return t;
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +422,27 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -450,11 +516,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
@@ -598,6 +664,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   gcc_assert (code != VEC_LSHIFT_EXPR && code != VEC_RSHIFT_EXPR);
   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code);
+
+  /* Leave expression untouched for later expansion.  */
+  if (new_rhs == NULL_TREE)
+    return;
+
   if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
     new_rhs = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
                                new_rhs);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 177665)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type))
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177665)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18434,8 +18434,13 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  
+  if (vector_all_ones_operand (op_true, GET_MODE (op_true))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);


* Re: Vector Comparison patch
  2011-08-27 10:50                                                                                               ` Artem Shinkarov
@ 2011-08-29 12:46                                                                                                 ` Richard Guenther
  0 siblings, 0 replies; 91+ messages in thread
From: Richard Guenther @ 2011-08-29 12:46 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Henderson, gcc-patches, Joseph S. Myers, Uros Bizjak

[-- Attachment #1: Type: text/plain, Size: 4751 bytes --]

On Sat, Aug 27, 2011 at 3:39 AM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Hi
>
> Here is a patch with vector comparison only.
> Comparison is expanded using VEC_COND_EXPR; conversions between the
> different types inside the VEC_COND_EXPR happen in optabs.c.
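A minimal sketch of the semantics under discussion, using GCC's `vector_size` extension and assuming a compiler that accepts vector comparisons. The names `v4si`, `mask_select`, `compare_as_vcond` and `example_vcond` are illustrative and do not come from the patch. A comparison produces a signed integer mask with -1 for true and 0 for false; `mask_select` models VEC_COND_EXPR <M, X, Y> as a bitwise blend. With X = {-1,...} and Y = {0,...} the blend is just the mask itself, which is the case the ix86_expand_sse_movcc change emits directly as the comparison.

```c
#include <assert.h>

/* Four 32-bit ints in one 16-byte vector (GCC vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Bitwise model of VEC_COND_EXPR <M, X, Y>: lanes where M is -1 take X,
   lanes where M is 0 take Y.  Only meaningful when every lane of M is
   either 0 or -1, as it is for a comparison result.  */
static v4si
mask_select (v4si m, v4si x, v4si y)
{
  return (m & x) | (~m & y);
}

/* a > b written as VEC_COND_EXPR <a > b, {-1,...}, {0,...}>; the blend
   collapses to the comparison mask itself.  */
static v4si
compare_as_vcond (v4si a, v4si b)
{
  v4si all_ones = { -1, -1, -1, -1 };
  v4si zeros = { 0, 0, 0, 0 };
  return mask_select (a > b, all_ones, zeros);
}

/* Usage: lanes where a > b become -1, the others 0.  */
static void
example_vcond (void)
{
  v4si a = { 5, 1, 2, 10 };
  v4si b = { 0, 3, 2, -23 };
  v4si m = compare_as_vcond (a, b);
  assert (m[0] == -1 && m[1] == 0 && m[2] == 0 && m[3] == -1);
}
```

Writing the select as `(m & x) | (~m & y)` only works because true is all bits set; a 0/1 boolean per lane would need a multiply or a compare instead.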

I have split out the middle-end infrastructure parts to support vector
comparisons apart from the expansion piece and am testing this
(see attached; I adjusted some minor bits).  I will commit this if
testing goes ok.

Looking over the rest I wonder why you need to avoid legitimizing stuff
in vector_compare_rtx?  I can't produce any error with x86_64 or i586,
but on i586 gcc.c-torture/execute/vector-compare-1.c does not build
because

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
error: incompatible types when assigning to type '__vector(2) long
int' from type '__vector(2) long long int'

so the testcases need double-checking for this kind of error.  You
can run the tests for both -m32 and -m64 with a command line like

make check-gcc RUNTESTFLAGS="--target_board=unix/\{,-m32\}
dg.exp=vector-compare*.c"
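Line 118 of vector-compare-1.c is the double test, whose result vector is declared as `vector (2, long)`. Under -m32 `long` is 32 bits, so that vector is half the size of the comparison result. A hedged sketch of a portable spelling, assuming a GCC release that accepts vector comparisons and lets the result convert to a same-sized signed integer vector: use `long long`, which is 64 bits under both -m32 and -m64 on x86. The typedef and function names are illustrative only.

```c
#include <assert.h>

/* Two doubles per 16-byte vector.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Result vector: same size and element count as v2df, with 64-bit signed
   lanes.  long long is 64 bits under both -m32 and -m64 on x86, whereas
   long is only 32 bits under -m32.  */
typedef long long v2di __attribute__ ((vector_size (16)));

_Static_assert (sizeof (v2di) == sizeof (v2df),
                "comparison result must match the operand vector size");

/* Lane-wise a > b: -1 where true, 0 where false.  */
static v2di
greater (v2df a, v2df b)
{
  return a > b;
}

/* Usage: 1.0 > 0.0 is true, -23.0 > 3.0 is false.  */
static void
example_greater (void)
{
  v2df a = { 1.0, -23.0 };
  v2df b = { 0.0, 3.0 };
  v2di r = greater (a, b);
  assert (r[0] == -1 && r[1] == 0);
}
```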

I'd like to further split the optabs.c and expr.c changes, which look
independent.

I have the attached incremental patch on top of yours; I will test the
expr.c and optabs.c parts separately and plan to commit them as well
if that succeeds.

Richard.

> The comparison generally works; however, the x86 backend does not
> recognize all-ones vectors of type float and double, which is a
> serious problem, but I hope it can be fixed easily. Here is my humble attempt:
>
> Index: gcc/config/i386/predicates.md
> ===================================================================
> --- gcc/config/i386/predicates.md       (revision 177665)
> +++ gcc/config/i386/predicates.md       (working copy)
> @@ -763,7 +763,19 @@ (define_predicate "vector_all_ones_opera
>       for (i = 0; i < nunits; ++i)
>         {
>           rtx x = CONST_VECTOR_ELT (op, i);
> -          if (x != constm1_rtx)
> +         rtx y;
> +
> +         if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
> +           {
> +             REAL_VALUE_TYPE r;
> +             REAL_VALUE_FROM_INT (r, -1, -1, GET_MODE (x));
> +             y = CONST_DOUBLE_FROM_REAL_VALUE (r, GET_MODE (x));
> +           }
> +         else
> +           y = constm1_rtx;
> +
> +         /* if (x != constm1_rtx) */
> +         if (!rtx_equal_p (x, y))
>             return false;
>         }
>       return true;
>
> But the problem I have here is that -1 is converted to the value -1.0,
> whereas I need the all-ones bit pattern of -1 reinterpreted as a float.
> Something like:
>
> int p = -1;
> void *x = &p;
> float r = *((float *)x);
>
> Is there any way to do that in this context? Or maybe there is
> another way to support real-typed vectors of -1 as constants?
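For reference, the bit pattern described above can be produced in portable C without the pointer cast, which breaks strict-aliasing rules, by copying the bytes with `memcpy`. This sketch assumes 32-bit `int` and IEEE 754 single-precision `float`; the function names are illustrative. It shows that the all-ones pattern is a NaN, while -1.0f has the unrelated pattern 0xBF800000, which is why converting -1 to a floating-point value cannot give the all-ones mask.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a float.  memcpy is the
   well-defined way to type-pun in C; *(float *) &p is not.  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  _Static_assert (sizeof f == sizeof bits, "expects a 32-bit float");
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* The inverse: the raw bit pattern of a float.  */
static uint32_t
float_to_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Usage: -1.0f is sign 1, biased exponent 127, zero mantissa, while the
   all-ones pattern (int -1) has an all-ones exponent and a nonzero
   mantissa, i.e. it is a NaN.  */
static void
example_bits (void)
{
  assert (float_to_bits (-1.0f) == 0xBF800000u);
  assert (isnan (bits_to_float (0xFFFFFFFFu)));
}
```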
>
>
> ChangeLog
>
> 2011-08-27 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>
>        gcc/
>        * optabs.c (vector_compare_rtx): Allow comparison operands
>        and vcond operands to have different types.
>        (expand_vec_cond_expr): Convert operands in case they do
>        not match.
>        * fold-const.c (constant_boolean_node): Adjust the meaning
>        of boolean for vector types: true = {-1,..}, false = {0,..}.
>        (fold_unary_loc): Avoid conversion of vector comparison to
>        boolean type.
>        * expr.c (expand_expr_real_2): Expand vector comparison by
>        building an appropriate VEC_COND_EXPR.
>        * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>        (c_objc_common_truthvalue_conversion): Adjust.
>        * gimplify.c (gimplify_expr): Support vector comparison
>        in gimple.
>        * tree.def: Adjust comment.
>        * tree-vect-generic.c (do_compare): Helper function.
>        (expand_vector_comparison): Check if hardware supports
>        vector comparison of the given type or expand vector
>        piecewise.
>        (expand_vector_operation): Treat comparison as binary
>        operation of vector type.
>        (expand_vector_operations_1): Adjust.
>        * tree-cfg.c (verify_gimple_comparison): Adjust.
>
>        gcc/config/i386
>        * i386.c (ix86_expand_sse_movcc): Consider the case when
>        vcond operands are {-1,..} and {0,..}.
>
>        gcc/doc
>        * extend.texi: Adjust.
>
>        gcc/testsuite
>        * gcc.c-torture/execute/vector-compare-1.c: New test.
>        * gcc.c-torture/execute/vector-compare-2.c: New test.
>        * gcc.dg/vector-compare-1.c: New test.
>        * gcc.dg/vector-compare-2.c: New test.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
>
> Artem.
>

[-- Attachment #2: vec-compare.v7.diff.r --]
[-- Type: application/octet-stream, Size: 10635 bytes --]

2011-08-29  Artjoms Sinkarovs  <artyom.shinkaroff@gmail.com>
	Richard Guenther  <rguenther@suse.de>

	* tree.h (constant_boolean_node): Adjust prototype.
	* fold-const.c (fold_convert_loc): Move aggregate conversion
	leeway down.
	(constant_boolean_node): Make value parameter boolean, add
	vector type handling.
	(fold_unary_loc): Use constant_boolean_node.
	(fold_binary_loc): Preserve types properly when folding
	COMPLEX_EXPR <__real x, __imag x>.
	* gimplify.c (gimplify_expr): Handle vector comparison.
	* tree.def (EQ_EXPR, ...): Document behavior on vector typed
	comparison.
	* tree-cfg.c (verify_gimple_comparison): Verify vector typed
	comparisons.

Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c.orig	2011-08-29 11:48:23.000000000 +0200
--- gcc/fold-const.c	2011-08-29 12:09:51.000000000 +0200
*************** fold_convert_loc (location_t loc, tree t
*** 1867,1875 ****
        || TREE_CODE (orig) == ERROR_MARK)
      return error_mark_node;
  
-   if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig))
-     return fold_build1_loc (loc, NOP_EXPR, type, arg);
- 
    switch (TREE_CODE (type))
      {
      case POINTER_TYPE:
--- 1867,1872 ----
*************** fold_convert_loc (location_t loc, tree t
*** 2017,2022 ****
--- 2014,2021 ----
        return fold_build1_loc (loc, NOP_EXPR, type, tem);
  
      default:
+       if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (orig))
+ 	return fold_build1_loc (loc, NOP_EXPR, type, arg);
        gcc_unreachable ();
      }
   fold_convert_exit:
*************** extract_muldiv_1 (tree t, tree c, enum t
*** 5929,5945 ****
  }
  \f
  /* Return a node which has the indicated constant VALUE (either 0 or
!    1), and is of the indicated TYPE.  */
  
  tree
! constant_boolean_node (int value, tree type)
  {
    if (type == integer_type_node)
      return value ? integer_one_node : integer_zero_node;
    else if (type == boolean_type_node)
      return value ? boolean_true_node : boolean_false_node;
    else
!     return build_int_cst (type, value);
  }
  
  
--- 5928,5949 ----
  }
  \f
  /* Return a node which has the indicated constant VALUE (either 0 or
!    1 for scalars or {-1,-1,..} or {0,0,...} for vectors),
!    and is of the indicated TYPE.  */
  
  tree
! constant_boolean_node (bool value, tree type)
  {
    if (type == integer_type_node)
      return value ? integer_one_node : integer_zero_node;
    else if (type == boolean_type_node)
      return value ? boolean_true_node : boolean_false_node;
+   else if (TREE_CODE (type) == VECTOR_TYPE)
+     return build_vector_from_val (type,
+ 				  build_int_cst (TREE_TYPE (type),
+ 						 value ? -1 : 0));
    else
!     return fold_convert (type, value ? integer_one_node : integer_zero_node);
  }
  
  
*************** fold_unary_loc (location_t loc, enum tre
*** 7668,7675 ****
  			       TREE_OPERAND (op0, 1));
  	  else if (!INTEGRAL_TYPE_P (type))
  	    return build3_loc (loc, COND_EXPR, type, op0,
! 			       fold_convert (type, boolean_true_node),
! 			       fold_convert (type, boolean_false_node));
  	}
  
        /* Handle cases of two conversions in a row.  */
--- 7672,7679 ----
  			       TREE_OPERAND (op0, 1));
  	  else if (!INTEGRAL_TYPE_P (type))
  	    return build3_loc (loc, COND_EXPR, type, op0,
! 			       constant_boolean_node (true, type),
! 			       constant_boolean_node (false, type));
  	}
  
        /* Handle cases of two conversions in a row.  */
*************** fold_binary_loc (location_t loc,
*** 13202,13209 ****
  	return build_complex (type, arg0, arg1);
        if (TREE_CODE (arg0) == REALPART_EXPR
  	  && TREE_CODE (arg1) == IMAGPART_EXPR
! 	  && (TYPE_MAIN_VARIANT (TREE_TYPE (TREE_OPERAND (arg0, 0)))
! 	      == TYPE_MAIN_VARIANT (type))
  	  && operand_equal_p (TREE_OPERAND (arg0, 0),
  			      TREE_OPERAND (arg1, 0), 0))
  	return omit_one_operand_loc (loc, type, TREE_OPERAND (arg0, 0),
--- 13206,13212 ----
  	return build_complex (type, arg0, arg1);
        if (TREE_CODE (arg0) == REALPART_EXPR
  	  && TREE_CODE (arg1) == IMAGPART_EXPR
! 	  && TREE_TYPE (TREE_OPERAND (arg0, 0)) == type
  	  && operand_equal_p (TREE_OPERAND (arg0, 0),
  			      TREE_OPERAND (arg1, 0), 0))
  	return omit_one_operand_loc (loc, type, TREE_OPERAND (arg0, 0),
Index: gcc/gimplify.c
===================================================================
*** gcc/gimplify.c.orig	2011-08-29 11:48:23.000000000 +0200
--- gcc/gimplify.c	2011-08-29 11:48:30.000000000 +0200
*************** gimplify_expr (tree *expr_p, gimple_seq
*** 7349,7355 ****
  		{
  		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
  
! 		  if (!AGGREGATE_TYPE_P (type))
  		    {
  		      tree org_type = TREE_TYPE (*expr_p);
  		      *expr_p = gimple_boolify (*expr_p);
--- 7349,7358 ----
  		{
  		  tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1));
  
! 		  /* Vector comparisons need no boolification.  */
! 		  if (TREE_CODE (type) == VECTOR_TYPE)
! 		    goto expr_2;
! 		  else if (!AGGREGATE_TYPE_P (type))
  		    {
  		      tree org_type = TREE_TYPE (*expr_p);
  		      *expr_p = gimple_boolify (*expr_p);
Index: gcc/tree.def
===================================================================
*** gcc/tree.def.orig	2011-08-29 11:48:23.000000000 +0200
--- gcc/tree.def	2011-08-29 11:48:30.000000000 +0200
*************** DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_
*** 704,710 ****
     The others are allowed only for integer (or pointer or enumeral)
     or real types.
     In all cases the operands will have the same type,
!    and the value is always the type used by the language for booleans.  */
  DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
  DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
  DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
--- 704,713 ----
     The others are allowed only for integer (or pointer or enumeral)
     or real types.
     In all cases the operands will have the same type,
!    and the value is either the type used by the language for booleans
!    or an integer vector type of the same size and with the same number
!    of elements as the comparison operands.  True for a vector of
!    comparison results has all bits set while false is equal to zero.  */
  DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2)
  DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2)
  DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2)
Index: gcc/tree-cfg.c
===================================================================
*** gcc/tree-cfg.c.orig	2011-08-29 11:48:23.000000000 +0200
--- gcc/tree-cfg.c	2011-08-29 11:48:30.000000000 +0200
*************** verify_gimple_comparison (tree type, tre
*** 3266,3290 ****
       effective type the comparison is carried out in.  Instead
       we require that either the first operand is trivially
       convertible into the second, or the other way around.
-      The resulting type of a comparison may be any integral type.
       Because we special-case pointers to void we allow
       comparisons of pointers with the same mode as well.  */
!   if ((!useless_type_conversion_p (op0_type, op1_type)
!        && !useless_type_conversion_p (op1_type, op0_type)
!        && (!POINTER_TYPE_P (op0_type)
! 	   || !POINTER_TYPE_P (op1_type)
! 	   || TYPE_MODE (op0_type) != TYPE_MODE (op1_type)))
!       || !INTEGRAL_TYPE_P (type)
!       || (TREE_CODE (type) != BOOLEAN_TYPE
! 	  && TYPE_PRECISION (type) != 1))
      {
!       error ("type mismatch in comparison expression");
!       debug_generic_expr (type);
        debug_generic_expr (op0_type);
        debug_generic_expr (op1_type);
        return true;
      }
  
    return false;
  }
  
--- 3266,3320 ----
       effective type the comparison is carried out in.  Instead
       we require that either the first operand is trivially
       convertible into the second, or the other way around.
       Because we special-case pointers to void we allow
       comparisons of pointers with the same mode as well.  */
!   if (!useless_type_conversion_p (op0_type, op1_type)
!       && !useless_type_conversion_p (op1_type, op0_type)
!       && (!POINTER_TYPE_P (op0_type)
! 	  || !POINTER_TYPE_P (op1_type)
! 	  || TYPE_MODE (op0_type) != TYPE_MODE (op1_type)))
      {
!       error ("mismatching comparison operand types");
        debug_generic_expr (op0_type);
        debug_generic_expr (op1_type);
        return true;
      }
  
+   /* The resulting type of a comparison may be an effective boolean type.  */
+   if (INTEGRAL_TYPE_P (type)
+       && (TREE_CODE (type) == BOOLEAN_TYPE
+ 	  || TYPE_PRECISION (type) == 1))
+     ;
+   /* Or an integer vector type with the same size and element count
+      as the comparison operand types.  */
+   else if (TREE_CODE (type) == VECTOR_TYPE
+ 	   && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+     {
+       if (TREE_CODE (op0_type) != VECTOR_TYPE
+ 	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+         {
+           error ("non-vector operands in vector comparison");
+           debug_generic_expr (op0_type);
+           debug_generic_expr (op1_type);
+           return true;
+         }
+ 
+       if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+ 	  || (GET_MODE_SIZE (TYPE_MODE (type))
+ 	      != GET_MODE_SIZE (TYPE_MODE (op0_type))))
+         {
+           error ("invalid vector comparison resulting type");
+           debug_generic_expr (type);
+           return true;
+         }
+     }
+   else
+     {
+       error ("bogus comparison result type");
+       debug_generic_expr (type);
+       return true;
+     }
+ 
    return false;
  }
  
Index: gcc/tree.h
===================================================================
*** gcc/tree.h.orig	2011-08-29 11:48:23.000000000 +0200
--- gcc/tree.h	2011-08-29 11:48:30.000000000 +0200
*************** extern tree build_simple_mem_ref_loc (lo
*** 5274,5280 ****
  extern double_int mem_ref_offset (const_tree);
  extern tree reference_alias_ptr_type (const_tree);
  extern tree build_invariant_address (tree, tree, HOST_WIDE_INT);
! extern tree constant_boolean_node (int, tree);
  extern tree div_if_zero_remainder (enum tree_code, const_tree, const_tree);
  
  extern bool tree_swap_operands_p (const_tree, const_tree, bool);
--- 5274,5280 ----
  extern double_int mem_ref_offset (const_tree);
  extern tree reference_alias_ptr_type (const_tree);
  extern tree build_invariant_address (tree, tree, HOST_WIDE_INT);
! extern tree constant_boolean_node (bool, tree);
  extern tree div_if_zero_remainder (enum tree_code, const_tree, const_tree);
  
  extern bool tree_swap_operands_p (const_tree, const_tree, bool);

[-- Attachment #3: p --]
[-- Type: application/octet-stream, Size: 6034 bytes --]

Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c.orig	2011-08-29 13:02:09.000000000 +0200
+++ gcc/optabs.c	2011-08-29 12:39:13.000000000 +0200
@@ -6592,8 +6592,7 @@ get_rtx_code (enum tree_code tcode, bool
    unsigned operators. Do not generate compare instruction.  */
 
 static rtx
-vector_compare_rtx (tree cond, bool unsignedp, enum insn_code icode, 
-		    bool legitimize)
+vector_compare_rtx (tree cond, bool unsignedp, enum insn_code icode)
 {
   struct expand_operand ops[2];
   enum rtx_code rcode;
@@ -6616,20 +6615,19 @@ vector_compare_rtx (tree cond, bool unsi
 
   create_input_operand (&ops[0], rtx_op0, GET_MODE (rtx_op0));
   create_input_operand (&ops[1], rtx_op1, GET_MODE (rtx_op1));
-  
-  if (legitimize && !maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, 4, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
 
-/* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
+/* Return insn code for a vector comparison of MODE operands.
+   UNS is true if this should be an unsigned comparison.  */
 
 static inline enum insn_code
-get_vcond_icode (tree type, enum machine_mode mode)
+get_vcond_icode (enum machine_mode mode, bool uns)
 {
   enum insn_code icode = CODE_FOR_nothing;
-
-  if (TYPE_UNSIGNED (type))
+  if (uns)
     icode = direct_optab_handler (vcondu_optab, mode);
   else
     icode = direct_optab_handler (vcond_optab, mode);
@@ -6637,12 +6635,13 @@ get_vcond_icode (tree type, enum machine
 }
 
 /* Return TRUE iff, appropriate vector insns are available
-   for vector cond expr with type TYPE in VMODE mode.  */
+   for a VEC_COND_EXPR with comparison operand types TYPE
+   and comparison operand vector mode VMODE.  */
 
 bool
 expand_vec_cond_expr_p (tree type, enum machine_mode vmode)
 {
-  if (get_vcond_icode (type, vmode) == CODE_FOR_nothing)
+  if (get_vcond_icode (vmode, TYPE_UNSIGNED (type)) == CODE_FOR_nothing)
     return false;
   return true;
 }
@@ -6668,15 +6667,11 @@ expand_vec_cond_expr (tree vec_cond_type
   comp_mode = TYPE_MODE (comp_type);
   unsignedp = TYPE_UNSIGNED (comp_type);
 
-  if (mode != comp_mode)
-    icode = get_vcond_icode (comp_type, comp_mode);
-  else
-    icode = get_vcond_icode (vec_cond_type, mode);
-
+  icode = get_vcond_icode (comp_mode, unsignedp);
   if (icode == CODE_FOR_nothing)
-    return 0;
+    return NULL_RTX;
 
-  comparison = vector_compare_rtx (op0, unsignedp, icode, mode == comp_mode);
+  comparison = vector_compare_rtx (op0, unsignedp, icode);
   
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c.orig	2011-08-29 13:02:09.000000000 +0200
+++ gcc/expr.c	2011-08-29 12:58:59.000000000 +0200
@@ -8465,29 +8465,6 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
-      if (TREE_CODE (ops->type) == VECTOR_TYPE)
-	{
-	  enum tree_code code = ops->code;
-	  tree arg0 = ops->op0;
-	  tree arg1 = ops->op1;
-	  tree el_type = TREE_TYPE (TREE_TYPE (arg0));
-	  tree t, ifexp, if_true, if_false;
-	  
-	  el_type = build_nonstandard_integer_type 
-			(GET_MODE_BITSIZE (TYPE_MODE (el_type)), 0);
-
-	  ifexp = build2 (code, type, arg0, arg1);
-	  if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
-	  if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
-	  
-	  t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
-            
-	  return expand_expr (t,
-			      modifier != EXPAND_STACK_PARM ? target : NULL_RTX, 
-			      tmode != VOIDmode ? tmode : mode, 
-			      modifier);
-	}
-
       temp = do_store_flag (ops,
 			    modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
 			    tmode != VOIDmode ? tmode : mode);
@@ -10332,6 +10309,17 @@ do_store_flag (sepops ops, rtx target, e
   STRIP_NOPS (arg0);
   STRIP_NOPS (arg1);
 
+  /* For vector typed comparisons emit code to generate the desired
+     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
+     expander for this.  */
+  if (TREE_CODE (ops->type) == VECTOR_TYPE)
+    {
+      tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
+      tree if_true = constant_boolean_node (true, ops->type);
+      tree if_false = constant_boolean_node (false, ops->type);
+      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+    }
+
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
      operation of some type.  Some comparisons against 1 and -1 can be
      converted to comparisons with zero.  Do so here so that the tests
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c.orig	2011-08-29 13:02:09.000000000 +0200
+++ gcc/tree-vect-generic.c	2011-08-29 12:53:00.000000000 +0200
@@ -371,9 +371,9 @@ expand_vector_comparison (gimple_stmt_it
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (! expand_vec_cond_expr_p (TREE_TYPE (type), TYPE_MODE (type)))
+  if (! expand_vec_cond_expr_p (TREE_TYPE (op0), TYPE_MODE (TREE_TYPE (op0))))
     t = expand_vector_piecewise (gsi, do_compare, type, 
-                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
   else
     t = NULL_TREE;
 
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c.orig	2011-08-23 10:50:13.000000000 +0200
+++ gcc/tree-vect-stmts.c	2011-08-29 12:38:21.000000000 +0200
@@ -4839,7 +4839,8 @@ vectorizable_condition (gimple stmt, gim
   if (!vec_stmt)
     {
       STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
-      return expand_vec_cond_expr_p (TREE_TYPE (op), vec_mode);
+      return expand_vec_cond_expr_p (TREE_TYPE (TREE_OPERAND (cond_expr, 0)),
+				     vec_mode);
     }
 
   /* Transform */
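
The expand_vector_comparison hunk keeps the piecewise fallback for targets where expand_vec_cond_expr_p fails. In that case each lane is computed as (A[i] op B[i]) ? -1 : 0. The following is a minimal C sketch of what that lowering computes. It assumes GCC 4.7 or later vector extensions (element subscripting and vector comparisons). lt_piecewise is a hypothetical name, not a GCC internal.

```c
/* Assumes GCC 4.7+ vector extensions: vector_size types, element
   subscripting, and element-wise comparisons returning -1/0 masks.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: lower a < b one lane at a time, producing the
   -1 (all bits set) / 0 mask that a vector comparison returns.  This
   mirrors the (A[i] op B[i]) ? -1 : 0 form of the piecewise fallback;
   it is not the compiler's own code.  */
v4si
lt_piecewise (v4sf a, v4sf b)
{
  v4si res;
  int i;

  for (i = 0; i < 4; i++)
    res[i] = a[i] < b[i] ? -1 : 0;
  return res;
}
```

For {1, 2, 3, 4} < {3, 2, 1, 4} it yields {-1, 0, 0, 0}, which is the same mask a native a < b produces.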

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Vector Comparison patch
  2011-08-22 20:46                                             ` Uros Bizjak
  2011-08-22 20:58                                               ` Richard Guenther
  2011-08-22 21:12                                               ` Artem Shinkarov
@ 2011-08-29 12:54                                               ` Richard Guenther
  2011-08-29 13:08                                                 ` Richard Guenther
  2 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-29 12:54 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Artem Shinkarov, Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 22, 2011 at 9:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>
>>> In this case it is simple to analyse that a is a comparison, but you
>>> cannot embed the operations of a into VEC_COND_EXPR.
>>
>> Sure, but if the above is C source the frontend would generate
>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>> vector contents though).
>>
>>> Ok, I am testing the patch that removes hooks. Could you push a little
>>> bit the backend-patterns business?
>>
>> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
>> fiddling with the mode-iterator stuff myself.
>
> It is not _that_ trivial change, since we have ix86_expand_fp_vcond
> and ix86_expand_int_vcond to merge. ATM, FP version deals with FP
> operands and vice versa. We have to merge them somehow and split out
> comparison part that handles FP as well as integer operands.
>
> I also don't know why vcond is not allowed to FAIL... probably
> middle-end should be enhanced for a fallback if some comparison isn't
> supported by optab.

I wonder, if we make vcond being able to FAIL (well, it would fail for
invalid input only, like mismatching mode size), if patches along

Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md      (revision 178209)
+++ gcc/config/i386/sse.md      (working copy)
@@ -1406,13 +1406,13 @@ (define_insn "<sse>_ucomi"
    (set_attr "mode" "<MODE>")])

 (define_expand "vcond<mode>"
-  [(set (match_operand:VF 0 "register_operand" "")
-       (if_then_else:VF
+  [(set (match_operand 0 "register_operand" "")
+       (if_then_else
          (match_operator 3 ""
            [(match_operand:VF 4 "nonimmediate_operand" "")
             (match_operand:VF 5 "nonimmediate_operand" "")])
-         (match_operand:VF 1 "general_operand" "")
-         (match_operand:VF 2 "general_operand" "")))]
+         (match_operand 1 "general_operand" "")
+         (match_operand 2 "general_operand" "")))]
   "TARGET_SSE"
 {
   bool ok = ix86_expand_fp_vcond (operands);

would be enough to make it accept V4SF < V4SF ? V4SI : V4SI with
target mode V4SI.  The expander code doesn't seem to care about
the modes of op1/2 too much.
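
At the C level, a V4SF < V4SF ? V4SI : V4SI selection is a float comparison whose integer mask picks between integer lanes. This sketch assumes no vector ?: is available in C and GCC 4.7 or later vector comparisons, so it emulates the select with bitwise operations. select_lt is a hypothetical helper.

```c
/* Assumes GCC 4.7+ vector comparisons (float operands, int mask).  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical: per lane, pick x[i] when a[i] < b[i], else y[i].
   The comparison of two V4SF vectors yields a V4SI mask of -1/0, the
   same operand/result mode split the vcond pattern has to accept.  */
v4si
select_lt (v4sf a, v4sf b, v4si x, v4si y)
{
  v4si mask = a < b;
  return (mask & x) | (~mask & y);
}
```

With a = {1, 5, 3, 0}, b = {2, 2, 3, 1}, x = {10, 20, 30, 40} and y = {-1, -2, -3, -4}, the result is {10, -2, -3, 40}.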

Richard.

> Uros.
>

* Re: Vector Comparison patch
  2011-08-18 10:21                     ` Richard Guenther
  2011-08-18 11:24                       ` Artem Shinkarov
  2011-08-18 15:19                       ` Richard Henderson
@ 2011-08-29 12:54                       ` Paolo Bonzini
  2011-09-16 18:08                         ` Richard Henderson
  2 siblings, 1 reply; 91+ messages in thread
From: Paolo Bonzini @ 2011-08-29 12:54 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Artem Shinkarov, gcc-patches, Joseph S. Myers, Richard Henderson,
	Chris Lattner

On 08/18/2011 11:23 AM, Richard Guenther wrote:
> Yeah, well.  That's really a question for language lawyers;)   I agree
> that it would be nice to have mask ? val0 : val1 behave "the same"
> for scalars and vectors.  The question is whether for vectors you
> define it on the bit-level (which makes it equal to (mask&  val0) |
> (~mask&  val1))
> or on the vector component level.  The vector component level
> is probably what people would expect.
>
> Which means we have to treat mask ? val0 : val1 as
> mask != {0,...} ? val0 : val1.

The definition in OpenCL makes zero sense to me.  For byte operands it
is tailored to the SSE PMOVMSKB instruction, but there is no
PMOVMSKW/PMOVMSKD instruction, so you would need very slow bit shift
operations before PMOVMSK.  On the other hand, bit selection is what
Altivec, for example, provides.

Do we have some way to contact anyone in the OpenCL standards group
(CCing Chris Lattner)?

If you wanted to implement it, it would be mask < {0,...} ? val0 : val1.
But really, since we're not implementing OpenCL C, I would much
prefer to have bit-level selection and let a front-end implement the quirk.
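
This thread discusses three candidate semantics: bit-level selection, mask != {0,...}, and the OpenCL-style mask < {0,...}. They agree whenever every mask lane is all ones or all zeros, and differ otherwise. The following sketch assumes GCC 4.7 or later vector comparisons. The function names are hypothetical.

```c
/* Assumes GCC 4.7+ vector comparisons and bitwise ops on vectors.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Bit-level: each result bit comes from val0 where the mask bit is 1,
   otherwise from val1.  */
v4si
select_bits (v4si mask, v4si val0, v4si val1)
{
  return (mask & val0) | (~mask & val1);
}

/* Component-level: mask != {0,...} ? val0 : val1.  */
v4si
select_nonzero (v4si mask, v4si val0, v4si val1)
{
  v4si m = mask != (v4si) { 0, 0, 0, 0 };
  return (m & val0) | (~m & val1);
}

/* OpenCL-style, driven by each lane's sign bit:
   mask < {0,...} ? val0 : val1.  */
v4si
select_msb (v4si mask, v4si val0, v4si val1)
{
  v4si m = mask < (v4si) { 0, 0, 0, 0 };
  return (m & val0) | (~m & val1);
}
```

Suppose mask = {-1, 0, 1, INT_MIN}, val0 = {10, 20, 30, 40} and val1 = {1, 2, 3, 4}. The three functions then return {10, 2, 2, 4}, {10, 2, 30, 40} and {10, 2, 3, 40}. Only the first two lanes agree, because only those lanes hold canonical masks.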

Paolo

* Re: Vector Comparison patch
  2011-08-29 12:54                                               ` Richard Guenther
@ 2011-08-29 13:08                                                 ` Richard Guenther
  2011-09-06 14:51                                                   ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-08-29 13:08 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Artem Shinkarov, Richard Henderson, gcc-patches, Joseph S. Myers

On Mon, Aug 29, 2011 at 2:09 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Aug 22, 2011 at 9:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>
>>>> In this case it is simple to analyse that a is a comparison, but you
>>>> cannot embed the operations of a into VEC_COND_EXPR.
>>>
>>> Sure, but if the above is C source the frontend would generate
>>> res = a != 0 ? v0 : v1; initially.  An optimization pass could still
>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1>
>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track
>>> vector contents though).
>>>
>>>> Ok, I am testing the patch that removes hooks. Could you push a little
>>>> bit the backend-patterns business?
>>>
>>> Well, I suppose we're waiting for Uros here.  I hadn't much luck with
>>> fiddling with the mode-iterator stuff myself.
>>
>> It is not _that_ trivial change, since we have ix86_expand_fp_vcond
>> and ix86_expand_int_vcond to merge. ATM, FP version deals with FP
>> operands and vice versa. We have to merge them somehow and split out
>> comparison part that handles FP as well as integer operands.
>>
>> I also don't know why vcond is not allowed to FAIL... probably
>> middle-end should be enhanced for a fallback if some comparison isn't
>> supported by optab.
>
> I wonder, if we make vcond being able to FAIL (well, it would fail for
> invalid input only, like mismatching mode size), if patches along
>
> Index: gcc/config/i386/sse.md
> ===================================================================
> --- gcc/config/i386/sse.md      (revision 178209)
> +++ gcc/config/i386/sse.md      (working copy)
> @@ -1406,13 +1406,13 @@ (define_insn "<sse>_ucomi"
>    (set_attr "mode" "<MODE>")])
>
>  (define_expand "vcond<mode>"
> -  [(set (match_operand:VF 0 "register_operand" "")
> -       (if_then_else:VF
> +  [(set (match_operand 0 "register_operand" "")
> +       (if_then_else
>          (match_operator 3 ""
>            [(match_operand:VF 4 "nonimmediate_operand" "")
>             (match_operand:VF 5 "nonimmediate_operand" "")])
> -         (match_operand:VF 1 "general_operand" "")
> -         (match_operand:VF 2 "general_operand" "")))]
> +         (match_operand 1 "general_operand" "")
> +         (match_operand 2 "general_operand" "")))]
>   "TARGET_SSE"
>  {
>   bool ok = ix86_expand_fp_vcond (operands);
>
> would be enough to make it accept V4SF < V4SF ? V4SI : V4SI with
> target mode V4SI.  The expander code doesn't seem to care about
> the modes of op1/2 too much.

It at least "almost" works, apart from

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:122:1: error: unrecognizable insn:
(insn 1813 1812 1814 255 (set (reg:V4SI 1238)
        (lt:V4SI (reg:V4SF 1236)
            (reg:V4SF 619 [ D.3579 ]))) /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109 -1
     (nil))
/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:122:1: internal compiler error: in extract_insn, at recog.c:2115

I suppose the compare patterns need similar adjustments (though I
couldn't find any SSE lt one, but that may be an artifact).

Modeless operands trigger warnings, though.  Do we have something better
to simulate the effect?  Like a mode mapper that does a 1:1 translation
of an input mode from a mode iterator to another one?  Can we
use define_mode_attr for this?  Like

Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md  (revision 178209)
+++ config/i386/sse.md  (working copy)
@@ -161,6 +161,9 @@ (define_mode_attr avx_avx2
    (V4SI "avx2") (V2DI "avx2")
    (V8SI "avx2") (V4DI "avx2")])

+(define_mode_attr cmpmode
+  [(V8SF "V8SI") (V4SF "V4SI") (V4DF "V4DI") (V2DF "V2DI")])
+
 ;; Mapping of logic-shift operators
 (define_code_iterator lshift [lshiftrt ashift])

@@ -1348,9 +1351,9 @@ (define_insn "<sse>_maskcmp<mode>3"
    (set_attr "mode" "<MODE>")])

 (define_insn "<sse>_vmmaskcmp<mode>3"
-  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
-       (vec_merge:VF_128
-        (match_operator:VF_128 3 "sse_comparison_operator"
+  [(set (match_operand:<cmpmode> 0 "register_operand" "=x,x")
+       (vec_merge:<cmpmode>
+        (match_operator:<cmpmode> 3 "sse_comparison_operator"
           [(match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm")])
         (match_dup 1)
@@ -1406,13 +1409,13 @@ (define_insn "<sse>_ucomi"
    (set_attr "mode" "<MODE>")])

 (define_expand "vcond<mode>"
-  [(set (match_operand:VF 0 "register_operand" "")
-       (if_then_else:VF
-         (match_operator 3 ""
+  [(set (match_operand 0 "register_operand" "")
+       (if_then_else
+         (match_operator:<cmpmode> 3 ""
            [(match_operand:VF 4 "nonimmediate_operand" "")
             (match_operand:VF 5 "nonimmediate_operand" "")])
-         (match_operand:VF 1 "general_operand" "")
-         (match_operand:VF 2 "general_operand" "")))]
+         (match_operand 1 "general_operand" "")
+         (match_operand 2 "general_operand" "")))]
   "TARGET_SSE"
 {
   bool ok = ix86_expand_fp_vcond (operands);


etc.  That would still leave the result and 1st/2nd operand modes
unspecified though (but we'd have the comparison result always
an appropriate integer mode).  Maybe

(define_expand "vcond<mode>"
  [(set (match_operand:V_128 0 "register_operand" "")
        (if_then_else
          (match_operator:<cmpmode> 3 ""
            [(match_operand:VF_128 4 "nonimmediate_operand" "")
             (match_operand:VF_128 5 "nonimmediate_operand" "")])
          (match_operand:V_128 1 "general_operand" "")
          (match_operand:V_128 2 "general_operand" "")))]
  "TARGET_SSE"
{
  bool ok = ix86_expand_fp_vcond (operands);
  gcc_assert (ok);
  DONE;
})

and similar _256 patterns would work as well, but I guess that would
lead to duplicate vcond<mode> gen_* functions.
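
The proposed cmpmode attribute pairs each float vector mode with the integer vector mode of the same size and lane count, for example V4SF to V4SI and V2DF to V2DI. The C-level comparison result can be checked for the same layout. This sketch assumes GCC 4.7 or later. The helper names are hypothetical.

```c
#include <stddef.h>

/* Assumes GCC 4.7+ vector comparisons and __typeof__.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical: report the total size and per-lane size of the mask
   produced by comparing two V4SF vectors.  The cmpmode mapping
   predicts V4SI: 16 bytes split into four 4-byte lanes.  */
void
v4sf_mask_layout (size_t *total, size_t *lane)
{
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 4.0f, 3.0f, 2.0f, 1.0f };
  __typeof__ (a < b) m = a < b;
  *total = sizeof m;
  *lane = sizeof m[0];
}

/* Same for V2DF, where the mapping predicts V2DI: two 8-byte lanes.  */
void
v2df_mask_layout (size_t *total, size_t *lane)
{
  v2df a = { 1.0, 2.0 };
  v2df b = { 2.0, 1.0 };
  __typeof__ (a < b) m = a < b;
  *total = sizeof m;
  *lane = sizeof m[0];
}
```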

Richard.

> Richard.
>
>> Uros.
>>
>

* Re: Vector Comparison patch
  2011-08-29 13:08                                                 ` Richard Guenther
@ 2011-09-06 14:51                                                   ` Artem Shinkarov
  2011-09-06 14:56                                                     ` Richard Guenther
  2011-09-30 15:21                                                     ` Georg-Johann Lay
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-09-06 14:51 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Uros Bizjak, Richard Henderson, gcc-patches, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 1520 bytes --]

Here is a new version of the patch, which takes into account Richard
Guenther's changes of 2011-09-02.


ChangeLog

2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>

       gcc/
       * fold-const.c (constant_boolean_node): Adjust the meaning
       of boolean for vector types: true = {-1,..}, false = {0,..}.
       (fold_unary_loc): Avoid conversion of vector comparison to
       boolean type.
       * expr.c (expand_expr_real_2): Expand vector comparison by
       building an appropriate VEC_COND_EXPR.
       * c-typeck.c (build_binary_op): Typecheck vector comparisons.
       (c_objc_common_truthvalue_conversion): Adjust.
       * tree-vect-generic.c (do_compare): Helper function.
       (expand_vector_comparison): Check if hardware supports
       vector comparison of the given type or expand vector
       piecewise.
       (expand_vector_operation): Treat comparison as binary
       operation of vector type.
       (expand_vector_operations_1): Adjust.
       * tree-cfg.c (verify_gimple_comparison): Adjust.

       gcc/config/i386
       * i386.c (ix86_expand_sse_movcc): Consider a case when
       vcond operators are {-1,..} and {0,..}.

       gcc/doc
       * extend.texi: Adjust.

       gcc/testsuite
       * gcc.c-torture/execute/vector-compare-1.c: New test.
       * gcc.c-torture/execute/vector-compare-2.c: New test.
       * gcc.dg/vector-compare-1.c: New test.
       * gcc.dg/vector-compare-2.c: New test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.
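
Vector comparisons are expanded as VEC_COND_EXPR <a op b, {-1,...}, {0,...}>. The ix86_expand_sse_movcc special case relies on a simple identity: blending all-ones and all-zeros under any mask returns the mask itself, whatever its bits are. The following small check assumes the usual (mask & t) | (~mask & f) blend. blend is a hypothetical name, not the i386 backend code.

```c
/* Assumes GCC 4.7+ vector comparisons and bitwise ops on vectors.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical generic bit-select, as a vcond expansion might emit
   it: (mask & op_true) | (~mask & op_false).  With op_true all ones
   and op_false all zeros this reduces to mask, so the select can be
   skipped and the comparison mask used directly.  */
v4si
blend (v4si mask, v4si op_true, v4si op_false)
{
  return (mask & op_true) | (~mask & op_false);
}
```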


Thanks,
Artem.

[-- Attachment #2: vec-compare.v8.diff --]
[-- Type: text/plain, Size: 19037 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178579)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,29 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In GNU C, vector comparison is supported with the standard comparison
+operators: @code{==, !=, <, <=, >, >=}.  Comparison operands can be
+vector expressions of integer type or real type.  Comparisons between
+integer-type vectors and real-type vectors are not supported.  The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands, with a signed integral element
+type.
+
+Vectors are compared element-wise, producing 0 when the comparison is
+false and -1 (a constant of the appropriate type with all bits set)
+otherwise.  Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 178579)
+++ gcc/fold-const.c	(working copy)
@@ -5934,7 +5934,15 @@ extract_muldiv_1 (tree t, tree c, enum t
 tree
 constant_boolean_node (bool value, tree type)
 {
-  if (type == integer_type_node)
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree tval;
+      
+      gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE);
+      tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0);
+      return build_vector_from_val (type, tval);
+    }
+  else if (type == integer_type_node)
     return value ? integer_one_node : integer_zero_node;
   else if (type == boolean_type_node)
     return value ? boolean_true_node : boolean_false_node;
@@ -7670,6 +7678,16 @@ fold_unary_loc (location_t loc, enum tre
 	    return build2_loc (loc, TREE_CODE (op0), type,
 			       TREE_OPERAND (op0, 0),
 			       TREE_OPERAND (op0, 1));
+	  else if (TREE_CODE (type) == VECTOR_TYPE)
+	    {
+	      tree el_type = TREE_TYPE (type);
+	      tree op_el_type = TREE_TYPE (TREE_TYPE (op0));
+
+	      if (el_type == op_el_type)
+		return op0;
+	      else
+		return build1_loc (loc, VIEW_CONVERT_EXPR, type, op0);
+	    }
 	  else if (!INTEGRAL_TYPE_P (type))
 	    return build3_loc (loc, COND_EXPR, type, op0,
 			       constant_boolean_node (true, type),
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt ") ? -1 : 0)\n", \
+			      (int) (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt) \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */   
+
+/* Test that C_MAYBE_CONST_EXPRs are folded correctly when
+   creating a VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+extern int p, q, z;
+extern vec foo (int);
+
+vec 
+foo (int x)
+{
+  return  foo (p ? q :z) > a;
+}
+
+vec 
+bar (int x)
+{
+  return  b > foo (p ? q :z);
+}
+
+
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178579)
+++ gcc/expr.c	(working copy)
@@ -8465,6 +8465,29 @@ expand_expr_real_2 (sepops ops, rtx targ
     case UNGE_EXPR:
     case UNEQ_EXPR:
     case LTGT_EXPR:
+      if (TREE_CODE (ops->type) == VECTOR_TYPE)
+	{
+	  enum tree_code code = ops->code;
+	  tree arg0 = ops->op0;
+	  tree arg1 = ops->op1;
+	  tree el_type = TREE_TYPE (TREE_TYPE (arg0));
+	  tree t, ifexp, if_true, if_false;
+	  
+	  el_type = build_nonstandard_integer_type 
+			(GET_MODE_BITSIZE (TYPE_MODE (el_type)), 0);
+
+	  ifexp = build2 (code, type, arg0, arg1);
+	  if_true = build_vector_from_val (type, build_int_cst (el_type, -1));
+	  if_false = build_vector_from_val (type, build_int_cst (el_type, 0));
+	  
+	  t = build3 (VEC_COND_EXPR, type, ifexp, if_true, if_false);
+            
+	  return expand_expr (t,
+			      modifier != EXPAND_STACK_PARM ? target : NULL_RTX, 
+			      tmode != VOIDmode ? tmode : mode, 
+			      modifier);
+	}
+
       temp = do_store_flag (ops,
 			    modifier != EXPAND_STACK_PARM ? target : NULL_RTX,
 			    tmode != VOIDmode ? tmode : mode);
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178579)
+++ gcc/c-typeck.c	(working copy)
@@ -9910,6 +9910,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10022,6 +10045,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10429,6 +10475,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178579)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -35,6 +35,10 @@ along with GCC; see the file COPYING3.
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +129,31 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0
+   
+   INNER_TYPE is the type of A and B elements
+   
+   returned expression is of signed integer type with the 
+   size equal to the size of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = build_nonstandard_integer_type 
+		      (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+
+  return gimplify_build3 (gsi, COND_EXPR, comp_type,
+			  fold_build2 (code, boolean_type_node, a, b),
+			  build_int_cst (comp_type, -1),
+			  build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +362,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand vector comparison expression OP0 CODE OP1 by
+   querying optab if the following expression:
+	VEC_COND_EXPR< OP0 CODE OP1, {-1,...}, {0,...}>
+   can be expanded.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  else
+    t = NULL_TREE;
+
+  return t;
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +422,27 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -450,11 +516,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
@@ -598,6 +664,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   gcc_assert (code != VEC_LSHIFT_EXPR && code != VEC_RSHIFT_EXPR);
   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code);
+
+  /* Leave expression untouched for later expansion.  */
+  if (new_rhs == NULL_TREE)
+    return;
+
   if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
     new_rhs = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
                                new_rhs);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 178579)
+++ gcc/tree-cfg.c	(working copy)
@@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (type) == VECTOR_TYPE)
+    {
+      if (TREE_CODE (op0_type) != VECTOR_TYPE
+	  || TREE_CODE (op1_type) != VECTOR_TYPE)
+        {
+          error ("non-vector operands in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (!useless_type_conversion_p (op0_type, op1_type)
+	  && !useless_type_conversion_p (op1_type, op0_type))
+        {
+          error ("type mismatch in vector comparison");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          && TYPE_PRECISION (TREE_TYPE (op0_type)) 
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178579)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18445,8 +18445,13 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  
+  if (vector_all_ones_operand (op_true, GET_MODE (op_true))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);

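In the patch above, expand_vector_comparison falls back to expand_vector_piecewise with do_compare whenever expand_vec_cond_expr_p reports that the target cannot expand VEC_COND_EXPR <OP0 CODE OP1, {-1,...}, {0,...}>. Below is a minimal plain-C sketch of what the lowered code computes for GT_EXPR on a four-element float vector. It assumes 32-bit float and int32_t lanes, and lower_gt_piecewise is a hypothetical name that does not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical model of the piecewise fallback for GT_EXPR on a
   four-element float vector.  As in do_compare, each lane is compared
   on its own, and the result lane is a signed integer of the same width:
   -1 (all bits set) when the comparison holds, 0 otherwise.  */
static void
lower_gt_piecewise (const float a[4], const float b[4], int32_t res[4])
{
  for (int i = 0; i < 4; i++)
    res[i] = (a[i] > b[i]) ? -1 : 0;
}
```

The result array is int32_t even though the inputs are float. This mirrors the rule in the c-typeck.c hunk that a comparison always yields a signed integer vector whose elements are as wide as the operands' elements.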

* Re: Vector Comparison patch
  2011-09-06 14:51                                                   ` Artem Shinkarov
@ 2011-09-06 14:56                                                     ` Richard Guenther
  2011-09-07 14:14                                                       ` Artem Shinkarov
  2011-09-30 15:21                                                     ` Georg-Johann Lay
  1 sibling, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-06 14:56 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Uros Bizjak, Richard Henderson, gcc-patches, Joseph S. Myers

On Tue, Sep 6, 2011 at 4:50 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> Here is a new version of the patch which considers the changes from
> 2011-09-02  Richard Guenther
>
>
> ChangeLog
>
> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>
>       gcc/
>       * fold-const.c (constant_boolean_node): Adjust the meaning
>       of boolean for vector types: true = {-1,..}, false = {0,..}.
>       (fold_unary_loc): Avoid conversion of vector comparison to
>       boolean type.

Both changes have already been done.

>       * expr.c (expand_expr_real_2): Expand vector comparison by
>       building an appropriate VEC_COND_EXPR.

I prefer

Index: gcc/expr.c
===================================================================
*** gcc/expr.c.orig     2011-08-29 11:48:23.000000000 +0200
--- gcc/expr.c  2011-08-29 12:58:59.000000000 +0200
*************** do_store_flag (sepops ops, rtx target, e
*** 10309,10314 ****
--- 10309,10325 ----
    STRIP_NOPS (arg0);
    STRIP_NOPS (arg1);

+   /* For vector typed comparisons emit code to generate the desired
+      all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
+      expander for this.  */
+   if (TREE_CODE (ops->type) == VECTOR_TYPE)
+     {
+       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
+       tree if_true = constant_boolean_node (true, ops->type);
+       tree if_false = constant_boolean_node (false, ops->type);
+       return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+     }
+
    /* Get the rtx comparison code to use.  We know that EXP is a comparison

as I said multiple times.
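The hunk above expands a vector comparison as a select between the all-ones and all-zeros constants. On x86 the vcond expanders appear to end up in ix86_expand_sse_movcc (the function the i386.c hunk modifies). In its general case, that function blends the two arms as (mask AND true) OR (NOT mask AND false). When the arms are exactly {-1,...} and {0,...}, that blend reduces to the mask itself, which is the case the i386.c hunk short-circuits. Below is a plain-C sketch of one lane, assuming 32-bit lanes. select_lane and compare_gt_lane are hypothetical helpers, not GCC code.

```c
#include <stdint.h>

/* One lane of a mask blend: MASK is either all ones (-1) or all zeros,
   so (MASK & T) | (~MASK & F) picks T or F without a branch.  */
static int32_t
select_lane (int32_t mask, int32_t t, int32_t f)
{
  return (mask & t) | (~mask & f);
}

/* Hypothetical helper: compute a comparison mask as a select between
   the all-ones and all-zeros constants, the shape do_store_flag builds.
   With these arms the blend returns MASK unchanged.  */
static int32_t
compare_gt_lane (int32_t a, int32_t b)
{
  int32_t mask = (a > b) ? -1 : 0;
  return select_lane (mask, -1, 0);
}
```

Since select_lane (mask, -1, 0) == mask for both legal mask values, emitting a plain SET of the comparison result, as the i386.c hunk does, is equivalent to the general blend.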

>       * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>       (c_objc_common_truthvalue_conversion): Adjust.
>       * tree-vect-generic.c (do_compare): Helper function.
>       (expand_vector_comparison): Check if hardware supports
>       vector comparison of the given type or expand vector
>       piecewise.
>       (expand_vector_operation): Treat comparison as binary
>       operation of vector type.
>       (expand_vector_operations_1): Adjust.
>       * tree-cfg.c (verify_gimple_comparison): Adjust.

The tree-cfg.c change has already been done.

Richard.

>
>       gcc/config/i386
>       * i386.c (ix86_expand_sse_movcc): Consider a case when
>       vcond operators are {-1,..} and {0,..}.
>
>       gcc/doc
>       * extend.texi: Adjust.
>
>       gcc/testsuite
>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>       * gcc.dg/vector-compare-1.c: New test.
>       * gcc.dg/vector-compare-2.c: New test.
>
> bootstrapped and tested on x86_64-unknown-linux-gnu.
>
>
> Thanks,
> Artem.
>


* Re: Vector Comparison patch
  2011-09-06 14:56                                                     ` Richard Guenther
@ 2011-09-07 14:14                                                       ` Artem Shinkarov
  2011-09-07 15:08                                                         ` Joseph S. Myers
  2011-09-08 12:56                                                         ` Richard Guenther
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-09-07 14:14 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Uros Bizjak, Richard Henderson, gcc-patches, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 4715 bytes --]

On Tue, Sep 6, 2011 at 3:56 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Sep 6, 2011 at 4:50 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> Here is a new version of the patch which considers the changes from
>> 2011-09-02  Richard Guenther
>>
>>
>> ChangeLog
>>
>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>
>>       gcc/
>>       * fold-const.c (constant_boolean_node): Adjust the meaning
>>       of boolean for vector types: true = {-1,..}, false = {0,..}.
>>       (fold_unary_loc): Avoid conversion of vector comparison to
>>       boolean type.
>
> Both changes have already been done.

I missed how you applied constant_boolean_node; sorry about that.
But fold_unary_loc seems confusing to me. We have the following code:

	  else if (!INTEGRAL_TYPE_P (type))
	    return build3_loc (loc, COND_EXPR, type, op0,
			       constant_boolean_node (true, type),
			       constant_boolean_node (false, type));

But this is wrong for vector types, because it should construct a
VEC_COND_EXPR, not a COND_EXPR. That is why I had a special case for
vectors.
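The distinction here can be seen with a scalar model. A COND_EXPR has one scalar condition and yields one of its two operands as a whole. A VEC_COND_EXPR instead selects each element independently, using the corresponding element of its condition. The functions below are hypothetical plain-C models of the two tree codes for four-element int vectors. They are not GCC code, and they take a precomputed mask rather than an embedded comparison.

```c
#include <stdint.h>

typedef struct { int32_t v[4]; } vec4;

/* Model of COND_EXPR: a single scalar condition selects a whole operand.  */
static vec4
cond_expr_model (int cond, vec4 t, vec4 f)
{
  return cond ? t : f;
}

/* Model of VEC_COND_EXPR: every lane of MASK picks the corresponding
   lane of T (mask lane nonzero) or of F (mask lane zero).  */
static vec4
vec_cond_expr_model (vec4 mask, vec4 t, vec4 f)
{
  vec4 r;
  for (int i = 0; i < 4; i++)
    r.v[i] = mask.v[i] ? t.v[i] : f.v[i];
  return r;
}
```

With the arms constant_boolean_node (true, type) and constant_boolean_node (false, type), only the per-lane form reproduces the comparison mask. The scalar form would return all ones or all zeros for the entire vector.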

>>       * expr.c (expand_expr_real_2): Expand vector comparison by
>>       building an appropriate VEC_COND_EXPR.
>
> I prefer
>
> Index: gcc/expr.c
> ===================================================================
> *** gcc/expr.c.orig     2011-08-29 11:48:23.000000000 +0200
> --- gcc/expr.c  2011-08-29 12:58:59.000000000 +0200
> *************** do_store_flag (sepops ops, rtx target, e
> *** 10309,10314 ****
> --- 10309,10325 ----
>    STRIP_NOPS (arg0);
>    STRIP_NOPS (arg1);
>
> +   /* For vector typed comparisons emit code to generate the desired
> +      all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
> +      expander for this.  */
> +   if (TREE_CODE (ops->type) == VECTOR_TYPE)
> +     {
> +       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
> +       tree if_true = constant_boolean_node (true, ops->type);
> +       tree if_false = constant_boolean_node (false, ops->type);
> +       return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
> +     }
> +
>    /* Get the rtx comparison code to use.  We know that EXP is a comparison
>
> as I said multiple times.
>
>>       * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>>       (c_objc_common_truthvalue_conversion): Adjust.
>>       * tree-vect-generic.c (do_compare): Helper function.
>>       (expand_vector_comparison): Check if hardware supports
>>       vector comparison of the given type or expand vector
>>       piecewise.
>>       (expand_vector_operation): Treat comparison as binary
>>       operation of vector type.
>>       (expand_vector_operations_1): Adjust.
>>       * tree-cfg.c (verify_gimple_comparison): Adjust.
>
> The tree-cfg.c change has already been done.
>
> Richard.
>
>>
>>       gcc/config/i386
>>       * i386.c (ix86_expand_sse_movcc): Consider a case when
>>       vcond operators are {-1,..} and {0,..}.
>>
>>       gcc/doc
>>       * extend.texi: Adjust.
>>
>>       gcc/testsuite
>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>       * gcc.dg/vector-compare-1.c: New test.
>>       * gcc.dg/vector-compare-2.c: New test.
>>
>> bootstrapped and tested on x86_64-unknown-linux-gnu.
>>
>>
>> Thanks,
>> Artem.
>>
>

Everything else is adjusted in the new version of the patch, which you
can find in the attachment.

ChangeLog


2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>

      gcc/
      * expr.c (do_store_flag): Expand vector comparison by
      building an appropriate VEC_COND_EXPR.
      * c-typeck.c (build_binary_op): Typecheck vector comparisons.
      (c_objc_common_truthvalue_conversion): Adjust.
      * tree-vect-generic.c (do_compare): Helper function.
      (expand_vector_comparison): Check if hardware supports
      vector comparison of the given type or expand vector
      piecewise.
      (expand_vector_operation): Treat comparison as binary
      operation of vector type.
      (expand_vector_operations_1): Adjust.

      gcc/config/i386
      * i386.c (ix86_expand_sse_movcc): Consider a case when
      vcond operators are {-1,..} and {0,..}.

      gcc/doc
      * extend.texi: Adjust.

      gcc/testsuite
      * gcc.c-torture/execute/vector-compare-1.c: New test.
      * gcc.c-torture/execute/vector-compare-2.c: New test.
      * gcc.dg/vector-compare-1.c: New test.
      * gcc.dg/vector-compare-2.c: New test.

bootstrapped and tested on x86_64-unknown-linux-gnu.

[-- Attachment #2: vec-compare.v9.diff --]
[-- Type: text/plain, Size: 15943 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 178579)
+++ gcc/doc/extend.texi	(working copy)
@@ -6561,6 +6561,29 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In GNU C, vector comparison is supported with the standard comparison
+operators: @code{==, !=, <, <=, >, >=}.  Comparison operands can be
+vector expressions of integer type or real type.  Comparisons between
+integer-type vectors and real-type vectors are not supported.  The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise, producing 0 when the comparison is
+false and -1 (a constant of the appropriate type with all bits set)
+otherwise.  Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b;     /* The result would be @{0, 0,-1, 0@}  */
+c = a == b;     /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,123 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define check_compare(count, res, i0, i1, op, fmt) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+      if ((res)[__i] != ((i0)[__i] op (i1)[__i] ? -1 : 0)) \
+	{ \
+            __builtin_printf ("%i != ((" fmt " " #op " " fmt " ? -1 : 0) ", \
+			      (res)[__i], (i0)[__i], (i1)[__i]); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(count, v0, v1, res, fmt); \
+do { \
+    res = (v0 > v1); \
+    check_compare (count, res, v0, v1, >, fmt); \
+    res = (v0 < v1); \
+    check_compare (count, res, v0, v1, <, fmt); \
+    res = (v0 >= v1); \
+    check_compare (count, res, v0, v1, >=, fmt); \
+    res = (v0 <= v1); \
+    check_compare (count, res, v0, v1, <=, fmt); \
+    res = (v0 == v1); \
+    check_compare (count, res, v0, v1, ==, fmt); \
+    res = (v0 != v1); \
+    check_compare (count, res, v0, v1, !=, fmt); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+    int i;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, i0, i1, ires, "%i");
+#undef INT
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (4, u0, u1, ures, "%u");
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, s0, s1, sres, "%i");
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (8, us0, us1, usres, "%u");
+#undef SHORT
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, c0, c1, cres, "%i");
+#undef CHAR
+
+#define CHAR unsigned char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (16, uc0, uc1, ucres, "%u");
+#undef CHAR
+/* Float comparison.  */
+    vector (4, float) f0;
+    vector (4, float) f1;
+    vector (4, int) ifres;
+
+    f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    test (4, f0, f1, ifres, "%f");
+    
+/* Double comparison.  */
+    vector (2, double) d0;
+    vector (2, double) d1;
+    vector (2, long) idres;
+
+    d0 = (vector (2, double)){(double)argc,  10.};
+    d1 = (vector (2, double)){0., (double)-23};    
+    test (2, d0, d1, idres, "%f");
+
+
+    return 0;
+}
+
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-2.c	(revision 0)
@@ -0,0 +1,27 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+/* Check that constant folding in 
+   these simple cases works.  */
+vector (4, int)
+foo (vector (4, int) x)
+{
+  return   (x == x) + (x != x) + (x >  x) 
+	 + (x <  x) + (x >= x) + (x <= x);
+}
+
+int 
+main (int argc, char *argv[])
+{
+  vector (4, int) t = {argc, 2, argc, 42};
+  vector (4, int) r;
+  int i;
+
+  r = foo (t);
+
+  for (i = 0; i < 4; i++)
+    if (r[i] != -3)
+      __builtin_abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-1.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+void
+foo (vector (4, int) x, vector (4, float) y)
+{
+  vector (4, int) p4;
+  vector (4, int) r4;
+  vector (4, unsigned int) q4;
+  vector (8, int) r8;
+  vector (4, float) f4;
+  
+  r4 = x > y;	    /* { dg-error "comparing vectors with different element types" } */
+  r8 = (x != p4);   /* { dg-error "incompatible types when assigning to type" } */
+  r8 == r4;	    /* { dg-error "comparing vectors with different number of elements" } */
+}
Index: gcc/testsuite/gcc.dg/vector-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vector-compare-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */   
+
+/* Test if C_MAYBE_CONST are folded correctly when 
+   creating VEC_COND_EXPR.  */
+
+typedef int vec __attribute__((vector_size(16)));
+
+vec i,j;
+extern vec a, b, c;
+
+extern int p, q, z;
+extern vec foo (int);
+
+vec 
+foo (int x)
+{
+  return  foo (p ? q :z) > a;
+}
+
+vec 
+bar (int x)
+{
+  return  b > foo (p ? q :z);
+}
+
+
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 178579)
+++ gcc/expr.c	(working copy)
@@ -10308,6 +10308,17 @@ do_store_flag (sepops ops, rtx target, e
 
   STRIP_NOPS (arg0);
   STRIP_NOPS (arg1);
+  
+  /* For vector typed comparisons emit code to generate the desired
+     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
+     expander for this.  */
+  if (TREE_CODE (ops->type) == VECTOR_TYPE)
+    {
+      tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
+      tree if_true = constant_boolean_node (true, ops->type);
+      tree if_false = constant_boolean_node (false, ops->type);
+      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+    }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
      operation of some type.  Some comparisons against 1 and -1 can be
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 178579)
+++ gcc/c-typeck.c	(working copy)
@@ -9910,6 +9910,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -10022,6 +10045,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10429,6 +10475,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 178579)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -35,6 +35,10 @@ along with GCC; see the file COPYING3.
 #include "expr.h"
 #include "optabs.h"
 
+
+static void expand_vector_operations_1 (gimple_stmt_iterator *);
+
+
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
 static tree
@@ -125,6 +129,31 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0
+   
+   INNER_TYPE is the type of A and B elements
+   
+   returned expression is of signed integer type with the 
+   size equal to the size of INNER_TYPE.  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree comp_type;
+
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  
+  comp_type = build_nonstandard_integer_type 
+		      (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+
+  return gimplify_build3 (gsi, COND_EXPR, comp_type,
+			  fold_build2 (code, boolean_type_node, a, b),
+			  build_int_cst (comp_type, -1),
+			  build_int_cst (comp_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -333,6 +362,24 @@ uniform_vector_p (tree vec)
   return NULL_TREE;
 }
 
+/* Try to expand vector comparison expression OP0 CODE OP1 by
+   querying optab if the following expression:
+	VEC_COND_EXPR< OP0 CODE OP1, {-1,...}, {0,...}>
+   can be expanded.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t;
+  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  else
+    t = NULL_TREE;
+
+  return t;
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -375,8 +422,27 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+	{
+	  tree rhs1 = gimple_assign_rhs1 (assign);
+	  tree rhs2 = gimple_assign_rhs2 (assign);
 
+	  return expand_vector_comparison (gsi, type, rhs1, rhs2, code);
+	}
       default:
 	break;
       }
@@ -450,11 +516,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   code = gimple_assign_rhs_code (stmt);
   rhs_class = get_gimple_rhs_class (code);
+  lhs = gimple_assign_lhs (stmt);
 
   if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS)
     return;
 
-  lhs = gimple_assign_lhs (stmt);
   rhs1 = gimple_assign_rhs1 (stmt);
   type = gimple_expr_type (stmt);
   if (rhs_class == GIMPLE_BINARY_RHS)
@@ -598,6 +664,11 @@ expand_vector_operations_1 (gimple_stmt_
 
   gcc_assert (code != VEC_LSHIFT_EXPR && code != VEC_RSHIFT_EXPR);
   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code);
+
+  /* Leave expression untouched for later expansion.  */
+  if (new_rhs == NULL_TREE)
+    return;
+
   if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
     new_rhs = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
                                new_rhs);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 178579)
+++ gcc/config/i386/i386.c	(working copy)
@@ -18445,8 +18445,13 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
 {
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
-
-  if (op_false == CONST0_RTX (mode))
+  
+  if (vector_all_ones_operand (op_true, GET_MODE (op_true))
+      && rtx_equal_p (op_false, CONST0_RTX (mode)))
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
+    }
+  else if (op_false == CONST0_RTX (mode))
     {
       op_true = force_reg (mode, op_true);
       x = gen_rtx_AND (mode, cmp, op_true);
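
The new i386 branch handles a VEC_COND_EXPR whose true arm is all-ones and whose false arm is zero. Each lane of a vector comparison is already -1 or 0, so the generic select (cmp & op_true) | (~cmp & op_false) collapses to cmp, and a single move is enough. Below is a minimal sketch of that identity using GCC's generic vector extension. The typedef and function names are illustrative and are not taken from the patch.

```c
#include <assert.h>

/* Four 32-bit signed lanes via the GCC generic vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Bitwise select in the AND/ANDN/OR style: bits of op_true where mask
   is set, bits of op_false where it is clear.  */
static v4si
select_bits (v4si mask, v4si op_true, v4si op_false)
{
  return (mask & op_true) | (~mask & op_false);
}

/* Selecting {-1,...} vs {0,...} under a comparison mask reproduces the
   mask itself, which is why the hunk can simply emit dest = cmp.  */
static v4si
mask_via_select (v4si a, v4si b)
{
  const v4si all_ones = { -1, -1, -1, -1 };
  const v4si zeros = { 0, 0, 0, 0 };
  return select_bits (a > b, all_ones, zeros);
}
```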

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Vector Comparison patch
  2011-09-07 14:14                                                       ` Artem Shinkarov
@ 2011-09-07 15:08                                                         ` Joseph S. Myers
  2011-09-26 14:56                                                           ` Richard Guenther
  2011-09-08 12:56                                                         ` Richard Guenther
  1 sibling, 1 reply; 91+ messages in thread
From: Joseph S. Myers @ 2011-09-07 15:08 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Guenther, Uros Bizjak, Richard Henderson, gcc-patches

This looks like it has the same issue with maybe needing to use 
TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Vector Comparison patch
  2011-09-07 14:14                                                       ` Artem Shinkarov
  2011-09-07 15:08                                                         ` Joseph S. Myers
@ 2011-09-08 12:56                                                         ` Richard Guenther
  2011-09-08 13:46                                                           ` Richard Guenther
  2011-09-08 18:14                                                           ` Uros Bizjak
  1 sibling, 2 replies; 91+ messages in thread
From: Richard Guenther @ 2011-09-08 12:56 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Uros Bizjak, Richard Henderson, gcc-patches, Joseph S. Myers

On Wed, Sep 7, 2011 at 3:15 PM, Artem Shinkarov
<artyom.shinkaroff@gmail.com> wrote:
> On Tue, Sep 6, 2011 at 3:56 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Sep 6, 2011 at 4:50 PM, Artem Shinkarov
>> <artyom.shinkaroff@gmail.com> wrote:
>>> Here is a new version of the patch, which takes into account the
>>> changes made by Richard Guenther on 2011-09-02.
>>>
>>>
>>> ChangeLog
>>>
>>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>
>>>       gcc/
>>>       * fold-const.c (constant_boolean_node): Adjust the meaning
>>>       of boolean for vector types: true = {-1,..}, false = {0,..}.
>>>       (fold_unary_loc): Avoid conversion of vector comparison to
>>>       boolean type.
>>
>> Both changes have already been done.
>
> I missed the way you applied constant_boolean_node, sorry for that.
> But fold_unary_loc seems confusing to me. We have the following code:
>
>          else if (!INTEGRAL_TYPE_P (type))
>            return build3_loc (loc, COND_EXPR, type, op0,
>                               constant_boolean_node (true, type),
>                               constant_boolean_node (false, type));
>
> But this is wrong for the vector types, because it should construct
> VEC_COND_EXPR, not COND_EXPR. That is why I had a special case for
> vectors.

Ah, yeah.  I'll fix that.

The patch looks ok to me from a middle-end point of view.  Thus, if
Joseph is fine with it and Uros is fine with the i386 piece, the patch is ok.

Thanks,
Richard.

>>>       * expr.c (expand_expr_real_2): Expand vector comparison by
>>>       building an appropriate VEC_COND_EXPR.
>>
>> I prefer
>>
>> Index: gcc/expr.c
>> ===================================================================
>> *** gcc/expr.c.orig     2011-08-29 11:48:23.000000000 +0200
>> --- gcc/expr.c  2011-08-29 12:58:59.000000000 +0200
>> *************** do_store_flag (sepops ops, rtx target, e
>> *** 10309,10314 ****
>> --- 10309,10325 ----
>>    STRIP_NOPS (arg0);
>>    STRIP_NOPS (arg1);
>>
>> +   /* For vector typed comparisons emit code to generate the desired
>> +      all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
>> +      expander for this.  */
>> +   if (TREE_CODE (ops->type) == VECTOR_TYPE)
>> +     {
>> +       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
>> +       tree if_true = constant_boolean_node (true, ops->type);
>> +       tree if_false = constant_boolean_node (false, ops->type);
>> +       return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
>> +     }
>> +
>>    /* Get the rtx comparison code to use.  We know that EXP is a comparison
>>
>> as I said multiple times.
>>
>>>       * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>>>       (c_objc_common_truthvalue_conversion): Adjust.
>>>       * tree-vect-generic.c (do_compare): Helper function.
>>>       (expand_vector_comparison): Check if hardware supports
>>>       vector comparison of the given type or expand vector
>>>       piecewise.
>>>       (expand_vector_operation): Treat comparison as binary
>>>       operation of vector type.
>>>       (expand_vector_operations_1): Adjust.
>>>       * tree-cfg.c (verify_gimple_comparison): Adjust.
>>
>> The tree-cfg.c change has already been done.
>>
>> Richard.
>>
>>>
>>>       gcc/config/i386
>>>       * i386.c (ix86_expand_sse_movcc): Consider a case when
>>>       vcond operators are {-1,..} and {0,..}.
>>>
>>>       gcc/doc
>>>       * extend.texi: Adjust.
>>>
>>>       gcc/testsuite
>>>       * gcc.c-torture/execute/vector-compare-1.c: New test.
>>>       * gcc.c-torture/execute/vector-compare-2.c: New test.
>>>       * gcc.dg/vector-compare-1.c: New test.
>>>       * gcc.dg/vector-compare-2.c: New test.
>>>
>>> bootstrapped and tested on x86_64-unknown-linux-gnu.
>>>
>>>
>>> Thanks,
>>> Artem.
>>>
>>
>
> All the rest is adjusted in the new version of the patch you can find
> in the attachment.
>
> ChangeLog
>
>
> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>
>      gcc/
>      * expr.c (do_store_flag): Expand vector comparison by
>      building an appropriate VEC_COND_EXPR.
>      * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>      (c_objc_common_truthvalue_conversion): Adjust.
>      * tree-vect-generic.c (do_compare): Helper function.
>      (expand_vector_comparison): Check if hardware supports
>      vector comparison of the given type or expand vector
>      piecewise.
>      (expand_vector_operation): Treat comparison as binary
>      operation of vector type.
>      (expand_vector_operations_1): Adjust.
>
>      gcc/config/i386
>      * i386.c (ix86_expand_sse_movcc): Consider a case when
>      vcond operators are {-1,..} and {0,..}.
>
>      gcc/doc
>      * extend.texi: Adjust.
>
>      gcc/testsuite
>      * gcc.c-torture/execute/vector-compare-1.c: New test.
>      * gcc.c-torture/execute/vector-compare-2.c: New test.
>      * gcc.dg/vector-compare-1.c: New test.
>      * gcc.dg/vector-compare-2.c: New test.
>
> bootstrapped and tested on x86_64-unknown-linux-gnu.
>
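
The do_store_flag hunk quoted above materializes a vector comparison as a VEC_COND_EXPR that selects between {-1,...} and {0,...}. The ChangeLog describes expand_vector_comparison as expanding the comparison piecewise when the target lacks a suitable instruction. The sketch below is only a scalar C model of the values such an expansion should produce, assuming 64-bit lanes for double operands. vec_gt_piecewise is a hypothetical name, and tree-vect-generic.c itself operates on GIMPLE rather than on C arrays.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a piecewise vector ">" comparison: element i of the
   result is -1 (all bits set) when a[i] > b[i] and 0 otherwise, i.e.
   the lanes of VEC_COND_EXPR <a > b, {-1,...}, {0,...}>.  */
static void
vec_gt_piecewise (const double *a, const double *b, int64_t *res, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = a[i] > b[i] ? -1 : 0;
}
```

A NaN operand makes the ordered ">" false, so that lane comes out as 0.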


* Re: Vector Comparison patch
  2011-09-08 12:56                                                         ` Richard Guenther
@ 2011-09-08 13:46                                                           ` Richard Guenther
  2011-09-08 18:14                                                           ` Uros Bizjak
  1 sibling, 0 replies; 91+ messages in thread
From: Richard Guenther @ 2011-09-08 13:46 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Uros Bizjak, Richard Henderson, gcc-patches, Joseph S. Myers

On Thu, Sep 8, 2011 at 2:41 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 7, 2011 at 3:15 PM, Artem Shinkarov
> <artyom.shinkaroff@gmail.com> wrote:
>> On Tue, Sep 6, 2011 at 3:56 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Sep 6, 2011 at 4:50 PM, Artem Shinkarov
>>> <artyom.shinkaroff@gmail.com> wrote:
>>>> Here is a new version of the patch, which takes into account the
>>>> changes made by Richard Guenther on 2011-09-02.
>>>>
>>>>
>>>> ChangeLog
>>>>
>>>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>>
>>>>       gcc/
>>>>       * fold-const.c (constant_boolean_node): Adjust the meaning
>>>>       of boolean for vector types: true = {-1,..}, false = {0,..}.
>>>>       (fold_unary_loc): Avoid conversion of vector comparison to
>>>>       boolean type.
>>>
>>> Both changes have already been done.
>>
>> I missed the way you applied constant_boolean_node, sorry for that.
>> But fold_unary_loc seems confusing to me. We have the following code:
>>
>>          else if (!INTEGRAL_TYPE_P (type))
>>            return build3_loc (loc, COND_EXPR, type, op0,
>>                               constant_boolean_node (true, type),
>>                               constant_boolean_node (false, type));
>>
>> But this is wrong for the vector types, because it should construct
>> VEC_COND_EXPR, not COND_EXPR. That is why I had a special case for
>> vectors.
>
> Ah, yeah.  I'll fix that.

OTOH, we require that vectors are converted with VIEW_CONVERT_EXPRs,
so the code shouldn't trigger anyway.

Richard.

> The patch looks ok to me from a middle-end point of view.  Thus, if
> Joseph is fine with it and Uros is fine with the i386 piece, the patch is ok.
>
> Thanks,
> Richard.


* Re: Vector Comparison patch
  2011-09-08 12:56                                                         ` Richard Guenther
  2011-09-08 13:46                                                           ` Richard Guenther
@ 2011-09-08 18:14                                                           ` Uros Bizjak
  1 sibling, 0 replies; 91+ messages in thread
From: Uros Bizjak @ 2011-09-08 18:14 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Artem Shinkarov, Richard Henderson, gcc-patches, Joseph S. Myers

On Thu, Sep 8, 2011 at 2:41 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:

>> All the rest is adjusted in the new version of the patch you can find
>> in the attachment.
>>
>> ChangeLog
>>
>>
>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>
>>      gcc/
>>      * expr.c (do_store_flag): Expand vector comparison by
>>      building an appropriate VEC_COND_EXPR.
>>      * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>>      (c_objc_common_truthvalue_conversion): Adjust.
>>      * tree-vect-generic.c (do_compare): Helper function.
>>      (expand_vector_comparison): Check if hardware supports
>>      vector comparison of the given type or expand vector
>>      piecewise.
>>      (expand_vector_operation): Treat comparison as binary
>>      operation of vector type.
>>      (expand_vector_operations_1): Adjust.
>>
>>      gcc/config/i386
>>      * i386.c (ix86_expand_sse_movcc): Consider a case when
>>      vcond operators are {-1,..} and {0,..}.
>>
>>      gcc/doc
>>      * extend.texi: Adjust.
>>
>>      gcc/testsuite
>>      * gcc.c-torture/execute/vector-compare-1.c: New test.
>>      * gcc.c-torture/execute/vector-compare-2.c: New test.
>>      * gcc.dg/vector-compare-1.c: New test.
>>      * gcc.dg/vector-compare-2.c: New test.
>>
>> bootstrapped and tested on x86_64-unknown-linux-gnu.

The x86 part is OK.

Thanks,
Uros.


* Re: Vector Comparison patch
  2011-08-29 12:54                       ` Paolo Bonzini
@ 2011-09-16 18:08                         ` Richard Henderson
  0 siblings, 0 replies; 91+ messages in thread
From: Richard Henderson @ 2011-09-16 18:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Guenther, Artem Shinkarov, gcc-patches, Joseph S. Myers,
	Chris Lattner

On 08/29/2011 04:41 AM, Paolo Bonzini wrote:
> The definition in OpenCL makes zero sense to me.  For byte operands
> it is custom-tailored after the SSE PMOVMSKB instruction, but there
> is no PMOVMSKW/PMOVMSKD instruction so you would need very slow bit
> shift operations before PMOVMSK.  On the other hand, bit selection is
> available, for example, in AltiVec.

Not PMOVMSKB, but the SSE4.1 PBLENDVB.

With that, we don't need funny shift operations for wider integer
types, but only *because* the comparison produces -1, which means
that the MSB of each byte is in fact set.

This means the perfect wording probably shouldn't be specific to bit
selection, but should include bit selection (aka and-andn-or) as a
valid implementation.
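
The point about -1 can be checked with a scalar model of a byte-granular blend in the style of SSE4.1 PBLENDVB, which takes each result byte from the second source when the top bit of the corresponding mask byte is set. A comparison of 32-bit lanes yields 0xFFFFFFFF or 0, so every byte of a true lane has its top bit set and the byte-wise blend equals a lane-wise select; a mask lane of 1 would select nothing. blend_bytes is a hypothetical helper for illustration, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit signed lanes via the GCC generic vector extension.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Scalar model of a PBLENDVB-style blend: result byte i comes from src
   when bit 7 of mask byte i is set, and from dst otherwise.  Nothing
   here looks at whole lanes.  */
static v4si
blend_bytes (v4si dst, v4si src, v4si mask)
{
  unsigned char d[16], s[16], m[16];
  memcpy (d, &dst, sizeof d);
  memcpy (s, &src, sizeof s);
  memcpy (m, &mask, sizeof m);
  for (int i = 0; i < 16; i++)
    if (m[i] & 0x80)
      d[i] = s[i];
  v4si out;
  memcpy (&out, d, sizeof out);
  return out;
}
```

With a mask from a < b, blend_bytes (a, b, a < b) picks b in exactly the lanes where a < b holds, which is a lane-wise maximum.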

r~


* Re: Vector Comparison patch
  2011-09-07 15:08                                                         ` Joseph S. Myers
@ 2011-09-26 14:56                                                           ` Richard Guenther
  2011-09-26 16:01                                                             ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-26 14:56 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Artem Shinkarov, Uros Bizjak, Richard Henderson, gcc-patches

On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
> This looks like it has the same issue with maybe needing to use
> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.

I don't think so, we move qualifiers to the vector type from the element type
in make_vector_type and the tests only look at the component type.

I am re-testing the patch currently and will commit it if that succeeds.

Thanks,
Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com
>


* Re: Vector Comparison patch
  2011-09-26 14:56                                                           ` Richard Guenther
@ 2011-09-26 16:01                                                             ` Richard Guenther
  2011-09-28 14:53                                                               ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-26 16:01 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Artem Shinkarov, Uros Bizjak, Richard Henderson, gcc-patches

On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> This looks like it has the same issue with maybe needing to use
>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>
> I don't think so, we move qualifiers to the vector type from the element type
> in make_vector_type and the tests only look at the component type.
>
> I am re-testing the patch currently and will commit it if that succeeds.

Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
for

    vector (2, double) d0;
    vector (2, double) d1;
    vector (2, long) idres;

    d0 = (vector (2, double)){(double)argc,  10.};
    d1 = (vector (2, double)){0., (double)-23};
    idres = (d0 > d1);

as apparently the type we chose to assign to (d0 > d1) is different
from that of idres:

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'

Adjusting it to vector (2, long long) otoh yields, for -m64:

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'

But those two types are at least compatible from their modes.  Joseph,
should we accept mode-compatible types in assignments or maybe
transparently convert them?
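
The mismatch follows from the result element type being picked by width: a 64-bit signed integer is long under -m64 (LP64) but long long under -m32 (ILP32), as the two diagnostics show. One way for test code to avoid naming that type, sketched here assuming a GCC with vector comparison support, is to let the GCC __typeof__ extension deduce it. v2df and gt_mask are illustrative names, not part of the testsuite.

```c
#include <assert.h>

/* Two doubles per vector, the shape used by vector-compare-1.c.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Store the lanes of d0 > d1 into out[].  __typeof__ lets the compiler
   name the mask type, so the code never has to choose between
   vector (2, long) and vector (2, long long).  */
static void
gt_mask (v2df d0, v2df d1, long long out[2])
{
  __typeof__ (d0 > d1) m = d0 > d1;
  /* Each mask lane is as wide as a double lane.  */
  _Static_assert (sizeof m[0] == sizeof (double), "lane width mismatch");
  out[0] = m[0];
  out[1] = m[1];
}
```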

Thanks,
Richard.

> Thanks,
> Richard.
>
>> --
>> Joseph S. Myers
>> joseph@codesourcery.com
>>
>


* Re: Vector Comparison patch
  2011-09-26 16:01                                                             ` Richard Guenther
@ 2011-09-28 14:53                                                               ` Richard Guenther
  2011-09-29 11:05                                                                 ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-28 14:53 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Artem Shinkarov, Uros Bizjak, Richard Henderson, gcc-patches

On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>>> This looks like it has the same issue with maybe needing to use
>>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>>
>> I don't think so, we move qualifiers to the vector type from the element type
>> in make_vector_type and the tests only look at the component type.
>>
>> I am re-testing the patch currently and will commit it if that succeeds.
>
> Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
> for
>
>    vector (2, double) d0;
>    vector (2, double) d1;
>    vector (2, long) idres;
>
>    d0 = (vector (2, double)){(double)argc,  10.};
>    d1 = (vector (2, double)){0., (double)-23};
>    idres = (d0 > d1);
>
> as apparently the type we chose to assign to (d0 > d1) is different
> from that of idres:
>
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
> error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'
>
> Adjusting it to vector (2, long long) otoh yields, for -m64:
>
> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
> error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>
> But those two types are at least compatible from their modes.  Joseph,
> should we accept mode-compatible types in assignments or maybe
> transparently convert them?

Looks like we have a more suitable solution for these automatically
generated vector types - mark them with TYPE_VECTOR_OPAQUE.

I'm testing the following incremental patch.

Richard.

Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c.orig 2011-09-28 16:22:10.000000000 +0200
+++ gcc/c-typeck.c      2011-09-28 16:18:39.000000000 +0200
@@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
             }

           /* Always construct signed integer vector type.  */
-          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          intt = c_common_type_for_size (GET_MODE_BITSIZE
+                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
+          result_type = build_opaque_vector_type (intt,
+                                                 TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
           break;
         }
@@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
             }

           /* Always construct signed integer vector type.  */
-          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
-          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          intt = c_common_type_for_size (GET_MODE_BITSIZE
+                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
+          result_type = build_opaque_vector_type (intt,
+                                                 TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
           break;
         }


* Re: Vector Comparison patch
  2011-09-28 14:53                                                               ` Richard Guenther
@ 2011-09-29 11:05                                                                 ` Richard Guenther
  2011-09-29 14:01                                                                   ` Richard Guenther
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-29 11:05 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Artem Shinkarov, Uros Bizjak, Richard Henderson, gcc-patches

On Wed, Sep 28, 2011 at 4:23 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>>>> This looks like it has the same issue with maybe needing to use
>>>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>>>
>>> I don't think so, we move qualifiers to the vector type from the element type
>>> in make_vector_type and the tests only look at the component type.
>>>
>>> I am re-testing the patch currently and will commit it if that succeeds.
>>
>> Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
>> for
>>
>>    vector (2, double) d0;
>>    vector (2, double) d1;
>>    vector (2, long) idres;
>>
>>    d0 = (vector (2, double)){(double)argc,  10.};
>>    d1 = (vector (2, double)){0., (double)-23};
>>    idres = (d0 > d1);
>>
>> as apparently the type we chose to assign to (d0 > d1) is different
>> from that of idres:
>>
>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>> error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'
>>
>> Adjusting it to vector (2, long long) otoh yields, for -m64:
>>
>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>> error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>>
>> But those two types are at least compatible from their modes.  Joseph,
>> should we accept mode-compatible types in assignments or maybe
>> transparently convert them?
>
> Looks like we have a more suitable solution for these automatically
> generated vector types - mark them with TYPE_VECTOR_OPAQUE.
>
> I'm testing the following incremental patch.
>
> Richard.
>
> Index: gcc/c-typeck.c
> ===================================================================
> --- gcc/c-typeck.c.orig 2011-09-28 16:22:10.000000000 +0200
> +++ gcc/c-typeck.c      2011-09-28 16:18:39.000000000 +0200
> @@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
>             }
>
>           /* Always construct signed integer vector type.  */
> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
> +          result_type = build_opaque_vector_type (intt,
> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>           converted = 1;
>           break;
>         }
> @@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
>             }
>
>           /* Always construct signed integer vector type.  */
> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
> +          result_type = build_opaque_vector_type (intt,
> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>           converted = 1;
>           break;
>         }

That doesn't seem to work either.  Because we treat the opaque and
non-opaque variants of vector<int> as different (the opaque type isn't
a variant type of the non-opaque one - something suspicious anyway).

I'm going to try to apply some surgery on how we build opaque variants
and then re-visit the above again.

Richard.


* Re: Vector Comparison patch
  2011-09-29 11:05                                                                 ` Richard Guenther
@ 2011-09-29 14:01                                                                   ` Richard Guenther
  2011-09-30 11:44                                                                     ` Matthew Gretton-Dann
  0 siblings, 1 reply; 91+ messages in thread
From: Richard Guenther @ 2011-09-29 14:01 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Artem Shinkarov, Uros Bizjak, Richard Henderson, gcc-patches

On Thu, Sep 29, 2011 at 12:00 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 28, 2011 at 4:23 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>>>>> This looks like it has the same issue with maybe needing to use
>>>>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>>>>
>>>> I don't think so, we move qualifiers to the vector type from the element type
>>>> in make_vector_type and the tests only look at the component type.
>>>>
>>>> I am re-testing the patch currently and will commit it if that succeeds.
>>>
>>> Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
>>> for
>>>
>>>    vector (2, double) d0;
>>>    vector (2, double) d1;
>>>    vector (2, long) idres;
>>>
>>>    d0 = (vector (2, double)){(double)argc,  10.};
>>>    d1 = (vector (2, double)){0., (double)-23};
>>>    idres = (d0 > d1);
>>>
>>> as apparently the type we chose to assign to (d0 > d1) is different
>>> from that of idres:
>>>
>>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>>> error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'
>>>
>>> Adjusting it to vector (2, long long) otoh yields, for -m64:
>>>
>>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>>> error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>>>
>>> But those two types are at least compatible from their modes.  Joseph,
>>> should we accept mode-compatible types in assignments or maybe
>>> transparently convert them?
>>
>> Looks like we have a more suitable solution for these automatically
>> generated vector types - mark them with TYPE_VECTOR_OPAQUE.
>>
>> I'm testing the following incremental patch.
>>
>> Richard.
>>
>> Index: gcc/c-typeck.c
>> ===================================================================
>> --- gcc/c-typeck.c.orig 2011-09-28 16:22:10.000000000 +0200
>> +++ gcc/c-typeck.c      2011-09-28 16:18:39.000000000 +0200
>> @@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
>>             }
>>
>>           /* Always construct signed integer vector type.  */
>> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
>> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
>> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
>> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
>> +          result_type = build_opaque_vector_type (intt,
>> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>>           converted = 1;
>>           break;
>>         }
>> @@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
>>             }
>>
>>           /* Always construct signed integer vector type.  */
>> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
>> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
>> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
>> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
>> +          result_type = build_opaque_vector_type (intt,
>> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>>           converted = 1;
>>           break;
>>         }
>
> That doesn't seem to work either.  Because we treat the opaque and
> non-opaque variants of vector<int> as different (the opaque type isn't
> a variant type of the non-opaque one - something suspicious anyway).
>
> I'm going to try to apply some surgery on how we build opaque variants
> and then re-visit the above again.

Bootstrapped and tested on x86_64-unknown-linux-gnu and installed.

Richard.

> Richard.
>


* Re: Vector Comparison patch
  2011-09-29 14:01                                                                   ` Richard Guenther
@ 2011-09-30 11:44                                                                     ` Matthew Gretton-Dann
  0 siblings, 0 replies; 91+ messages in thread
From: Matthew Gretton-Dann @ 2011-09-30 11:44 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Joseph S. Myers, Artem Shinkarov, Uros Bizjak, Richard Henderson,
	gcc-patches

On 29/09/11 12:27, Richard Guenther wrote:
> On Thu, Sep 29, 2011 at 12:00 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 28, 2011 at 4:23 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers <joseph@codesourcery.com> wrote:
>>>>>> This looks like it has the same issue with maybe needing to use
>>>>>> TYPE_MAIN_VARIANT in type comparisons as the shuffle patch.
>>>>>
>>>>> I don't think so, we move qualifiers to the vector type from the element type
>>>>> in make_vector_type and the tests only look at the component type.
>>>>>
>>>>> I am re-testing the patch currently and will commit it if that succeeds.
>>>>
>>>> Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32
>>>> for
>>>>
>>>>     vector (2, double) d0;
>>>>     vector (2, double) d1;
>>>>     vector (2, long) idres;
>>>>
>>>>     d0 = (vector (2, double)){(double)argc,  10.};
>>>>     d1 = (vector (2, double)){0., (double)-23};
>>>>     idres = (d0 > d1);
>>>>
>>>> as apparently the type we chose to assign to (d0 > d1) is different
>>>> from that of idres:
>>>>
>>>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>>>> error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'
>>>>
>>>> Adjusting it to vector (2, long long) otoh yields, for -m64:
>>>>
>>>> /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5:
>>>> error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>>>>
>>>> But those two types are at least compatible from their modes.  Joseph,
>>>> should we accept mode-compatible types in assignments or maybe
>>>> transparently convert them?
>>>
>>> Looks like we have a more suitable solution for these automatically
>>> generated vector types - mark them with TYPE_VECTOR_OPAQUE.
>>>
>>> I'm testing the following incremental patch.
>>>
>>> Richard.
>>>
>>> Index: gcc/c-typeck.c
>>> ===================================================================
>>> --- gcc/c-typeck.c.orig 2011-09-28 16:22:10.000000000 +0200
>>> +++ gcc/c-typeck.c      2011-09-28 16:18:39.000000000 +0200
>>> @@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en
>>>              }
>>>
>>>            /* Always construct signed integer vector type.  */
>>> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
>>> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
>>> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
>>> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
>>> +          result_type = build_opaque_vector_type (intt,
>>> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>>>            converted = 1;
>>>            break;
>>>          }
>>> @@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en
>>>              }
>>>
>>>            /* Always construct signed integer vector type.  */
>>> -          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0);
>>> -          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
>>> +          intt = c_common_type_for_size (GET_MODE_BITSIZE
>>> +                                          (TYPE_MODE (TREE_TYPE (type0))), 0);
>>> +          result_type = build_opaque_vector_type (intt,
>>> +                                                 TYPE_VECTOR_SUBPARTS (type0));
>>>            converted = 1;
>>>            break;
>>>          }
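The diff sizes the result element by the bit size of the operand element's machine mode (GET_MODE_BITSIZE (TYPE_MODE (...))) instead of its TYPE_PRECISION, then asks c_common_type_for_size for a signed type of that width. A rough, hypothetical model of that lookup in plain C shows why the chosen type is target-dependent: when two standard types have the same width, whichever candidate is checked first wins. The helper name and the candidate order are assumptions for illustration, not GCC code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeled on c_common_type_for_size: return the name
   of the first signed integer type that is BITS wide, or NULL if none is.
   The candidate order (int first, then signed char, short, long,
   long long) is an assumption; it decides which type wins when two types
   have the same width.  Width is taken as sizeof * CHAR_BIT, i.e. no
   padding bits are assumed.  */
static const char *
int_type_for_bits (unsigned bits)
{
  static const struct
  {
    const char *name;
    size_t size;
  } candidates[] = {
    { "int", sizeof (int) },
    { "signed char", sizeof (signed char) },
    { "short", sizeof (short) },
    { "long", sizeof (long) },
    { "long long", sizeof (long long) },
  };

  assert (bits > 0);
  for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    if (candidates[i].size * CHAR_BIT == bits)
      return candidates[i].name;
  return NULL;
}
```

Under this model, on x86_64 with -m64 int_type_for_bits (64) returns "long", while with -m32 (where long is 32 bits) the same call returns "long long", matching the two error messages quoted above. On avr, where int is 16 bits, a 32-bit lane maps to "long".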
>>
>> That doesn't seem to work either, because we treat the opaque and
>> non-opaque variants of vector<int> as different (the opaque type isn't
>> a variant type of the non-opaque one - something suspicious anyway).
>>
>> I'm going to try to apply some surgery on how we build opaque variants
>> and then re-visit the above again.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and installed.
>
> Richard.
>
>> Richard.
>>
>

I'm still getting errors with the latest trunk (r179378) for arm-none-eabi.
Please see http://gcc.gnu.org/PR50576.

Thanks,

Matt


-- 
Matthew Gretton-Dann
Principal Engineer, PD Software - Tools, ARM Ltd


* Re: Vector Comparison patch
  2011-09-06 14:51                                                   ` Artem Shinkarov
  2011-09-06 14:56                                                     ` Richard Guenther
@ 2011-09-30 15:21                                                     ` Georg-Johann Lay
  2011-09-30 15:29                                                       ` Artem Shinkarov
  1 sibling, 1 reply; 91+ messages in thread
From: Georg-Johann Lay @ 2011-09-30 15:21 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Guenther, Uros Bizjak, Richard Henderson, gcc-patches,
	Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

Artem Shinkarov wrote:
> Here is a new version of the patch which considers the changes from
> 2011-09-02  Richard Guenther
> 
> 
> ChangeLog
> 
> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
> 
>        gcc/
>        * fold-const.c (constant_boolean_node): Adjust the meaning
>        of boolean for vector types: true = {-1,..}, false = {0,..}.
>        (fold_unary_loc): Avoid conversion of vector comparison to
>        boolean type.
>        * expr.c (expand_expr_real_2): Expand vector comparison by
>        building an appropriate VEC_COND_EXPR.
>        * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>        (c_objc_common_truthvalue_conversion): Adjust.
>        * tree-vect-generic.c (do_compare): Helper function.
>        (expand_vector_comparison): Check if hardware supports
>        vector comparison of the given type or expand vector
>        piecewise.
>        (expand_vector_operation): Treat comparison as binary
>        operation of vector type.
>        (expand_vector_operations_1): Adjust.
>        * tree-cfg.c (verify_gimple_comparison): Adjust.
> 
>        gcc/config/i386
>        * i386.c (ix86_expand_sse_movcc): Consider a case when
>        vcond operators are {-1,..} and {0,..}.
> 
>        gcc/doc
>        * extend.texi: Adjust.
> 
>        gcc/testsuite
>        * gcc.c-torture/execute/vector-compare-1.c: New test.
>        * gcc.c-torture/execute/vector-compare-2.c: New test.
>        * gcc.dg/vector-compare-1.c: New test.
>        * gcc.dg/vector-compare-2.c: New test.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> 
> Thanks,
> Artem.
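The ChangeLog above makes a vector comparison produce {-1,..} for true and {0,..} for false. A short sketch of why that encoding is convenient, using GCC's vector extension (all names are illustrative): the mask can drive a lane-wise select, res[i] = mask[i] ? a[i] : b[i], with plain bitwise operations. With arms {-1,..} and {0,..}, the case the i386 entry mentions, the select yields the mask itself, since (m & -1) | (~m & 0) == m.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Lane-wise select: a[i] where mask[i] is -1, b[i] where it is 0.
   Relies on every mask lane being all ones or all zeros.  */
static v4si
select_by_mask (v4si mask, v4si a, v4si b)
{
  return (mask & a) | (~mask & b);
}

/* Lane-wise maximum built from a comparison and a select.  */
static v4si
vec_max (v4si a, v4si b)
{
  v4si gt = a > b; /* -1 where a[i] > b[i], else 0 */
  return select_by_mask (gt, a, b);
}
```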

Hi Artem,

the new test case gcc.c-torture/execute/vector-compare-1.c causes a bunch of
FAILs in the regression tests for avr-unknown-none (see attachment).

The target has

2 = sizeof (short)
2 = sizeof (int)
4 = sizeof (long int)
8 = sizeof (long long int)

Could you fix that, i.e. parametrize sizeof(int) out, or skip the test by means of

/* { dg-require-effective-target int32plus } */

or similar.
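A hedged C-level alternative to the DejaGnu directive: guard the size-dependent part of the test on the sizes it actually relies on, rather than on int being at least 32 bits. __SIZEOF_INT__ and __SIZEOF_FLOAT__ are predefined by GCC (and Clang); the function name and the "return 1 when skipped" convention are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical guard in the spirit of int32plus: only run the check when
   int and float have the same size, so that comparing two float vectors
   yields a vector of int.  */
static int
float_compare_check (void)
{
#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
  typedef float v4sf __attribute__ ((vector_size (4 * sizeof (float))));
  typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 4.0f, 3.0f, 2.0f, 1.0f };
  v4si m = a < b; /* lanes: -1, -1, 0, 0 */
  return m[0] == -1 && m[1] == -1 && m[2] == 0 && m[3] == 0;
#else
  return 1; /* skipped: nothing to check on this target */
#endif
}
```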

Thanks, Johann
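One way to "parametrize sizeof(int) out", sketched under the assumption that the test may use GNU C: derive the mask type from the comparison itself with __typeof__, so no integer type name is hard-coded and the same source works whether a 32-bit lane is int (x86) or long (avr). MASK_TYPE, the typedef names and count_true are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: the integer vector type that comparing two values
   of vector type VT produces.  __typeof__ does not evaluate its operand,
   so the compound literals only serve to name the type.  */
#define MASK_TYPE(VT) __typeof__ ((VT){ 0 } > (VT){ 0 })

typedef float v4sf __attribute__ ((vector_size (4 * sizeof (float))));
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

typedef MASK_TYPE (v4sf) v4sf_mask;
typedef MASK_TYPE (v2df) v2df_mask;

/* One lane per element, each lane as wide as the compared element,
   whatever sizeof (int) or sizeof (long) happens to be.  */
_Static_assert (sizeof (v4sf_mask) == sizeof (v4sf), "lane width mismatch");
_Static_assert (sizeof (v2df_mask) == sizeof (v2df), "lane width mismatch");

/* Count the lanes of a float comparison that came out true (-1).  */
static int
count_true (v4sf_mask m)
{
  int n = 0;
  for (int i = 0; i < 4; i++)
    n += m[i] == -1;
  return n;
}
```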

[-- Attachment #2: out.txt --]
[-- Type: text/plain, Size: 17619 bytes --]

./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c: In function 'main':
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
compiler exited with status 1

FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -fomit-frame-pointer -funroll-loops 
UNRESOLVED: gcc.c-torture/execute/vector-compare-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops 
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions 
UNRESOLVED: gcc.c-torture/execute/vector-compare-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions 
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -g 
UNRESOLVED: gcc.c-torture/execute/vector-compare-1.c execution,  -O3 -g 


* Re: Vector Comparison patch
  2011-09-30 15:21                                                     ` Georg-Johann Lay
@ 2011-09-30 15:29                                                       ` Artem Shinkarov
  2011-09-30 16:21                                                         ` Georg-Johann Lay
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-09-30 15:29 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Richard Guenther, Uros Bizjak, Richard Henderson, gcc-patches,
	Joseph S. Myers

On Fri, Sep 30, 2011 at 4:01 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Artem Shinkarov wrote:
>> Here is a new version of the patch which considers the changes from
>> 2011-09-02  Richard Guenther
>>
>>
>> ChangeLog
>>
>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>
>>        gcc/
>>        * fold-const.c (constant_boolean_node): Adjust the meaning
>>        of boolean for vector types: true = {-1,..}, false = {0,..}.
>>        (fold_unary_loc): Avoid conversion of vector comparison to
>>        boolean type.
>>        * expr.c (expand_expr_real_2): Expand vector comparison by
>>        building an appropriate VEC_COND_EXPR.
>>        * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>>        (c_objc_common_truthvalue_conversion): Adjust.
>>        * tree-vect-generic.c (do_compare): Helper function.
>>        (expand_vector_comparison): Check if hardware supports
>>        vector comparison of the given type or expand vector
>>        piecewise.
>>        (expand_vector_operation): Treat comparison as binary
>>        operation of vector type.
>>        (expand_vector_operations_1): Adjust.
>>        * tree-cfg.c (verify_gimple_comparison): Adjust.
>>
>>        gcc/config/i386
>>        * i386.c (ix86_expand_sse_movcc): Consider a case when
>>        vcond operators are {-1,..} and {0,..}.
>>
>>        gcc/doc
>>        * extend.texi: Adjust.
>>
>>        gcc/testsuite
>>        * gcc.c-torture/execute/vector-compare-1.c: New test.
>>        * gcc.c-torture/execute/vector-compare-2.c: New test.
>>        * gcc.dg/vector-compare-1.c: New test.
>>        * gcc.dg/vector-compare-2.c: New test.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>>
>>
>> Thanks,
>> Artem.
>
> Hi Artem,
>
> the new test case gcc.c-torture/execute/vector-compare-1.c causes a bunch of
> FAILs in the regression tests for avr-unknown-none (see attachment).
>
> The target has
>
> 2 = sizeof (short)
> 2 = sizeof (int)
> 4 = sizeof (long int)
> 8 = sizeof (long long int)
>
> Could you fix that, i.e. parametrize sizeof(int) out, or skip the test by means of
>
> /* { dg-require-effective-target int32plus } */
>
> or similar.
>
> Thanks, Johann
>
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> compiler exited with status 1
> output is:
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c: In function 'main':
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>
> FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -g
> UNRESOLVED: gcc.c-torture/execute/vector-compare-1.c execution,  -O3 -g
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c: In function 'main':
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> compiler exited with status 1
> output is:
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c: In function 'main':
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:109:5: error: incompatible types when assigning to type '__vector(4) int' from type '__vector(4) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
> ./gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int'
>
>

Hi

The problem actually happens when we compare a float vector with a float
vector: we assume that the result is an int vector, but it turns out
that we get a long int vector.

The same goes for double: we assume that sizeof (double) ==
sizeof (long long), but it seems that here double has the same size as float.

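A minimal sketch of what such a comparison produces, assuming a GCC with the vector comparisons from this patch (4.7 or later) and vector subscripting. The names v4sf and float_greater_lanes are hypothetical. The mask is declared with __typeof, so it takes whatever lane type the target uses: int on x86_64, long int on avr according to the errors quoted above.

```c
/* Assumes GCC 4.7+ vector comparisons and vector subscripting.
   v4sf and float_greater_lanes are hypothetical names.  */
typedef float v4sf __attribute__ ((vector_size (4 * sizeof (float))));

/* Compare x[i] > y[i] in all four lanes at once and copy the mask out.
   Each mask lane is -1 (true) or 0 (false), as in OpenCL.  */
void
float_greater_lanes (const float x[4], const float y[4], long out[4])
{
  v4sf a = { x[0], x[1], x[2], x[3] };
  v4sf b = { y[0], y[1], y[2], y[3] };

  /* The lane type is a signed integer as wide as float (int on x86_64,
     long on avr); __typeof avoids having to name it.  */
  __typeof (a > b) mask = a > b;

  for (int i = 0; i < 4; i++)
    out[i] = mask[i];
}
```

With x = {1, 5, 2, -3} and y = {0, 5, 3, -4}, out becomes {-1, 0, 0, -1}.
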
Hm, I could add a conditional of the sort
if (sizeof (double) == sizeof (long long)), and similar ones. Or maybe
there is a more elegant way of solving this?

I can fix it, but keep in mind that I don't have permission to
commit to the trunk.


Thanks,
Artem.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Vector Comparison patch
  2011-09-30 15:29                                                       ` Artem Shinkarov
@ 2011-09-30 16:21                                                         ` Georg-Johann Lay
  2011-09-30 16:30                                                           ` Jakub Jelinek
  0 siblings, 1 reply; 91+ messages in thread
From: Georg-Johann Lay @ 2011-09-30 16:21 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Richard Guenther, Uros Bizjak, Richard Henderson, gcc-patches,
	Joseph S. Myers

Artem Shinkarov schrieb:
> On Fri, Sep 30, 2011 at 4:01 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
>> Artem Shinkarov schrieb:
>>> Here is a new version of the patch which considers the changes from
>>> 2011-09-02  Richard Guenther
>>>
>>>
>>> ChangeLog
>>>
>>> 2011-09-06 Artjoms Sinkarovs <artyom.shinkaroff@gmail.com>
>>>
>>>        gcc/
>>>        * fold-const.c (constant_boolean_node): Adjust the meaning
>>>        of boolean for vector types: true = {-1,..}, false = {0,..}.
>>>        (fold_unary_loc): Avoid conversion of vector comparison to
>>>        boolean type.
>>>        * expr.c (expand_expr_real_2): Expand vector comparison by
>>>        building an appropriate VEC_COND_EXPR.
>>>        * c-typeck.c (build_binary_op): Typecheck vector comparisons.
>>>        (c_objc_common_truthvalue_conversion): Adjust.
>>>        * tree-vect-generic.c (do_compare): Helper function.
>>>        (expand_vector_comparison): Check if hardware supports
>>>        vector comparison of the given type or expand vector
>>>        piecewise.
>>>        (expand_vector_operation): Treat comparison as binary
>>>        operation of vector type.
>>>        (expand_vector_operations_1): Adjust.
>>>        * tree-cfg.c (verify_gimple_comparison): Adjust.
>>>
>>>        gcc/config/i386
>>>        * i386.c (ix86_expand_sse_movcc): Consider a case when
>>>        vcond operators are {-1,..} and {0,..}.
>>>
>>>        gcc/doc
>>>        * extend.texi: Adjust.
>>>
>>>        gcc/testsuite
>>>        * gcc.c-torture/execute/vector-compare-1.c: New test.
>>>        * gcc.c-torture/execute/vector-compare-2.c: New test.
>>>        * gcc.dg/vector-compare-1.c: New test.
>>>        * gcc.dg/vector-compare-2.c: New test.
>>>
>>> bootstrapped and tested on x86_64-unknown-linux-gnu.
>>>
>>>
>>> Thanks,
>>> Artem.
>> Hi Artem,
>>
>> the new test case gcc.c-torture/execute/vector-compare-1.c causes a bunch of
>> FAILs in regression tests for avr-unknown-none (see attachment).
>>
>> The target has
>>
>> 2 = sizeof (short)
>> 2 = sizeof (int)
>> 4 = sizeof (long int)
>> 8 = sizeof (long long int)
>>
>> Could you fix that? I.e. parametrize sizeof(int) out or skip the test by means of
>>
>> /* { dg-require-effective-target int32plus } */
>>
>> or similar.
>>
>> Thanks, Johann
>>
>> [...]
>>
> Hi
> 
> The problem actually happens when we compare float vector with float
> vector, it is assumed that we should get int vector as a result, but
> it turns out that we are getting long int.
> 
> The same with double, we assume that sizeof (double) == sizeof (long
> long). But as it seems double has the same size as float.

Yes.

sizeof(double) = sizeof(float) = 4

> Hm, I can put conditional of sort:
> if (sizeof (double) == sizeof (long long)) and others. Or maybe there
> is more elegant way of solving this?

That's too late: such a check won't prevent the compiler error,
because the error already happens at compile time, not at run time.

> I can fix it, but keep in mind that I don't have a permission to
> commit to the trunk.

You could browse ./testsuite/lib/target-supports.exp and try to find some gate
functions that fit the test case's requirements, such as
check_effective_target_large_double, check_effective_target_double64,
check_effective_target_x32, or a combination of them.

Johann

> Artem.

* Re: Vector Comparison patch
  2011-09-30 16:21                                                         ` Georg-Johann Lay
@ 2011-09-30 16:30                                                           ` Jakub Jelinek
  2011-09-30 16:45                                                             ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Jakub Jelinek @ 2011-09-30 16:30 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Artem Shinkarov, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Sep 30, 2011 at 05:36:47PM +0200, Georg-Johann Lay wrote:
> >> The target has
> >>
> >> 2 = sizeof (short)
> >> 2 = sizeof (int)
> >> 4 = sizeof (long int)
> >> 8 = sizeof (long long int)
> >>
> >> Could you fix that? I.e. parametrize sizeof(int) out or skip the test by means of
> >>
> >> /* { dg-require-effective-target int32plus } */
> >>
> >> or similar.
> >>
> >> Thanks, Johann
> >>
> >> [...]
> >>
> > The problem actually happens when we compare float vector with float
> > vector, it is assumed that we should get int vector as a result, but
> > it turns out that we are getting long int.
> > 
> > The same with double, we assume that sizeof (double) == sizeof (long
> > long). But as it seems double has the same size as float.
> 
> Yes.
> 
> sizeof(double) = sizeof(float) = 4
> 
> > Hm, I can put conditional of sort:
> > if (sizeof (double) == sizeof (long long)) and others. Or maybe there
> > is more elegant way of solving this?
> 
> That's too late because this won't prevent the compiler from error.
> The error already happens at compile time, not at run time.

Isn't it possible to do something like:
     vector (4, float) f0;
     vector (4, float) f1;
-    vector (4, int) ifres;
+    vector (4, __typeof (f0 > f1)) ifres;

     f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
     f1 = (vector (4, float)){0., 3., 2., (float)-23};    
     test (4, f0, f1, ifres, "%f");
    
 /* Double comparison.  */
     vector (2, double) d0;
     vector (2, double) d1;
-    vector (2, long long) idres;
+    vector (2, __typeof (d0 > d1)) idres;

     d0 = (vector (2, double)){(double)argc,  10.};
     d1 = (vector (2, double)){0., (double)-23};    
     test (2, d0, d1, idres, "%f");

	Jakub

* Re: Vector Comparison patch
  2011-09-30 16:30                                                           ` Jakub Jelinek
@ 2011-09-30 16:45                                                             ` Artem Shinkarov
  2011-09-30 16:51                                                               ` Jakub Jelinek
  0 siblings, 1 reply; 91+ messages in thread
From: Artem Shinkarov @ 2011-09-30 16:45 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Georg-Johann Lay, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Sep 30, 2011 at 4:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Sep 30, 2011 at 05:36:47PM +0200, Georg-Johann Lay wrote:
>> >> The target has
>> >>
>> >> 2 = sizeof (short)
>> >> 2 = sizeof (int)
>> >> 4 = sizeof (long int)
>> >> 8 = sizeof (long long int)
>> >>
>> >> Could you fix that? I.e. parametrize sizeof(int) out or skip the test by means of
>> >>
>> >> /* { dg-require-effective-target int32plus } */
>> >>
>> >> or similar.
>> >>
>> >> Thanks, Johann
>> >>
>> >> [...]
>> >>
>> > The problem actually happens when we compare float vector with float
>> > vector, it is assumed that we should get int vector as a result, but
>> > it turns out that we are getting long int.
>> >
>> > The same with double, we assume that sizeof (double) == sizeof (long
>> > long). But as it seems double has the same size as float.
>>
>> Yes.
>>
>> sizeof(double) = sizeof(float) = 4
>>
>> > Hm, I can put conditional of sort:
>> > if (sizeof (double) == sizeof (long long)) and others. Or maybe there
>> > is more elegant way of solving this?
>>
>> That's too late because this won't prevent the compiler from error.
>> The error already happens at compile time, not at run time.
>
> Isn't it possible to do something like:
>     vector (4, float) f0;
>     vector (4, float) f1;
> -    vector (4, int) ifres;
> +    vector (4, __typeof (f0 > f1)) ifres;
>
>     f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
>     f1 = (vector (4, float)){0., 3., 2., (float)-23};
>     test (4, f0, f1, ifres, "%f");
>
>  /* Double comparison.  */
>     vector (2, double) d0;
>     vector (2, double) d1;
> -    vector (2, long long) idres;
> +    vector (2, __typeof (d0 > d1)) idres;
>
>     d0 = (vector (2, double)){(double)argc,  10.};
>     d1 = (vector (2, double)){0., (double)-23};
>     test (2, d0, d1, idres, "%f");
>
>        Jakub
>

Most likely we can. The question is what we really want to check
with this test. My intention was to check that a programmer can
statically determine the corresponding types, in the sense that
sizeof (float) == sizeof (int) and sizeof (double) == sizeof (long long).
It seems my original assumption does not hold. Before using __typeof,
I would like to make sure that there is no other way to determine these
correspondences.

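A sketch of the correspondence that does hold on every target, assuming C11 _Static_assert and GCC vector extensions: a comparison mask has the same number of lanes and the same lane width as its operands. The typedefs and checked_mask_bytes are hypothetical names. The float/int pairing is deliberately not asserted, since it fails on avr.

```c
#include <stddef.h>

/* Hypothetical typedefs built with GCC's vector_size attribute.  */
typedef float v4sf __attribute__ ((vector_size (4 * sizeof (float))));
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

/* Compile-time checks of what holds on every target: a comparison mask
   has the operands' lane count and lane width.  Returns the combined
   size in bytes of the two masks.  */
size_t
checked_mask_bytes (void)
{
  v4sf f0 = { 0 }, f1 = { 0 };
  v2df d0 = { 0 }, d1 = { 0 };
  __typeof (f0 > f1) fmask;   /* only used in sizeof below */
  __typeof (d0 > d1) dmask;

  _Static_assert (sizeof fmask == sizeof f0, "float mask size");
  _Static_assert (sizeof fmask[0] == sizeof (float), "float lane width");
  _Static_assert (sizeof dmask[0] == sizeof (double), "double lane width");

  /* Not asserted: sizeof (float) == sizeof (int), which fails on avr.  */
  return sizeof fmask + sizeof dmask;
}
```
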
Artem.

* Re: Vector Comparison patch
  2011-09-30 16:45                                                             ` Artem Shinkarov
@ 2011-09-30 16:51                                                               ` Jakub Jelinek
  2011-09-30 17:01                                                                 ` Artem Shinkarov
  0 siblings, 1 reply; 91+ messages in thread
From: Jakub Jelinek @ 2011-09-30 16:51 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Georg-Johann Lay, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

On Fri, Sep 30, 2011 at 04:48:41PM +0100, Artem Shinkarov wrote:
> Most likely we can. The question is what do we really want to check
> with this test. My intention was to check that a programmer can
> statically get correspondence of the types, in a sense that sizeof
> (float) == sizeof (int) and sizeof (double) == sizeof (long long). As
> it seems my original assumption does not hold. Before using __typeof,
> I would try to make sure that there is no other way to determine these
> correspondences.

You can use preprocessor too, either just surround the whole test
with #if __SIZEOF_INT__ == __SIZEOF_FLOAT__ and similar,
or select the right type through preprocessor
#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
#define FLOATCMPTYPE int
#elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
#define FLOATCMPTYPE long
#else
...
or __typeof, etc.

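The preprocessor route, filled in as a sketch. It assumes GCC's predefined __SIZEOF_INT__, __SIZEOF_LONG__, __SIZEOF_LONG_LONG__, __SIZEOF_FLOAT__ and __SIZEOF_DOUBLE__ macros; DOUBLECMPTYPE, the typedefs, the count_* helpers and the #error fallbacks are hypothetical additions, not the elided part of the suggestion. Because the type is chosen before compilation, a mismatching declaration never reaches the compiler, which a run-time if (sizeof ...) test cannot achieve.

```c
/* Select integer types as wide as float and double at preprocessing
   time, using GCC's predefined __SIZEOF_*__ macros.  */
#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
# define FLOATCMPTYPE int
#elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
# define FLOATCMPTYPE long
#else
# error "no integer type as wide as float"
#endif

#if __SIZEOF_INT__ == __SIZEOF_DOUBLE__
# define DOUBLECMPTYPE int
#elif __SIZEOF_LONG__ == __SIZEOF_DOUBLE__
# define DOUBLECMPTYPE long
#elif __SIZEOF_LONG_LONG__ == __SIZEOF_DOUBLE__
# define DOUBLECMPTYPE long long
#else
# error "no integer type as wide as double"
#endif

/* Hypothetical typedefs: operand vectors and matching mask vectors.  */
typedef float v4sf __attribute__ ((vector_size (4 * sizeof (float))));
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));
typedef FLOATCMPTYPE v4fmask
  __attribute__ ((vector_size (4 * sizeof (FLOATCMPTYPE))));
typedef DOUBLECMPTYPE v2dmask
  __attribute__ ((vector_size (2 * sizeof (DOUBLECMPTYPE))));

/* Number of lanes where f0 > f1.  The assignment compiles because
   v4fmask has the same lane width as the comparison result.  */
int
count_float_greater (v4sf f0, v4sf f1)
{
  v4fmask ifres = f0 > f1;
  int n = 0;
  for (int i = 0; i < 4; i++)
    n += ifres[i] == -1;
  return n;
}

/* The same for double lanes.  */
int
count_double_greater (v2df d0, v2df d1)
{
  v2dmask idres = d0 > d1;
  int n = 0;
  for (int i = 0; i < 2; i++)
    n += idres[i] == -1;
  return n;
}
```

With the test's values f0 = {1, 1, 2, 10} and f1 = {0, 3, 2, -23}, count_float_greater returns 2.
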
	Jakub

* Re: Vector Comparison patch
  2011-09-30 16:51                                                               ` Jakub Jelinek
@ 2011-09-30 17:01                                                                 ` Artem Shinkarov
  2011-09-30 19:05                                                                   ` Georg-Johann Lay
  2011-10-04  9:39                                                                   ` Georg-Johann Lay
  0 siblings, 2 replies; 91+ messages in thread
From: Artem Shinkarov @ 2011-09-30 17:01 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Georg-Johann Lay, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

[-- Attachment #1: Type: text/plain, Size: 1053 bytes --]

On Fri, Sep 30, 2011 at 4:54 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Sep 30, 2011 at 04:48:41PM +0100, Artem Shinkarov wrote:
>> Most likely we can. The question is what do we really want to check
>> with this test. My intention was to check that a programmer can
>> statically get correspondence of the types, in a sense that sizeof
>> (float) == sizeof (int) and sizeof (double) == sizeof (long long). As
>> it seems my original assumption does not hold. Before using __typeof,
>> I would try to make sure that there is no other way to determine these
>> correspondences.
>
> You can use preprocessor too, either just surround the whole test
> with #if __SIZEOF_INT__ == __SIZEOF_FLOAT__ and similar,
> or select the right type through preprocessor
> #if __SIZEOF_INT__ == __SIZEOF_FLOAT__
> #define FLOATCMPTYPE int
> #elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
> #define FLOATCMPTYPE long
> #else
> ...
> or __typeof, etc.
>
>        Jakub
>

Ok, here is a patch which uses __typeof. Passes on x86_64.

Artem.

[-- Attachment #2: comparison-patch.diff --]
[-- Type: text/plain, Size: 2362 bytes --]

Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 179378)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(working copy)
@@ -39,17 +39,17 @@ int main (int argc, char *argv[]) {
     int i;
 
     i0 = (vector (4, INT)){argc, 1,  2,  10};
-    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};
     test (4, i0, i1, ires, "%i");
 #undef INT
 
-#define INT unsigned int 
+#define INT unsigned int
     vector (4, int) ures;
     vector (4, INT) u0;
     vector (4, INT) u1;
 
     u0 = (vector (4, INT)){argc, 1,  2,  10};
-    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};
     test (4, u0, u1, ures, "%u");
 #undef INT
 
@@ -60,7 +60,7 @@ int main (int argc, char *argv[]) {
     vector (8, short) sres;
 
     s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
-    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
     test (8, s0, s1, sres, "%i");
 #undef SHORT
 
@@ -70,7 +70,7 @@ int main (int argc, char *argv[]) {
     vector (8, short) usres;
 
     us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
-    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};
     test (8, us0, us1, usres, "%u");
 #undef SHORT
 
@@ -102,19 +102,19 @@ int main (int argc, char *argv[]) {
 /* Float comparison.  */
     vector (4, float) f0;
     vector (4, float) f1;
-    vector (4, int) ifres;
+    __typeof (f0 == f1) ifres;
 
     f0 = (vector (4, float)){(float)argc, 1.,  2.,  10.};
-    f1 = (vector (4, float)){0., 3., 2., (float)-23};    
+    f1 = (vector (4, float)){0., 3., 2., (float)-23};
     test (4, f0, f1, ifres, "%f");
-    
+
 /* Double comparison.  */
     vector (2, double) d0;
     vector (2, double) d1;
-    vector (2, long long) idres;
+    __typeof (d0 == d1) idres;
 
     d0 = (vector (2, double)){(double)argc,  10.};
-    d1 = (vector (2, double)){0., (double)-23};    
+    d1 = (vector (2, double)){0., (double)-23};
     test (2, d0, d1, idres, "%f");
 
 

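A minimal sketch of the __typeof approach used in the patch, assuming GCC 4.7+ vector comparisons and vector subscripting. The result vector is declared with __typeof, so no integer type is named, and each lane is checked against the scalar comparison mapped to -1 / 0. The names v2df and double_gt_matches_scalar are hypothetical; the test file's own test macro is not reproduced here.

```c
/* Hypothetical typedef: two doubles in one GCC vector.  */
typedef double v2df __attribute__ ((vector_size (2 * sizeof (double))));

/* Return 1 if every lane of d0 > d1 equals the scalar comparison
   mapped to -1 (true) / 0 (false), else 0.  */
int
double_gt_matches_scalar (v2df d0, v2df d1)
{
  __typeof (d0 > d1) idres = d0 > d1;   /* lane type chosen by the compiler */
  for (int i = 0; i < 2; i++)
    if (idres[i] != (d0[i] > d1[i] ? -1 : 0))
      return 0;
  return 1;
}
```
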
* Re: Vector Comparison patch
  2011-09-30 17:01                                                                 ` Artem Shinkarov
@ 2011-09-30 19:05                                                                   ` Georg-Johann Lay
  2011-10-04  9:39                                                                   ` Georg-Johann Lay
  1 sibling, 0 replies; 91+ messages in thread
From: Georg-Johann Lay @ 2011-09-30 19:05 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Jakub Jelinek, Richard Guenther, Uros Bizjak, Richard Henderson,
	gcc-patches, Joseph S. Myers

Artem Shinkarov schrieb:
> On Fri, Sep 30, 2011 at 4:54 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> 
>>On Fri, Sep 30, 2011 at 04:48:41PM +0100, Artem Shinkarov wrote:
>>
>>>Most likely we can. The question is what do we really want to check
>>>with this test. My intention was to check that a programmer can
>>>statically get correspondence of the types, in a sense that sizeof
>>>(float) == sizeof (int) and sizeof (double) == sizeof (long long). As
>>>it seems my original assumption does not hold. Before using __typeof,
>>>I would try to make sure that there is no other way to determine these
>>>correspondences.
>>
>>You can use preprocessor too, either just surround the whole test
>>with #if __SIZEOF_INT__ == __SIZEOF_FLOAT__ and similar,
>>or select the right type through preprocessor
>>#if __SIZEOF_INT__ == __SIZEOF_FLOAT__
>>#define FLOATCMPTYPE int
>>#elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
>>#define FLOATCMPTYPE long
>>#else
>>...
>>or __typeof, etc.
>>
>>       Jakub
> 
> Ok, here is a patch which uses __typeof. Passes on x86_64.

Thanks, I will test on avr next week.

Johann

> 
> Artem.
> 

* Re: Vector Comparison patch
  2011-09-30 17:01                                                                 ` Artem Shinkarov
  2011-09-30 19:05                                                                   ` Georg-Johann Lay
@ 2011-10-04  9:39                                                                   ` Georg-Johann Lay
  2011-10-04  9:55                                                                     ` Jakub Jelinek
  1 sibling, 1 reply; 91+ messages in thread
From: Georg-Johann Lay @ 2011-10-04  9:39 UTC (permalink / raw)
  To: Artem Shinkarov
  Cc: Jakub Jelinek, Richard Guenther, Uros Bizjak, Richard Henderson,
	gcc-patches, Joseph S. Myers

Artem Shinkarov schrieb:
> On Fri, Sep 30, 2011 at 4:54 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Sep 30, 2011 at 04:48:41PM +0100, Artem Shinkarov wrote:
>>> Most likely we can. The question is what do we really want to check
>>> with this test. My intention was to check that a programmer can
>>> statically get correspondence of the types, in a sense that sizeof
>>> (float) == sizeof (int) and sizeof (double) == sizeof (long long). As
>>> it seems my original assumption does not hold. Before using __typeof,
>>> I would try to make sure that there is no other way to determine these
>>> correspondences.
>> You can use preprocessor too, either just surround the whole test
>> with #if __SIZEOF_INT__ == __SIZEOF_FLOAT__ and similar,
>> or select the right type through preprocessor
>> #if __SIZEOF_INT__ == __SIZEOF_FLOAT__
>> #define FLOATCMPTYPE int
>> #elif __SIZEOF_LONG__ == __SIZEOF_FLOAT__
>> #define FLOATCMPTYPE long
>> #else
>> ...
>> or __typeof, etc.
>>
>>        Jakub
>>
> 
> Ok, here is a patch which uses __typeof. Passes on x86_64.
> 
> Artem.

The patch from
  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
  http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
works for me.

If it's OK with the maintainers, I can apply it for you.

Johann

* Re: Vector Comparison patch
  2011-10-04  9:39                                                                   ` Georg-Johann Lay
@ 2011-10-04  9:55                                                                     ` Jakub Jelinek
  2011-10-04 10:05                                                                       ` Georg-Johann Lay
  0 siblings, 1 reply; 91+ messages in thread
From: Jakub Jelinek @ 2011-10-04  9:55 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Artem Shinkarov, Jakub Jelinek, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

On Tue, Oct 04, 2011 at 11:32:37AM +0200, Georg-Johann Lay wrote:
> The patch from
>   http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
>   http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
> works for me.
> 
> If it's ok from maintainer I can apply it for you.

It is fine with a suitable ChangeLog entry.

	Jakub

* Re: Vector Comparison patch
  2011-10-04  9:55                                                                     ` Jakub Jelinek
@ 2011-10-04 10:05                                                                       ` Georg-Johann Lay
  0 siblings, 0 replies; 91+ messages in thread
From: Georg-Johann Lay @ 2011-10-04 10:05 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Artem Shinkarov, Richard Guenther, Uros Bizjak,
	Richard Henderson, gcc-patches, Joseph S. Myers

Jakub Jelinek schrieb:
> On Tue, Oct 04, 2011 at 11:32:37AM +0200, Georg-Johann Lay wrote:
>> The patch from
>>   http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02060.html
>>   http://gcc.gnu.org/ml/gcc-patches/2011-09/txt00337.txt
>> works for me.
>>
>> If it's ok from maintainer I can apply it for you.
> 
> It is fine with a suitable ChangeLog entry.
> 
> 	Jakub

It's here:

http://gcc.gnu.org/viewcvs?view=revision&revision=179497

Johann

end of thread, other threads:[~2011-10-04 10:04 UTC | newest]

Thread overview: 91+ messages
2011-08-12  7:04 Vector Comparison patch Artem Shinkarov
2011-08-15 15:25 ` Richard Guenther
2011-08-15 17:53   ` Artem Shinkarov
2011-08-16 16:39     ` Richard Guenther
2011-08-16 17:01       ` Artem Shinkarov
2011-08-16 21:48         ` Artem Shinkarov
2011-08-17 12:58           ` Richard Guenther
2011-08-17 15:27             ` Artem Shinkarov
2011-08-17 16:14               ` Richard Guenther
2011-08-17 17:07                 ` Artem Shinkarov
2011-08-17 21:18                   ` Artem Shinkarov
2011-08-18  1:22                     ` Joseph S. Myers
2011-08-18 11:37                       ` Artem Shinkarov
2011-08-18 14:20                         ` Joseph S. Myers
2011-08-18 10:21                     ` Richard Guenther
2011-08-18 11:24                       ` Artem Shinkarov
2011-08-18 15:05                         ` Artem Shinkarov
2011-08-18 15:19                       ` Richard Henderson
2011-08-19  8:17                         ` Artem Shinkarov
2011-08-19 15:38                           ` Richard Guenther
2011-08-19 16:28                             ` Artem Shinkarov
2011-08-20 10:14                               ` Richard Guenther
2011-08-22  7:32                                 ` Artem Shinkarov
2011-08-22 12:06                                   ` Richard Guenther
2011-08-22 13:56                                     ` Artem Shinkarov
2011-08-22 15:43                                       ` Richard Guenther
2011-08-22 15:54                                         ` Artem Shinkarov
2011-08-22 15:57                                           ` Richard Guenther
2011-08-22 16:02                                             ` Artem Shinkarov
2011-08-22 16:25                                               ` Richard Guenther
2011-08-22 17:16                                                 ` Artem Shinkarov
2011-08-22 21:07                                                   ` Richard Guenther
2011-08-22 21:53                                                     ` Artem Shinkarov
2011-08-22 22:39                                                       ` Richard Guenther
2011-08-22 23:13                                                         ` Artem Shinkarov
2011-08-23  9:53                                                           ` Richard Guenther
2011-08-23 10:12                                                             ` Artem Shinkarov
2011-08-23 10:45                                                               ` Richard Guenther
2011-08-23 11:08                                                                 ` Artem Shinkarov
2011-08-23 11:12                                                                   ` Richard Guenther
2011-08-23 11:23                                                                     ` Artem Shinkarov
2011-08-23 11:26                                                                       ` Richard Guenther
2011-08-23 11:41                                                                         ` Artem Shinkarov
2011-08-23 11:58                                                                           ` Artem Shinkarov
2011-08-23 12:06                                                                           ` Richard Guenther
2011-08-23 12:37                                                                             ` Artem Shinkarov
2011-08-25  9:22                                                                               ` Artem Shinkarov
2011-08-25  9:58                                                                                 ` Richard Guenther
2011-08-25 10:15                                                                                   ` Artem Shinkarov
2011-08-25 11:02                                                                                 ` Richard Guenther
2011-08-25 11:49                                                                                   ` Artem Shinkarov
2011-08-25 12:14                                                                                     ` Richard Guenther
2011-08-25 13:29                                                                                       ` Artem Shinkarov
2011-08-25 13:30                                                                                         ` Richard Guenther
2011-08-25 13:31                                                                                           ` Artem Shinkarov
2011-08-25 14:49                                                                                             ` Richard Guenther
2011-08-27 10:50                                                                                               ` Artem Shinkarov
2011-08-29 12:46                                                                                                 ` Richard Guenther
2011-08-22 20:46                                             ` Uros Bizjak
2011-08-22 20:58                                               ` Richard Guenther
2011-08-22 21:12                                               ` Artem Shinkarov
2011-08-29 12:54                                               ` Richard Guenther
2011-08-29 13:08                                                 ` Richard Guenther
2011-09-06 14:51                                                   ` Artem Shinkarov
2011-09-06 14:56                                                     ` Richard Guenther
2011-09-07 14:14                                                       ` Artem Shinkarov
2011-09-07 15:08                                                         ` Joseph S. Myers
2011-09-26 14:56                                                           ` Richard Guenther
2011-09-26 16:01                                                             ` Richard Guenther
2011-09-28 14:53                                                               ` Richard Guenther
2011-09-29 11:05                                                                 ` Richard Guenther
2011-09-29 14:01                                                                   ` Richard Guenther
2011-09-30 11:44                                                                     ` Matthew Gretton-Dann
2011-09-08 12:56                                                         ` Richard Guenther
2011-09-08 13:46                                                           ` Richard Guenther
2011-09-08 18:14                                                           ` Uros Bizjak
2011-09-30 15:21                                                     ` Georg-Johann Lay
2011-09-30 15:29                                                       ` Artem Shinkarov
2011-09-30 16:21                                                         ` Georg-Johann Lay
2011-09-30 16:30                                                           ` Jakub Jelinek
2011-09-30 16:45                                                             ` Artem Shinkarov
2011-09-30 16:51                                                               ` Jakub Jelinek
2011-09-30 17:01                                                                 ` Artem Shinkarov
2011-09-30 19:05                                                                   ` Georg-Johann Lay
2011-10-04  9:39                                                                   ` Georg-Johann Lay
2011-10-04  9:55                                                                     ` Jakub Jelinek
2011-10-04 10:05                                                                       ` Georg-Johann Lay
2011-08-29 12:54                       ` Paolo Bonzini
2011-09-16 18:08                         ` Richard Henderson
2011-08-17 12:49         ` Richard Guenther
2011-08-20 11:22           ` Uros Bizjak