public inbox for gcc-patches@gcc.gnu.org
* [RFC] [patch] Support vectorization of min/max location pattern
@ 2010-07-01  8:01 Ira Rosen
  2010-07-06  7:15 ` Ira Rosen
  0 siblings, 1 reply; 16+ messages in thread
From: Ira Rosen @ 2010-07-01  8:01 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7144 bytes --]


Hi,

This patch adds vectorization support for the min/max location pattern:

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

The recognized pattern consists of two statements (and is therefore called
a compound pattern):

  # pos_22 = PHI <pos_1(4), 1(2)>
  # limit_24 = PHI <limit_4(4), 0(2)>
  ...
  pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
  limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;

Both statements must be reductions with a cond_expr and have the same
condition. The min/max statement is expected to be of the form "x op y ?
x : y" (where op can be >, <, >= or <=), and the location statement is
expected to be an induction.
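For reference, here is a complete, compilable version of the scalar idiom
above (the function name and the initial values are illustrative, not taken
from the testcases):

```c
#include <stddef.h>

/* Scalar minloc: track the running minimum and the (1-based) position at
   which it first occurs, exactly as in the loop above.  The "pos = i + 1"
   form is the one the pattern recognizer supports.  */
static void
minloc (const float *arr, size_t n, float *limit_out, size_t *pos_out)
{
  float limit = arr[0];
  size_t pos = 1;
  for (size_t i = 0; i < n; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }
  *limit_out = limit;
  *pos_out = pos;
}
```

Because the comparison is strict, ties keep the earlier position, i.e. the
scalar loop computes the *first* minimum location.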

To vectorize min/max location pattern we use a technique described in
"Multimedia vectorization of floating-point MIN/MAX reductions" by
A.J.C.Bik, X.Tian and M.B.Girkar,
http://portal.acm.org/citation.cfm?id=1145765.

Vectorized loop (maxloc, first index):
     vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
     vck[0:vl-1:1] = | k |..| k |;  - vector of positions
     ind[0:vl-1:1] = |vl-1|..| 0 |;
     inc[0:vl-1:1] = | vl |..| vl |;
     for (i = 0; i < N; i += vl) {
       msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
       vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
                       (vck[0:vl-1:1] & !msk[0:vl-1:1]);
       vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
       ind[0:vl-1:1] += inc[0:vl-1:1];
     }
     x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
     msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
     vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
                     (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
     k = HMIN(vck[0:vl-1:1]);       - first position extraction
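The same scheme can be emulated lane-by-lane in plain C, which may make the
blend steps easier to follow (VL, the mask arithmetic and the HMAX/HMIN
reductions are modeled with small loops; all names here are illustrative,
not from the patch):

```c
#include <stdint.h>
#include <stddef.h>

#define VL 4

/* First-index maxloc over N elements (N a multiple of VL), following the
   blend technique: keep a vector of running maxima and a vector of the
   positions where they occurred, then reduce both in the epilogue.  */
static void
maxloc_blend (const int32_t *a, size_t n, int32_t *max_out, size_t *pos_out)
{
  int32_t vcx[VL], vck[VL], ind[VL];
  for (int l = 0; l < VL; l++)
    {
      vcx[l] = INT32_MIN;   /* running maxima */
      vck[l] = 0;           /* recorded positions */
      ind[l] = l;           /* current indices, stepped by VL */
    }

  for (size_t i = 0; i < n; i += VL)
    for (int l = 0; l < VL; l++)
      {
        int32_t msk = -(a[i + l] > vcx[l]);         /* all-ones or zero */
        vck[l] = (ind[l] & msk) | (vck[l] & ~msk);  /* blend positions */
        vcx[l] = a[i + l] > vcx[l] ? a[i + l] : vcx[l];   /* VMAX */
        ind[l] += VL;
      }

  /* Epilogue: HMAX of the maxima, then HMIN over the positions of the
     lanes that actually hold the maximum (others masked to INT32_MAX).  */
  int32_t x = vcx[0];
  for (int l = 1; l < VL; l++)
    if (vcx[l] > x)
      x = vcx[l];

  int32_t k = INT32_MAX;
  for (int l = 0; l < VL; l++)
    {
      int32_t msk = -(vcx[l] == x);
      int32_t cand = (vck[l] & msk) | (INT32_MAX & ~msk);
      if (cand < k)
        k = cand;
    }

  *max_out = x;
  *pos_out = (size_t) k;
}
```

Within a lane the strict ">" keeps the first occurrence, and the final HMIN
picks the smallest index across lanes, so the result is the first maximum
location, matching the scalar loop.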


Vectorization of minloc is expected to help gas_dyn from Polyhedron, as
discussed in PR 31067.

PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
worked around with -fno-tree-pre. I'll wait for a fix of PR 44710 before I
commit this patch (after regtesting it again).
Also, the case of pos = i; instead of pos = i + 1; is not supported, since
in that case the operands are switched, i.e., we get "x op y ? y : x".


My main question concerns the implementation of vector comparisons. I
understand that different targets can return different result types, so
instead of defining new tree codes, I used a target builtin that also
returns the type of the result.

Other comments are welcome too.

Bootstrapped and tested on powerpc64-suse-linux.

Thanks,
Ira


ChangeLog:

      * doc/tm.texi (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Document.
      * target.h (struct vectorize): Add new target builtin.
      * tree-vectorizer.h (enum vect_compound_pattern): New.
      (struct _stmt_vec_info): Add new fields compound_pattern and
      reduc_scalar_result_stmt. Add macros to access them.
      (is_pattern_stmt_p): Return true for compound pattern.
      (vectorizable_condition): Add arguments.
      (vect_recog_compound_func_ptr): New function-pointer type.
      (NUM_COMPOUND_PATTERNS): New.
      (vect_compound_pattern_recog): Declare.
      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
      for compound patterns.
      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
      patterns. Update comment.
      (vect_analyze_scalar_cycles): Update comment.
      (destroy_loop_vec_info): Update def stmt for the original pattern
      statement.
      (vect_is_simple_reduction_1): Skip compound pattern statements in
      uses check. Add spaces. Skip commutativity and type checks for
      minimum location statement. Fix printings.
      (vect_model_reduction_cost): Add min/max location pattern cost
      computation.
      (vect_create_epilogue_for_compound_pattern): New function.
      (vect_create_epilog_for_reduction): Don't retrieve the original
      statement for compound pattern. Fix comment accordingly. Store the
      result of vector reduction computation in a variable and use it. Call
      vect_create_epilogue_for_compound_pattern (). Check if optab exists
      before using it. Keep the scalar result computation statement. Use
      either exit phi node result or compound pattern result in scalar
      extraction. Don't expect to find an exit phi node for min/max
      statement.
      (vectorizable_reduction): Skip check for uses in loop for compound
      patterns. Don't retrieve the original statement for compound pattern.
      Call vectorizable_condition () with additional parameters. Skip
      reduction code check for compound patterns. Prepare operands for
      min/max location statement vectorization and pass them to
      vectorizable_condition ().
      (vectorizable_live_operation): Return TRUE for compound patterns.
      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
      (vect_recog_compound_func_ptrs): Likewise.
      (vect_recog_min_max_loc_pattern): New function.
      (vect_compound_pattern_recog): Likewise.
      * target-def.h (TARGET_VECTORIZE_BUILTIN_VEC_CMP): New.
      * tree-vect-stmts.c (process_use): Mark compound pattern statements
      as used by reduction.
      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
      to be used by reduction.
      (vectorize_minmax_location_pattern): New function.
      (vectorizable_condition): Update comment, add arguments. Skip checks
      irrelevant for compound pattern. Check that vector comparisons are
      supported by the target. Prepare operands using new arguments. Call
      vectorize_minmax_location_pattern().
      (vect_analyze_stmt): Allow nested cycle statements to be used by
      reduction. Call vectorizable_condition () with additional arguments.
      (vect_transform_stmt): Call vectorizable_condition () with additional
      arguments.
      (new_stmt_vec_info): Initialize new fields.
      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
      (ALTIVEC_BUILTIN_VCMPLEFP): New.
      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
      (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Redefine.
      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
      altivec_vcmplefp.
      * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern.
      (altivec_vcmplefp): Likewise.
      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
      patterns.

testsuite/ChangeLog:

      * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
      * lib/target-supports.exp (check_effective_target_vect_cmp): New.
      * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
      * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
      gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c: Likewise.


(See attached file: minloc.txt)(See attached file: minloc-tests.txt)



[-- Attachment #2: minloc.txt --]
[-- Type: text/plain, Size: 69350 bytes --]

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 161484)
+++ doc/tm.texi	(working copy)
@@ -5750,6 +5750,14 @@ the elements in the vectors should be of
 parameter is true if the memory access is defined in a packed struct.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_BUILTIN_VEC_CMP (unsigned int @var{code}, const_tree @var{type}, tree *@var{result_type})
+Target builtin that implements vector element-wise comparison.
+The value of @var{code} is one of the enumerators in @code{enum tree_code} and
+specifies the comparison operation, and @var{type} specifies the type of the
+input vectors.  The function returns the comparison result type in @var{result_type}.
+@end deftypefn
+
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: target.h
===================================================================
--- target.h	(revision 161484)
+++ target.h	(working copy)
@@ -545,6 +545,8 @@ struct gcc_target
        is true if the access is defined in a packed struct.  */
     bool (* builtin_support_vector_misalignment) (enum machine_mode,
                                                   const_tree, int, bool);
+    /* Target builtin that implements vector element-wise comparison.  */
+    tree (* builtin_vect_compare) (unsigned int, tree, tree *);
   } vectorize;
 
   /* The initial value of target_flags.  */
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 161484)
+++ tree-vectorizer.h	(working copy)
@@ -389,6 +389,17 @@ enum slp_vect_type {
   hybrid
 };
 
+/* A compound pattern is a pattern consisting of more than one statement that
+   needs to be vectorized.  Currently the min/max location pattern is the only
+   supported compound pattern.  It has two statements: the first statement
+   calculates the minimum (marked MINMAX_STMT) and the second calculates the
+   location (marked MINMAX_LOC_STMT).  */
+enum vect_compound_pattern {
+  not_in_pattern = 0,
+  minmax_stmt,
+  minmax_loc_stmt
+};
+
 
 typedef struct data_reference *dr_p;
 DEF_VEC_P(dr_p);
@@ -405,6 +416,10 @@ typedef struct _stmt_vec_info {
   /* Stmt is part of some pattern (computation idiom)  */
   bool in_pattern_p;
 
+  /* Statement is part of a compound pattern, i.e., a pattern consisting of
+     more than one statement.  */
+  enum vect_compound_pattern compound_pattern;
+
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
@@ -491,6 +506,10 @@ typedef struct _stmt_vec_info {
   /* The bb_vec_info with respect to which STMT is vectorized.  */
   bb_vec_info bb_vinfo;
 
+  /* The scalar result of vectorized reduction computation generated in
+     reduction epilogue.  */
+  gimple reduc_scalar_result_stmt;
+
   /* Is this statement vectorizable or should it be skipped in (partial)
      vectorization.  */
   bool vectorizable;
@@ -515,6 +534,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_DR_ALIGNED_TO(S)        (S)->dr_aligned_to
 
 #define STMT_VINFO_IN_PATTERN_P(S)         (S)->in_pattern_p
+#define STMT_VINFO_COMPOUND_PATTERN(S)     (S)->compound_pattern
 #define STMT_VINFO_RELATED_STMT(S)         (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)      (S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S)             (S)->def_type
@@ -526,6 +546,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt
 #define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 #define STMT_VINFO_STRIDED_ACCESS(S)      ((S)->first_dr != NULL)
+#define STMT_VINFO_REDUC_SCALAR_RES_STMT(S) (S)->reduc_scalar_result_stmt
 
 #define DR_GROUP_FIRST_DR(S)               (S)->first_dr
 #define DR_GROUP_NEXT_DR(S)                (S)->next_dr
@@ -620,7 +641,8 @@ is_pattern_stmt_p (stmt_vec_info stmt_in
   related_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
   if (related_stmt
       && (related_stmt_info = vinfo_for_stmt (related_stmt))
-      && STMT_VINFO_IN_PATTERN_P (related_stmt_info))
+      && (STMT_VINFO_IN_PATTERN_P (related_stmt_info)
+          || STMT_VINFO_COMPOUND_PATTERN (related_stmt_info)))
     return true;
 
   return false;
@@ -741,8 +763,10 @@ extern bool vect_transform_stmt (gimple,
                                  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
-extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
-                                    tree, int);
+extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *, 
+                                    tree, int, tree, int);
+extern gimple vectorize_minmax_location_pattern (gimple, gimple_stmt_iterator*,
+                                       enum tree_code, tree, tree, tree, tree);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
@@ -818,8 +842,11 @@ extern void vect_slp_transform_bb (basic
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
-#define NUM_PATTERNS 4
+typedef bool (* vect_recog_compound_func_ptr) (unsigned int, va_list);
+#define NUM_PATTERNS 4
+#define NUM_COMPOUND_PATTERNS 1
 void vect_pattern_recog (loop_vec_info);
+void vect_compound_pattern_recog (unsigned int, ...);
 
 /* In tree-vectorizer.c.  */
 unsigned vectorize_loops (void);
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 161484)
+++ tree-vect-loop.c	(working copy)
@@ -295,7 +295,8 @@ vect_determine_vectorization_factor (loo
 	  else
 	    {
 	      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)
-			  && !is_pattern_stmt_p (stmt_info));
+			  && (!is_pattern_stmt_p (stmt_info)
+                              || STMT_VINFO_COMPOUND_PATTERN (stmt_info)));
 
 	      scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 	      if (vect_print_dump_info (REPORT_DETAILS))
@@ -444,10 +445,15 @@ static void
 vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
 {
   basic_block bb = loop->header;
-  tree dumy;
+  tree dummy;
   VEC(gimple,heap) *worklist = VEC_alloc (gimple, heap, 64);
   gimple_stmt_iterator gsi;
-  bool double_reduc;
+  bool double_reduc, found, minmax_loc = false;
+  gimple first_cond_stmt = NULL, second_cond_stmt = NULL;
+  gimple first_phi = NULL, second_phi = NULL, phi, use_stmt;
+  int i;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_scalar_cycles ===");
@@ -484,7 +490,8 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
 	}
 
       if (!access_fn
-	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dumy, &dumy))
+	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dummy, 
+                                           &dummy)) 
 	{
 	  VEC_safe_push (gimple, heap, worklist, phi);
 	  continue;
@@ -495,8 +502,56 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
       STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_induction_def;
     }
 
+  /* Detect compound reduction patterns (before reduction detection):
+     we currently support only the min/max location pattern, so we look for
+     two reduction condition statements.  */
+  for (i = 0; VEC_iterate (gimple, worklist, i, phi); i++)
+    {
+      tree def = PHI_RESULT (phi);
 
-  /* Second - identify all reductions and nested cycles.  */
+      found = false;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
+        {
+          use_stmt = USE_STMT (use_p);
+          if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
+              && vinfo_for_stmt (use_stmt)
+              && is_gimple_assign (use_stmt)
+              && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
+            {
+              found = true;
+              break;
+            }
+        }
+
+      if (!found)
+        continue;
+
+      if (!first_cond_stmt)
+        {
+          first_cond_stmt = use_stmt;
+          first_phi = phi;
+        }
+      else
+        {
+          if (second_cond_stmt)
+            {
+              /* This is the third reduction condition statement in the
+                 loop.  That is too confusing, so we bail out.  */
+              minmax_loc = false;
+              break;
+            }
+
+          second_cond_stmt = use_stmt;
+          second_phi = phi;
+          minmax_loc = true;
+        }
+    }
+
+  if (minmax_loc)
+    vect_compound_pattern_recog (4, first_phi, first_cond_stmt, 
+                                 second_phi, second_cond_stmt);
+
+  /* Identify all reductions and nested cycles.  */
   while (VEC_length (gimple, worklist) > 0)
     {
       gimple phi = VEC_pop (gimple, worklist);
@@ -595,11 +650,9 @@ vect_analyze_scalar_cycles (loop_vec_inf
   /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
      Reductions in such inner-loop therefore have different properties than
      the reductions in the nest that gets vectorized:
-     1. When vectorized, they are executed in the same order as in the original
-        scalar loop, so we can't change the order of computation when
-        vectorizing them.
-     2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
-        current checks are too strict.  */
+     when vectorized, they are executed in the same order as in the original
+     scalar loop, so we can't change the order of computation when
+     vectorizing them.  */
 
   if (loop->inner)
     vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
@@ -819,7 +872,15 @@ destroy_loop_vec_info (loop_vec_info loo
                   if (orig_stmt_info
                       && STMT_VINFO_IN_PATTERN_P (orig_stmt_info))
                     remove_stmt_p = true;
-                }
+               
+                  /* We are removing a statement inserted by the pattern
+                     detection pass.  Update the original statement to be the
+                     def stmt of the statement's LHS.  */
+                  if (remove_stmt_p && is_gimple_assign (orig_stmt) 
+                      && TREE_CODE (gimple_assign_lhs (orig_stmt)) == SSA_NAME)
+                    SSA_NAME_DEF_STMT (gimple_assign_lhs (orig_stmt)) 
+                      = orig_stmt;
+                 }
 
               /* Free stmt_vec_info.  */
               free_stmt_vec_info (stmt);
@@ -1662,13 +1723,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
-	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
+	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt))
+	  && !STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (use_stmt)))
         nloop_uses++;
+   
       if (nloop_uses > 1)
         {
-          if (vect_print_dump_info (REPORT_DETAILS))
+          if (vect_print_dump_info (REPORT_DETAILS)) 
             fprintf (vect_dump, "reduction used in loop.");
           return NULL;
         }
@@ -1716,10 +1780,12 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
 	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
 	nloop_uses++;
+
       if (nloop_uses > 1)
 	{
 	  if (vect_print_dump_info (REPORT_DETAILS))
@@ -1769,6 +1835,9 @@ vect_is_simple_reduction_1 (loop_vec_inf
     code = PLUS_EXPR;
 
   if (check_reduction
+      && (!vinfo_for_stmt (def_stmt)
+          || STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt))
+                != minmax_loc_stmt)
       && (!commutative_tree_code (code) || !associative_tree_code (code)))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
@@ -1819,14 +1888,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
    }
 
   type = TREE_TYPE (gimple_assign_lhs (def_stmt));
-  if ((TREE_CODE (op1) == SSA_NAME
-       && !types_compatible_p (type,TREE_TYPE (op1)))
-      || (TREE_CODE (op2) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op2)))
-      || (op3 && TREE_CODE (op3) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op3)))
-      || (op4 && TREE_CODE (op4) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op4))))
+  if (STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt)) 
+        != minmax_loc_stmt
+      && ((TREE_CODE (op1) == SSA_NAME 
+           && !types_compatible_p (type, TREE_TYPE (op1)))
+          || (TREE_CODE (op2) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op2)))
+          || (op3 && TREE_CODE (op3) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op3)))
+          || (op4 && TREE_CODE (op4) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op4)))))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         {
@@ -1834,17 +1905,17 @@ vect_is_simple_reduction_1 (loop_vec_inf
           print_generic_expr (vect_dump, type, TDF_SLIM);
           fprintf (vect_dump, ", operands types: ");
           print_generic_expr (vect_dump, TREE_TYPE (op1), TDF_SLIM);
-          fprintf (vect_dump, ",");
+          fprintf (vect_dump, ", ");
           print_generic_expr (vect_dump, TREE_TYPE (op2), TDF_SLIM);
           if (op3)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op3), TDF_SLIM);
             }
 
           if (op4)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op4), TDF_SLIM);
             }
         }
@@ -1952,7 +2023,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
                                == vect_internal_def
 		           && !is_loop_header_bb_p (gimple_bb (def2)))))))
     {
-      if (check_reduction)
+      if (check_reduction && code != COND_EXPR)
         {
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
@@ -2354,7 +2425,6 @@ vect_model_reduction_cost (stmt_vec_info
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-
   /* Cost of reduction op inside loop.  */
   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
     += ncopies * vect_get_cost (vector_stmt);
@@ -2391,11 +2461,15 @@ vect_model_reduction_cost (stmt_vec_info
   mode = TYPE_MODE (vectype);
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
 
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info)) 
     orig_stmt = STMT_VINFO_STMT (stmt_info);
 
   code = gimple_assign_rhs_code (orig_stmt);
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)
+      += ncopies * 5 * vect_get_cost (vector_stmt);
+
   /* Add in cost for initial definition.  */
   outer_cost += vect_get_cost (scalar_to_vec);
 
@@ -2411,28 +2485,35 @@ vect_model_reduction_cost (stmt_vec_info
                       + vect_get_cost (vec_to_scalar); 
       else
 	{
-	  int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-	  tree bitsize =
-	    TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
-	  int element_bitsize = tree_low_cst (bitsize, 1);
-	  int nelements = vec_size_in_bits / element_bitsize;
-
-	  optab = optab_for_tree_code (code, vectype, optab_default);
-
-	  /* We have a whole vector shift available.  */
-	  if (VECTOR_MODE_P (mode)
-	      && optab_handler (optab, mode)->insn_code != CODE_FOR_nothing
-	      && optab_handler (vec_shr_optab, mode)->insn_code != CODE_FOR_nothing)
-	    /* Final reduction via vector shifts and the reduction operator. Also
-	       requires scalar extract.  */
-	    outer_cost += ((exact_log2(nelements) * 2) 
-              * vect_get_cost (vector_stmt) 
-  	      + vect_get_cost (vec_to_scalar));
-	  else
-	    /* Use extracts and reduction op for final reduction.  For N elements,
-               we have N extracts and N-1 reduction ops.  */
-	    outer_cost += ((nelements + nelements - 1) 
-              * vect_get_cost (vector_stmt));
+          if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+            outer_cost += 6 * vect_get_cost (vector_stmt) 
+                          + vect_get_cost (vec_to_scalar);
+          else
+            {
+  	      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+ 	      tree bitsize =
+	        TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
+	      int element_bitsize = tree_low_cst (bitsize, 1);
+	      int nelements = vec_size_in_bits / element_bitsize;
+
+	      optab = optab_for_tree_code (code, vectype, optab_default);
+
+	      /* We have a whole vector shift available.  */
+	      if (VECTOR_MODE_P (mode)
+	          && optab_handler (optab, mode)->insn_code != CODE_FOR_nothing
+ 	          && optab_handler (vec_shr_optab, mode)->insn_code 
+                     != CODE_FOR_nothing)
+	        /* Final reduction via vector shifts and the reduction operator. 
+                   Also requires scalar extract.  */
+	        outer_cost += ((exact_log2(nelements) * 2) 
+                                * vect_get_cost (vector_stmt) 
+                               + vect_get_cost (vec_to_scalar));
+	      else
+	        /* Use extracts and reduction op for final reduction.  For N 
+                   elements, we have N extracts and N-1 reduction ops.  */
+	        outer_cost += ((nelements + nelements - 1)
+                               * vect_get_cost (vector_stmt));
+            }
 	}
     }
 
@@ -2933,6 +3014,128 @@ get_initial_def_for_reduction (gimple st
   return init_def;
 }
 
+/* Create min/max location epilogue calculation. We have both vector and
+   extracted scalar results of min/max computation, and a vector of locations
+   that we need to reduce to a scalar result now.
+   We use a technique described in the documentation of
+   vectorize_minmax_location_pattern ().  */
+
+static void
+vect_create_epilogue_for_compound_pattern (gimple stmt, tree vectype, 
+                                           enum tree_code *reduc_code,
+                                           gimple *new_phi, 
+                                           gimple_stmt_iterator *exit_gsi,
+                                           enum tree_code *code)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  tree t = NULL_TREE, minmax_vec, minmax_res = NULL_TREE, orig_cond, val;
+  gimple related, min_max_stmt, related_res;
+  enum machine_mode vec_mode;
+  optab reduc_optab;
+  unsigned int nunits;
+  int i;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  basic_block exit_bb;
+  enum tree_code orig_code;
+
+  if (nested_in_vect_loop_p (loop, stmt))
+    loop = loop->inner;
+
+  exit_bb = single_exit (loop)->dest;
+
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) != minmax_loc_stmt)
+    return;
+
+  related = STMT_VINFO_RELATED_STMT (stmt_info);
+  related_res = STMT_VINFO_REDUC_SCALAR_RES_STMT (vinfo_for_stmt (related));
+  gcc_assert (related_res);
+
+  /* Get a vector result of min/max computation.  */
+  min_max_stmt = STMT_VINFO_VEC_STMT (vinfo_for_stmt (related));
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_assign_lhs (min_max_stmt))
+    if (gimple_bb (USE_STMT (use_p)) == exit_bb
+        && gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
+      minmax_res = PHI_RESULT (USE_STMT (use_p));
+   
+  gcc_assert (minmax_res);
+
+  /* Create vector {min, min,...} or {max, max, ...}.  */
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  for (i = nunits - 1; i >= 0; --i)
+    t = tree_cons (NULL_TREE, gimple_assign_lhs (related_res), t);
+
+  minmax_vec = build_constructor_from_list (TREE_TYPE (minmax_res), t);
+
+  /* To extract the final position value, we need to know whether to look
+     for maximum (GT_EXPR and LT_EXPR) or minimum (GE_EXPR or LE_EXPR).  */ 
+  orig_cond = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+  if (TREE_CODE (orig_cond) == SSA_NAME)
+    {
+      gimple cond_def_stmt = SSA_NAME_DEF_STMT (orig_cond);
+      orig_code = gimple_assign_rhs_code (cond_def_stmt);
+    }
+  else
+    orig_code = TREE_CODE (orig_cond);
+
+  if (orig_code == GT_EXPR || orig_code == LT_EXPR)
+    {
+      val = TYPE_MAX_VALUE (TREE_TYPE (gimple_assign_lhs (stmt)));
+      *code = MIN_EXPR;
+    }
+  else
+    {
+      val = TYPE_MIN_VALUE (TREE_TYPE (gimple_assign_lhs (stmt)));
+      *code = MAX_EXPR;
+    }
+
+  /* Build a vector of maximum or minimum values.  */
+  t = NULL_TREE;
+  for (i = nunits - 1; i >= 0; --i)
+    t = tree_cons (NULL_TREE, val, t); 
+
+  /* Promote GSI to after the min/max result extraction, since we use it
+     in the index calculation.  (We insert the min/max scalar statement
+     before the index calculation statement, in
+     vect_recog_min_max_loc_pattern (), so its epilogue is created before
+     the epilogue of the index calculation statement.)  */
+  *exit_gsi = gsi_for_stmt (related_res);
+  gsi_next (exit_gsi);
+  minmax_vec = vect_init_vector (stmt, minmax_vec, TREE_TYPE (minmax_res), 
+                                 exit_gsi);
+  *new_phi = vectorize_minmax_location_pattern (stmt, exit_gsi, EQ_EXPR,
+                                                minmax_res, minmax_vec,
+                                                PHI_RESULT (*new_phi),
+                                                build_vector (vectype, t));
+
+  /* Extract minimum or maximum from VECTOR_RESULT to get the first or the last
+     index (using one of the above techniques).  */
+  *reduc_code = ERROR_MARK;
+  if (reduction_code_for_scalar_code (*code, reduc_code))
+    {
+      reduc_optab = optab_for_tree_code (*reduc_code, vectype, optab_default);
+      if (!reduc_optab)
+        {
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "no optab for reduction.");
+
+          *reduc_code = ERROR_MARK;
+        }
+
+        vec_mode = TYPE_MODE (vectype);
+        if (reduc_optab
+            && optab_handler (reduc_optab, vec_mode)->insn_code
+                == CODE_FOR_nothing)
+          {
+            if (vect_print_dump_info (REPORT_DETAILS))
+              fprintf (vect_dump, "reduc op not supported by target.");
+
+            *reduc_code = ERROR_MARK;
+          }
+     }
+}
 
 /* Function vect_create_epilog_for_reduction
 
@@ -3035,6 +3238,7 @@ vect_create_epilog_for_reduction (VEC (t
   unsigned int group_size = 1, k, ratio;
   VEC (tree, heap) *vec_initial_defs = NULL;
   VEC (gimple, heap) *phis;
+  tree vec_temp;
 
   if (slp_node)
     group_size = VEC_length (gimple, SLP_TREE_SCALAR_STMTS (slp_node)); 
@@ -3092,9 +3296,9 @@ vect_create_epilog_for_reduction (VEC (t
   else
     {
       vec_initial_defs = VEC_alloc (tree, heap, 1);
-     /* For the case of reduction, vect_get_vec_def_for_operand returns
-        the scalar def before the loop, that defines the initial value
-        of the reduction variable.  */
+      /* For the case of reduction, vect_get_vec_def_for_operand returns
+         the scalar def before the loop, that defines the initial value
+         of the reduction variable.  */
       vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
                                                       &adjustment_def);
       VEC_quick_push (tree, vec_initial_defs, vec_initial_def);
@@ -3194,18 +3398,18 @@ vect_create_epilog_for_reduction (VEC (t
          defined in the loop.  In case STMT is a "pattern-stmt" (i.e. - it
          represents a reduction pattern), the tree-code and scalar-def are
          taken from the original stmt that the pattern-stmt (STMT) replaces.
-         Otherwise (it is a regular reduction) - the tree-code and scalar-def
-         are taken from STMT.  */
+         Otherwise (it is a regular reduction or a compound pattern) - the 
+         tree-code and scalar-def are taken from STMT.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     {
-      /* Regular reduction  */
+      /* Regular reduction or compound pattern.  */
       orig_stmt = stmt;
     }
   else
     {
-      /* Reduction pattern  */
+      /* Reduction pattern.  */
       stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
       gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt);
@@ -3232,6 +3436,16 @@ vect_create_epilog_for_reduction (VEC (t
   if (nested_in_vect_loop && !double_reduc)
     goto vect_finalize_reduction;
 
+  /* Create an epilogue for compound pattern.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+    {
+      /* FORNOW: SLP with compound patterns is not supported.  */
+      new_phi = VEC_index (gimple, new_phis, 0);
+      vect_create_epilogue_for_compound_pattern (stmt, vectype, &reduc_code,
+                                                 &new_phi, &exit_gsi, &code);
+      VEC_replace (gimple, new_phis, 0, new_phi);
+    }
+
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
@@ -3247,7 +3461,11 @@ vect_create_epilog_for_reduction (VEC (t
 
       vec_dest = vect_create_destination_var (scalar_dest, vectype);
       new_phi = VEC_index (gimple, new_phis, 0);
-      tmp = build1 (reduc_code, vectype,  PHI_RESULT (new_phi));
+      if (gimple_code (new_phi) == GIMPLE_PHI)
+        vec_temp = PHI_RESULT (new_phi);
+      else
+        vec_temp = gimple_assign_lhs (new_phi);
+      tmp = build1 (reduc_code, vectype,  vec_temp);
       epilog_stmt = gimple_build_assign (vec_dest, tmp);
       new_temp = make_ssa_name (vec_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
@@ -3262,7 +3480,6 @@ vect_create_epilog_for_reduction (VEC (t
       int bit_offset;
       int element_bitsize = tree_low_cst (bitsize, 1);
       int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-      tree vec_temp;
 
       if (optab_handler (vec_shr_optab, mode)->insn_code != CODE_FOR_nothing)
         shift_code = VEC_RSHIFT_EXPR;
@@ -3278,11 +3495,11 @@ vect_create_epilog_for_reduction (VEC (t
       if (!VECTOR_MODE_P (mode))
         have_whole_vector_shift = false;
       else
-        {
-          optab optab = optab_for_tree_code (code, vectype, optab_default);
-          if (optab_handler (optab, mode)->insn_code == CODE_FOR_nothing)
-            have_whole_vector_shift = false;
-        }
+	{
+	  optab optab = optab_for_tree_code (code, vectype, optab_default);
+	  if (!optab || optab_handler (optab, mode)->insn_code == CODE_FOR_nothing)
+	    have_whole_vector_shift = false;
+	}
 
       if (have_whole_vector_shift && !slp_node)
         {
@@ -3298,7 +3515,10 @@ vect_create_epilog_for_reduction (VEC (t
 
           vec_dest = vect_create_destination_var (scalar_dest, vectype);
           new_phi = VEC_index (gimple, new_phis, 0);
-          new_temp = PHI_RESULT (new_phi);
+          if (gimple_code (new_phi) == GIMPLE_PHI)
+            new_temp = PHI_RESULT (new_phi);
+          else
+            new_temp = gimple_assign_lhs (new_phi);
           for (bit_offset = vec_size_in_bits/2;
                bit_offset >= element_bitsize;
                bit_offset /= 2)
@@ -3340,7 +3560,10 @@ vect_create_epilog_for_reduction (VEC (t
           vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
           for (i = 0; VEC_iterate (gimple, new_phis, i, new_phi); i++)
             {
-              vec_temp = PHI_RESULT (new_phi);
+              if (gimple_code (new_phi) == GIMPLE_PHI)
+                vec_temp = PHI_RESULT (new_phi);
+              else
+                vec_temp = gimple_assign_lhs (new_phi);
               rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
                             bitsize_zero_node);
               epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
@@ -3410,6 +3633,7 @@ vect_create_epilog_for_reduction (VEC (t
             /* Not SLP - we have one scalar to keep in SCALAR_RESULTS.  */
             VEC_safe_push (tree, heap, scalar_results, new_temp);
 
+          STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
           extract_scalar_result = false;
         }
     }
@@ -3437,6 +3661,7 @@ vect_create_epilog_for_reduction (VEC (t
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
       VEC_safe_push (tree, heap, scalar_results, new_temp);
+      STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
     }
   
 vect_finalize_reduction:
@@ -3452,8 +3677,13 @@ vect_finalize_reduction:
       if (nested_in_vect_loop)
 	{
           new_phi = VEC_index (gimple, new_phis, 0);
+          if (gimple_code (new_phi) == GIMPLE_PHI)
+            vec_temp = PHI_RESULT (new_phi);
+          else
+            vec_temp = gimple_assign_lhs (new_phi);
+
 	  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
-	  expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
+	  expr = build2 (code, vectype, vec_temp, adjustment_def);
 	  new_dest = vect_create_destination_var (scalar_dest, vectype);
 	}
       else
@@ -3486,6 +3716,7 @@ vect_finalize_reduction:
         VEC_replace (tree, scalar_results, 0, new_temp);
 
       VEC_replace (gimple, new_phis, 0, epilog_stmt);
+      STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
     }
 
   /* 2.6  Handle the loop-exit phis. Replace the uses of scalar loop-exit
@@ -3555,8 +3786,10 @@ vect_finalize_reduction:
           VEC_safe_push (gimple, heap, phis, USE_STMT (use_p));
 
       /* We expect to have found an exit_phi because of loop-closed-ssa
-         form.  */
-      gcc_assert (!VEC_empty (gimple, phis));
+         form, unless it is the min/max statement of a min/max location
+         pattern, which is inserted by the pattern recognition phase.  */
+      gcc_assert (!VEC_empty (gimple, phis)
+                  || STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_stmt);
 
       for (i = 0; VEC_iterate (gimple, phis, i, exit_phi); i++)
         {
@@ -3637,7 +3870,12 @@ vect_finalize_reduction:
                   add_phi_arg (vect_phi, vect_phi_init,
                                loop_preheader_edge (outer_loop),
                                UNKNOWN_LOCATION);
-                  add_phi_arg (vect_phi, PHI_RESULT (epilog_stmt),
+                  if (gimple_code (epilog_stmt) == GIMPLE_PHI)
+                    vec_temp = PHI_RESULT (epilog_stmt);
+                  else
+                    vec_temp = gimple_assign_lhs (epilog_stmt);
+
+                  add_phi_arg (vect_phi, vec_temp,
                                loop_latch_edge (outer_loop), UNKNOWN_LOCATION);
                   if (vect_print_dump_info (REPORT_DETAILS))
                     {
@@ -3760,11 +3998,11 @@ vectorizable_reduction (gimple stmt, gim
   basic_block def_bb;
   struct loop * def_stmt_loop, *outer_loop = NULL;
   tree def_arg;
-  gimple def_arg_stmt;
+  gimple def_arg_stmt, related;
   VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL, *vect_defs = NULL;
   VEC (gimple, heap) *phis = NULL;
-  int vec_num;
-  tree def0, def1;
+  int vec_num, cond_reduc_index = 0;
+  tree def0, def1, cond_reduc_def = NULL_TREE;
 
   if (nested_in_vect_loop_p (loop, stmt))
     {
@@ -3774,8 +4012,10 @@ vectorizable_reduction (gimple stmt, gim
     }
 
   /* 1. Is vectorizable reduction?  */
-  /* Not supportable if the reduction variable is used in the loop.  */
-  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
+  /* Not supportable if the reduction variable is used in the loop,
+     unless it is part of a compound pattern.  */
+  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   /* Reductions that are not used even in an enclosing outer-loop,
@@ -3797,14 +4037,17 @@ vectorizable_reduction (gimple stmt, gim
      the original sequence that constitutes the pattern.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (orig_stmt)
+  if (orig_stmt
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     {
       orig_stmt_info = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
       gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
     }
-
+  else
+    orig_stmt = NULL;
+
   /* 3. Check the operands of the operation. The first operands are defined
         inside the loop body. The last operand is the reduction variable,
         which is defined by the loop-header-phi.  */
@@ -3917,12 +4160,13 @@ vectorizable_reduction (gimple stmt, gim
 
   if (code == COND_EXPR)
     {
-      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0))
+      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0,
+                                   cond_reduc_def, cond_reduc_index))
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "unsupported condition in reduction");
 
-            return false;
+          return false;
         }
     }
   else
@@ -4055,7 +4299,12 @@ vectorizable_reduction (gimple stmt, gim
     }
   else
     {
-      if (!nested_cycle || double_reduc)
+      /* There is no need for a reduction epilogue in the case of a nested
+         cycle, unless it is a double reduction.  For a compound reduction
+         pattern, we assume that we know how to create an epilogue even if
+         there is no reduction code for it.  */
+      if ((!nested_cycle || double_reduc)
+          && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "no reduc code for scalar code.");
@@ -4075,8 +4324,9 @@ vectorizable_reduction (gimple stmt, gim
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
-      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies))
+      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies))
         return false;
+
       return true;
     }
 
@@ -4127,6 +4377,32 @@ vectorizable_reduction (gimple stmt, gim
   else
     epilog_copies = ncopies;
 
+  /* Prepare vector operands for min/max location.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      tree cond_op;
+      gimple cond_def_stmt;
+
+      related = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt));
+      cond_op = TREE_OPERAND (ops[0], 0);
+      cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+      if (gimple_code (cond_def_stmt) == GIMPLE_PHI)
+        {
+          cond_reduc_index = 1;
+          cond_reduc_def = gimple_assign_rhs1 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+      else
+        {
+          cond_op = TREE_OPERAND (ops[0], 1);
+          cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+          gcc_assert (gimple_code (cond_def_stmt) == GIMPLE_PHI);
+          cond_reduc_index = 2;
+          cond_reduc_def = gimple_assign_rhs2 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+    }
+
   prev_stmt_info = NULL;
   prev_phi_info = NULL;
   if (slp_node)
@@ -4170,7 +4446,8 @@ vectorizable_reduction (gimple stmt, gim
           gcc_assert (!slp_node);
           vectorizable_condition (stmt, gsi, vec_stmt, 
                                   PHI_RESULT (VEC_index (gimple, phis, 0)), 
-                                  reduc_index);
+                                  reduc_index, cond_reduc_def, 
+                                  cond_reduc_index);
           /* Multiple types are not supported for condition.  */
           break;
         }
@@ -4406,6 +4683,9 @@ vectorizable_live_operation (gimple stmt
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+    return true;
+
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
     return false;
 
Index: tree-vect-patterns.c
===================================================================
--- tree-vect-patterns.c	(revision 161484)
+++ tree-vect-patterns.c	(working copy)
@@ -53,6 +53,10 @@ static vect_recog_func_ptr vect_vect_rec
 	vect_recog_widen_sum_pattern,
 	vect_recog_dot_prod_pattern,
 	vect_recog_pow_pattern};
+static bool vect_recog_min_max_loc_pattern (unsigned int, va_list);
+static vect_recog_compound_func_ptr 
+   vect_recog_compound_func_ptrs[NUM_COMPOUND_PATTERNS] = {
+        vect_recog_min_max_loc_pattern};
 
 
 /* Function widened_name_p
@@ -847,3 +851,286 @@ vect_pattern_recog (loop_vec_info loop_v
         }
     }
 }
+
+
+/* Detect min/max location pattern. 
+   Given two reduction condition statements and their phi nodes, we check
+   if one of the statements calculates minimum or maximum, and the other one
+   records its location. If the pattern is detected, we replace the min/max 
+   condition statement with MIN_EXPR or MAX_EXPR, and mark the old statement 
+   as pattern statement.
+
+   The pattern we are looking for:
+
+   s1: min = [cond_expr] a < min ? a : min
+   s2: index = [cond_expr] a < min ? new_index : index
+
+   We add MIN_EXPR statement before the index calculation statement:
+
+   s1:  min = [cond_expr] a < min ? a : min
+   s1': min = [min_expr] <a, min>
+   s2:  index = [cond_expr] a < min ? new_index : index
+
+   s1 is marked as pattern statement
+   s1' points to s1 via related_stmt field
+   s1 points to s1' via related_stmt field
+   s2 points to s1' via related_stmt field.  
+   s1' and s2 are marked as compound pattern min/max and min/max location
+   statements.  */
+
+static bool
+vect_recog_min_max_loc_pattern (unsigned int nargs, va_list args)
+{
+  gimple first_phi, first_stmt, second_phi, second_stmt, loop_op_def_stmt;
+  stmt_vec_info stmt_vinfo, new_stmt_info, minmax_stmt_info, pos_stmt_info;
+  loop_vec_info loop_info;
+  struct loop *loop;
+  enum tree_code code, first_code, second_code;
+  gimple first_cond_def_stmt = NULL, second_cond_def_stmt = NULL;
+  tree first_cond_op0, first_cond_op1, second_cond_op0, second_cond_op1;
+  tree first_stmt_oprnd0, first_stmt_oprnd1, second_stmt_oprnd0;
+  tree second_stmt_oprnd1, first_cond, second_cond;
+  int phi_def_index;
+  tree first_loop_op, second_loop_op, pos_stmt_loop_op, def, result;
+  gimple pos_stmt, min_max_stmt, new_stmt, def_stmt;
+  gimple_stmt_iterator gsi;
+
+  if (nargs < 4)
+    return false;
+
+  first_phi = va_arg (args, gimple);
+  first_stmt = va_arg (args, gimple);
+  second_phi = va_arg (args, gimple);
+  second_stmt = va_arg (args, gimple);
+
+  stmt_vinfo = vinfo_for_stmt (first_stmt);
+  loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+  loop = LOOP_VINFO_LOOP (loop_info);
+
+  /* Check that the condition is the same and is GT or LT.  */
+  first_cond = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 0);
+  if (TREE_CODE (first_cond) == SSA_NAME)
+    {
+      first_cond_def_stmt = SSA_NAME_DEF_STMT (first_cond);
+      first_code = gimple_assign_rhs_code (first_cond_def_stmt);
+      first_cond_op0 = gimple_assign_rhs1 (first_cond_def_stmt);
+      first_cond_op1 = gimple_assign_rhs2 (first_cond_def_stmt);
+    }
+  else
+    {
+      first_code = TREE_CODE (first_cond);
+      first_cond_op0 = TREE_OPERAND (first_cond, 0);
+      first_cond_op1 = TREE_OPERAND (first_cond, 1);
+    }
+
+  if (first_code != GT_EXPR && first_code != LT_EXPR
+      && first_code != GE_EXPR && first_code != LE_EXPR)
+    return false;
+
+  second_cond = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 0);
+  if (TREE_CODE (second_cond) == SSA_NAME)
+    {
+      second_cond_def_stmt = SSA_NAME_DEF_STMT (second_cond);
+      second_code = gimple_assign_rhs_code (second_cond_def_stmt);
+      second_cond_op0 = gimple_assign_rhs1 (second_cond_def_stmt);
+      second_cond_op1 = gimple_assign_rhs2 (second_cond_def_stmt);
+    }
+  else
+    {
+      second_code = TREE_CODE (second_cond);
+      second_cond_op0 = TREE_OPERAND (second_cond, 0);
+      second_cond_op1 = TREE_OPERAND (second_cond, 1);
+    }
+
+  if (first_code != second_code)
+    return false;
+
+  if (first_cond_def_stmt
+      && (!second_cond_def_stmt
+          || first_cond_def_stmt != second_cond_def_stmt
+          || !operand_equal_p (first_cond_op0, second_cond_op0, 0)
+          || !operand_equal_p (first_cond_op1, second_cond_op1, 0)))
+   return false;
+
+  /* Both statements have the same condition.  */
+
+  first_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 1);
+  first_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 2);
+
+  second_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 1);
+  second_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 2);
+
+  if (TREE_CODE (first_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (first_stmt_oprnd1) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd1) != SSA_NAME)
+    return false;
+
+  if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd0, 0)
+      && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd0, 0))
+    {
+      phi_def_index = 0;
+      first_loop_op = first_stmt_oprnd1;
+      second_loop_op = second_stmt_oprnd1;
+    }
+  else
+    {
+      if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd1, 0)
+          && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd1, 0))
+        {
+          phi_def_index = 1;
+          first_loop_op = first_stmt_oprnd0;
+          second_loop_op = second_stmt_oprnd0;
+        }
+      else
+        return false;
+    }
+
+  /* Now we know which operand is defined by the phi node.  Analyze the
+     other one.  */
+
+  /* The min/max stmt must be x < y ? x : y.  */
+  if (operand_equal_p (first_cond_op0, first_stmt_oprnd0, 0)
+      && operand_equal_p (first_cond_op1, first_stmt_oprnd1, 0))
+    {
+      pos_stmt = second_stmt;
+      min_max_stmt = first_stmt;
+      pos_stmt_loop_op = second_loop_op;
+    }
+  else
+    {
+      if (operand_equal_p (second_cond_op0, second_stmt_oprnd0, 0)
+          && operand_equal_p (second_cond_op1, second_stmt_oprnd1, 0))
+        {
+          pos_stmt = first_stmt;
+          min_max_stmt = second_stmt;
+          pos_stmt_loop_op = first_loop_op;
+        }
+      else
+        return false;
+    }
+
+  /* Analyze the position stmt.  We expect it to be either an induction or
+     an induction plus a constant.  */
+  loop_op_def_stmt = SSA_NAME_DEF_STMT (pos_stmt_loop_op);
+
+  if (!flow_bb_inside_loop_p (loop, gimple_bb (loop_op_def_stmt)))
+    return false;
+
+  if (gimple_code (loop_op_def_stmt) == GIMPLE_PHI)
+    {
+      if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (loop_op_def_stmt))
+          != vect_induction_def)
+        return false;
+    }
+  else
+    {
+      if (!is_gimple_assign (loop_op_def_stmt))
+        return false;
+
+      if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+           == GIMPLE_UNARY_RHS)
+        def = gimple_assign_rhs1 (loop_op_def_stmt);
+      else
+        {
+          tree op1, op2;
+
+          if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+               != GIMPLE_BINARY_RHS
+              || gimple_assign_rhs_code (loop_op_def_stmt) != PLUS_EXPR)
+            return false;
+
+          op1 = gimple_assign_rhs1 (loop_op_def_stmt);
+          op2 = gimple_assign_rhs2 (loop_op_def_stmt);
+
+          if (TREE_CONSTANT (op1))
+            def = op2;
+          else
+            {
+              if (TREE_CONSTANT (op2))
+                def = op1;
+              else
+                return false;
+            }
+        }
+
+      if (TREE_CODE (def) != SSA_NAME)
+        return false;
+
+      def_stmt = SSA_NAME_DEF_STMT (def);
+      if (!flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
+          || gimple_code (def_stmt) != GIMPLE_PHI
+          || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def_stmt))
+              != vect_induction_def)
+         return false;
+    }
+
+  /* Pattern detected. Create a stmt to be used to replace the pattern.  */
+  if (first_code == GT_EXPR || first_code == GE_EXPR)
+    code = phi_def_index ? MAX_EXPR : MIN_EXPR;
+  else
+    code = phi_def_index ? MIN_EXPR : MAX_EXPR;
+
+  result = gimple_assign_lhs (min_max_stmt);
+  new_stmt = gimple_build_assign_with_ops (code, result,
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 1),
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 2));
+  gsi = gsi_for_stmt (pos_stmt);
+  gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+  SSA_NAME_DEF_STMT (result) = new_stmt;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Detected min/max location pattern:\nmin/max stmt ");
+      print_gimple_stmt (vect_dump, min_max_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nlocation stmt ");
+      print_gimple_stmt (vect_dump, pos_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nCreated stmt: ");
+      print_gimple_stmt (vect_dump, new_stmt, 0, TDF_SLIM);
+    }
+
+  /* Mark the stmts that are involved in the pattern.  */
+  set_vinfo_for_stmt (new_stmt,
+                      new_stmt_vec_info (new_stmt, loop_info, NULL));
+  new_stmt_info = vinfo_for_stmt (new_stmt);
+
+  pos_stmt_info = vinfo_for_stmt (pos_stmt);
+  minmax_stmt_info = vinfo_for_stmt (min_max_stmt);
+
+  STMT_VINFO_DEF_TYPE (new_stmt_info) = STMT_VINFO_DEF_TYPE (minmax_stmt_info);
+  STMT_VINFO_VECTYPE (new_stmt_info) = STMT_VINFO_VECTYPE (minmax_stmt_info);
+
+  STMT_VINFO_IN_PATTERN_P (minmax_stmt_info) = true;
+  STMT_VINFO_COMPOUND_PATTERN (new_stmt_info) = minmax_stmt;
+  STMT_VINFO_COMPOUND_PATTERN (pos_stmt_info) = minmax_loc_stmt;
+  STMT_VINFO_RELATED_STMT (new_stmt_info) = min_max_stmt;
+  STMT_VINFO_RELATED_STMT (minmax_stmt_info) = new_stmt;
+  STMT_VINFO_RELATED_STMT (pos_stmt_info) = new_stmt;
+
+  return true;
+}
+
+/* Detect patterns consisting of two or more statements to be vectorized.
+   Currently the only supported pattern is min/max location.  */
+
+void
+vect_compound_pattern_recog (unsigned int nargs, ...)
+{
+  unsigned int j;
+  va_list args;
+  bool detected = false;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_compound_pattern_recog ===");
+
+  /* Scan over all generic vect_recog_compound_xxx_pattern functions.  */
+  for (j = 0; j < NUM_COMPOUND_PATTERNS; j++)
+    {
+      va_start (args, nargs);
+      detected = (* vect_recog_compound_func_ptrs[j]) (nargs, args);
+      va_end (args);
+      if (detected)
+        break;
+    }
+}
+
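As a plain-C illustration of the shape that vect_recog_min_max_loc_pattern matches, here is the scalar loop from the description and the equivalent form after the separate min statement (s1') is inserted before the location statement (s2). This is a sketch; the helper names minloc_before and minloc_after are illustrative only and not part of the patch:

```c
#include <assert.h>

/* Original source loop: a conditional reduction that tracks both the
   minimum value and its (1-based) position.  */
static void
minloc_before (const int *a, int n, int *min_out, int *pos_out)
{
  int limit = *min_out, pos = *pos_out;
  for (int i = 0; i < n; i++)
    if (a[i] < limit)
      {
        pos = i + 1;
        limit = a[i];
      }
  *min_out = limit;
  *pos_out = pos;
}

/* After pattern recognition: the minimum is computed by a separate
   MIN-style statement (s1') placed before the location statement (s2);
   the original cond_expr s1 becomes a pattern stmt replaced by s1'.
   Note s2 still tests against the old value of limit.  */
static void
minloc_after (const int *a, int n, int *min_out, int *pos_out)
{
  int limit = *min_out, pos = *pos_out;
  for (int i = 0; i < n; i++)
    {
      int new_limit = a[i] < limit ? a[i] : limit;  /* s1': MIN_EXPR   */
      pos = a[i] < limit ? i + 1 : pos;             /* s2: location    */
      limit = new_limit;                            /* commit s1'      */
    }
  *min_out = limit;
  *pos_out = pos;
}
```

Both variants compute identical results, which is why the recognizer may mark s1 as a pattern statement and let s1'/s2 carry the reduction.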
Index: target-def.h
===================================================================
--- target-def.h	(revision 161484)
+++ target-def.h	(working copy)
@@ -431,7 +431,7 @@
   hook_bool_tree_tree_true
 #define TARGET_SUPPORT_VECTOR_MISALIGNMENT \
   default_builtin_support_vector_misalignment
-
+#define TARGET_VECTORIZE_BUILTIN_VEC_CMP 0
 
 #define TARGET_VECTORIZE                                                \
   {									\
@@ -444,7 +444,8 @@
     TARGET_VECTOR_ALIGNMENT_REACHABLE,                                  \
     TARGET_VECTORIZE_BUILTIN_VEC_PERM,					\
     TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK,				\
-    TARGET_SUPPORT_VECTOR_MISALIGNMENT					\
+    TARGET_SUPPORT_VECTOR_MISALIGNMENT,					\
+    TARGET_VECTORIZE_BUILTIN_VEC_CMP                                    \
   }
 
 #define TARGET_DEFAULT_TARGET_FLAGS 0
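Before the tree-vect-stmts.c changes, the complete scheme from the Bik, Tian and Girkar paper cited above can be emulated in scalar C, with small arrays standing in for vector registers and an all-ones/all-zeros int standing in for the comparison mask. This is a sketch assuming a 4-lane vector and a trip count divisible by VL; all names follow the pseudo-code, not the compiler sources:

```c
#include <assert.h>
#include <limits.h>

#define VL 4  /* emulated vector length */

/* Scalar emulation of the vectorized maxloc (first index) scheme:
   per-lane max and position are kept in vcx/vck, blended through the
   comparison mask; the epilogue extracts the scalar max (HMAX) and
   then the first position (HMIN over masked candidates).  */
static int
maxloc_vectorized (const int *a, int n)
{
  int vcx[VL], vck[VL], ind[VL], msk[VL];
  int i, l;

  for (l = 0; l < VL; l++)
    {
      vcx[l] = INT_MIN;  /* vector of running maxima       */
      vck[l] = 0;        /* vector of candidate positions  */
      ind[l] = l;        /* per-lane index, stepped by VL  */
    }

  for (i = 0; i < n; i += VL)
    for (l = 0; l < VL; l++)
      {
        msk[l] = a[i + l] > vcx[l] ? -1 : 0;              /* compare */
        vck[l] = (ind[l] & msk[l]) | (vck[l] & ~msk[l]);  /* blend   */
        vcx[l] = a[i + l] > vcx[l] ? a[i + l] : vcx[l];   /* VMAX    */
        ind[l] += VL;
      }

  /* Epilogue: scalar maximum extraction (HMAX).  */
  int x = vcx[0];
  for (l = 1; l < VL; l++)
    if (vcx[l] > x)
      x = vcx[l];

  /* First position extraction: lanes not holding the max contribute
     INT_MAX, then take the minimum (HMIN).  */
  int k = INT_MAX;
  for (l = 0; l < VL; l++)
    {
      msk[l] = (vcx[l] == x) ? -1 : 0;
      int cand = (vck[l] & msk[l]) | (INT_MAX & ~msk[l]);
      if (cand < k)
        k = cand;
    }
  return k;  /* 0-based index of the first maximum */
}
```

The two blend lines are exactly the MASK = CODE (...) / VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK) sequence that vectorize_minmax_location_pattern generates below, once in the loop body and once in the epilogue.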
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 161484)
+++ tree-vect-stmts.c	(working copy)
@@ -271,8 +271,10 @@ process_use (gimple stmt, tree use, loop
   /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
      DEF_STMT must have already been processed, because this should be the
      only way that STMT, which is a reduction-phi, was put in the worklist,
-     as there should be no other uses for DEF_STMT in the loop.  So we just
-     check that everything is as expected, and we are done.  */
+     as there should be no other uses for DEF_STMT in the loop, unless it
+     is a min/max location pattern.  So we just check that everything is
+     as expected, and mark the min/max stmt of the location pattern as
+     used by reduction (it is used by the reduction of the location).  */
   dstmt_vinfo = vinfo_for_stmt (def_stmt);
   bb = gimple_bb (stmt);
   if (gimple_code (stmt) == GIMPLE_PHI
@@ -283,11 +285,22 @@ process_use (gimple stmt, tree use, loop
     {
       if (vect_print_dump_info (REPORT_DETAILS))
 	fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
+
+      /* Compound reduction pattern: the stmt is used by a reduction.  */
+      if (STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo))
+        {
+          relevant = vect_used_by_reduction;
+          vect_mark_relevant (worklist, def_stmt, relevant, live_p);
+          return true;
+        }
+
       if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
 	dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
+
       gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
-      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo)
-		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope);
+      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo) 
+		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope
+		  || STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo));
       return true;
     }
 
@@ -481,7 +494,8 @@ vect_mark_stmts_to_be_vectorized (loop_v
 	          break;
 
 	        case vect_used_by_reduction:
-	          if (gimple_code (stmt) == GIMPLE_PHI)
+	          if (gimple_code (stmt) == GIMPLE_PHI
+                      || STMT_VINFO_COMPOUND_PATTERN (stmt_vinfo))
                     break;
   	          /* fall through */
 
@@ -3873,6 +3887,106 @@ vect_is_simple_cond (tree cond, loop_vec
   return true;
 }
 
+/* Create a sequence of statements that vectorizes min/max location pattern
+   either inside the loop body, or in reduction epilogue. The technique used
+   here was taken from "Multimedia vectorization of floating-point MIN/MAX 
+   reductions" by A.J.C.Bik, X.Tian and M.B.Girkar, 
+   http://portal.acm.org/citation.cfm?id=1145765. 
+   Vectorized loop (maxloc, first index):
+     vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
+     vck[0:vl-1:1] = | k |..| k |;  - vector of positions
+     ind[0:vl-1:1] = |vl-1|..| 0 |; 
+     inc[0:vl-1:1] = | vl |..| vl |; 
+     for (i = 0; i < N; i += vl) { 
+       msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]); 
+       vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) | 
+                       (vck[0:vl-1:1] & !msk[0:vl-1:1]); 
+       vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]); 
+       ind[0:vl-1:1] += inc[0:vl-1:1]; 
+     } 
+     x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
+     msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|); 
+     vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) | 
+                     (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]); 
+     k = HMIN(vck[0:vl-1:1]);       - first position extraction
+
+   In this function we generate:
+    MASK = CODE (COMPARE_OPRND1, COMPARE_OPRND2)
+    VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK)  
+
+   When called from vectorizable_condition(), the loop body code is generated.
+   When called from vect_create_epilog_for_reduction(), the function generates
+   the code for scalar extraction in the reduction epilogue. 
+
+   The return value is the last statement in the above sequence.  */
+
+gimple
+vectorize_minmax_location_pattern (gimple stmt, gimple_stmt_iterator *gsi,
+                                   enum tree_code code,
+                                   tree compare_oprnd1, tree compare_oprnd2,
+                                   tree vec_oprnd1, tree vec_oprnd2)
+{
+  tree mask_type, builtin_decl, vec_dest, new_temp, vect_mask;
+  tree and_res1, and_res2, and_dest1, and_dest2, tmp, not_mask, mask, tmp_mask;
+  gimple mask_stmt, new_stmt;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree scalar_dest = gimple_assign_lhs (stmt);
+  gimple related = STMT_VINFO_RELATED_STMT (stmt_info);
+  tree related_lhs = gimple_assign_lhs (related);
+  tree comparison_type = get_vectype_for_scalar_type (TREE_TYPE (related_lhs));
+
+  /* Create mask: MASK = CODE (COMPARE_OPRND1, COMPARE_OPRND2).  */ 
+  builtin_decl = targetm.vectorize.builtin_vect_compare (code,
+                                                  comparison_type, &mask_type);
+  vect_mask = vect_create_destination_var (related_lhs, mask_type);
+  mask_stmt = gimple_build_call (builtin_decl, 2, compare_oprnd1, 
+                                 compare_oprnd2);
+  tmp_mask = make_ssa_name (vect_mask, mask_stmt);
+  gimple_call_set_lhs (mask_stmt, tmp_mask);
+  vect_finish_stmt_generation (stmt, mask_stmt, gsi);
+
+  /* Convert the mask to VECTYPE.  */
+  vect_mask = vect_create_destination_var (scalar_dest, vectype);
+  mask_stmt = gimple_build_assign (vect_mask, fold_build1 (VIEW_CONVERT_EXPR, 
+                                                           vectype, tmp_mask));
+  mask = make_ssa_name (vect_mask, mask_stmt);
+  gimple_assign_set_lhs (mask_stmt, mask);
+  vect_finish_stmt_generation (stmt, mask_stmt, gsi);
+
+  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */ 
+  and_dest1 = vect_create_destination_var (scalar_dest, vectype);
+  and_dest2 = vect_create_destination_var (scalar_dest, vectype);
+  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+
+  tmp = build2 (BIT_AND_EXPR, vectype, vec_oprnd1, mask);
+  new_stmt = gimple_build_assign (and_dest1, tmp);
+  and_res1 = make_ssa_name (and_dest1, new_stmt);
+  gimple_assign_set_lhs (new_stmt, and_res1);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  tmp = build1 (BIT_NOT_EXPR, vectype, mask);
+  new_stmt = gimple_build_assign (vec_dest, tmp);
+  not_mask = make_ssa_name (vec_dest, new_stmt);
+  gimple_assign_set_lhs (new_stmt, not_mask);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  tmp = build2 (BIT_AND_EXPR, vectype, vec_oprnd2, not_mask);
+  new_stmt = gimple_build_assign (and_dest2, tmp);
+  and_res2 = make_ssa_name (and_dest2, new_stmt);
+  gimple_assign_set_lhs (new_stmt, and_res2);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+  tmp = build2 (BIT_IOR_EXPR, vectype, and_res1, and_res2);
+  new_stmt = gimple_build_assign (vec_dest, tmp);
+  new_temp = make_ssa_name (vec_dest, new_stmt);
+  gimple_assign_set_lhs (new_stmt, new_temp);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  return new_stmt;
+}
+
 /* vectorizable_condition.
 
    Check if STMT is conditional modify expression that can be vectorized.
@@ -3884,11 +3998,16 @@ vect_is_simple_cond (tree cond, loop_vec
    to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in
   else clause if it is 2).
 
+   In min/max location pattern, reduction defs are used in both condition part
+   and then/else clause. In that case COND_REDUC_DEF contains such vector def,
+   and COND_REDUC_INDEX specifies its place in the condition.
+
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
 vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
-			gimple *vec_stmt, tree reduc_def, int reduc_index)
+			gimple *vec_stmt, tree reduc_def, int reduc_index,
+                        tree cond_reduc_def, int cond_reduc_index) 
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
@@ -3906,6 +4025,7 @@ vectorizable_condition (gimple stmt, gim
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
   enum tree_code code;
+  tree comparison_type, mask_type;
 
   /* FORNOW: unsupported in basic block SLP.  */
   gcc_assert (loop_vinfo);
@@ -3914,20 +4034,23 @@ vectorizable_condition (gimple stmt, gim
   if (ncopies > 1)
     return false; /* FORNOW */
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+  if (!STMT_VINFO_RELEVANT_P (stmt_info)
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+      && !((STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+            || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
            && reduc_def))
-    return false;
+    return false;  
 
   /* FORNOW: SLP not supported.  */
   if (STMT_SLP_TYPE (stmt_info))
     return false;
 
   /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
+  if (STMT_VINFO_LIVE_P (stmt_info) 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info)) 
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "value used after loop.");
@@ -3955,7 +4078,10 @@ vectorizable_condition (gimple stmt, gim
   /* We do not handle two different vector types for the condition
      and the values.  */
   if (!types_compatible_p (TREE_TYPE (TREE_OPERAND (cond_expr, 0)),
-			   TREE_TYPE (vectype)))
+			   TREE_TYPE (vectype))
+      && !(STMT_VINFO_COMPOUND_PATTERN (stmt_info)
+           && TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (cond_expr, 0)))
+               == TYPE_SIZE_UNIT (TREE_TYPE (vectype))))
     return false;
 
   if (TREE_CODE (then_clause) == SSA_NAME)
@@ -3985,42 +4111,77 @@ vectorizable_condition (gimple stmt, gim
 
   vec_mode = TYPE_MODE (vectype);
 
-  if (!vec_stmt)
+  comparison_type = 
+         get_vectype_for_scalar_type (TREE_TYPE (TREE_OPERAND (cond_expr, 0)));
+
+  /* Check that min/max location pattern is supported, i.e., the relevant 
+     vector comparisons exist (including EQ_EXPR for reduction epilogue).  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt
+      && (!targetm.vectorize.builtin_vect_compare
+          || !targetm.vectorize.builtin_vect_compare (TREE_CODE (cond_expr),
+                                                   comparison_type, &mask_type)
+          || !targetm.vectorize.builtin_vect_compare (EQ_EXPR, comparison_type,
+                                                      &mask_type)))
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "unsupported comparison");
+
+      return false;
+    }
+
+  if (!vec_stmt) 
     {
       STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
       return expand_vec_cond_expr_p (TREE_TYPE (op), vec_mode);
     }
 
-  /* Transform */
+  /* Transform.  */
 
   /* Handle def.  */
   scalar_dest = gimple_assign_lhs (stmt);
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
   /* Handle cond expr.  */
-  vec_cond_lhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
-  vec_cond_rhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+  if (cond_reduc_index == 1)
+    vec_cond_lhs = cond_reduc_def;
+  else
+    vec_cond_lhs = 
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
+
+  if (cond_reduc_index == 2)
+    vec_cond_rhs = cond_reduc_def;
+  else
+    vec_cond_rhs = 
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+
   if (reduc_index == 1)
     vec_then_clause = reduc_def;
   else
     vec_then_clause = vect_get_vec_def_for_operand (then_clause, stmt, NULL);
+
   if (reduc_index == 2)
     vec_else_clause = reduc_def;
   else
     vec_else_clause = vect_get_vec_def_for_operand (else_clause, stmt, NULL);
 
   /* Arguments are ready. Create the new vector stmt.  */
-  vec_compare = build2 (TREE_CODE (cond_expr), vectype,
-			vec_cond_lhs, vec_cond_rhs);
-  vec_cond_expr = build3 (VEC_COND_EXPR, vectype,
-			  vec_compare, vec_then_clause, vec_else_clause);
-
-  *vec_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
-  new_temp = make_ssa_name (vec_dest, *vec_stmt);
-  gimple_assign_set_lhs (*vec_stmt, new_temp);
-  vect_finish_stmt_generation (stmt, *vec_stmt, gsi);
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      *vec_stmt = vectorize_minmax_location_pattern (stmt, gsi,  
+                             TREE_CODE (cond_expr), vec_cond_lhs, vec_cond_rhs,
+                             vec_then_clause, vec_else_clause);
+    }
+  else
+    {
+      vec_compare = build2 (TREE_CODE (cond_expr), vectype, vec_cond_lhs, 
+                            vec_cond_rhs);
+      vec_cond_expr = build3 (VEC_COND_EXPR, vectype, vec_compare, 
+    			      vec_then_clause, vec_else_clause);
+      *vec_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
+      new_temp = make_ssa_name (vec_dest, *vec_stmt);
+      gimple_assign_set_lhs (*vec_stmt, new_temp);
+      vect_finish_stmt_generation (stmt, *vec_stmt, gsi);
+    }
 
   return true;
 }
@@ -4077,7 +4238,8 @@ vect_analyze_stmt (gimple stmt, bool *ne
       case vect_nested_cycle:
          gcc_assert (!bb_vinfo && (relevance == vect_used_in_outer
                      || relevance == vect_used_in_outer_by_reduction
-                     || relevance == vect_unused_in_scope));
+                     || relevance == vect_unused_in_scope
+                     || relevance == vect_used_by_reduction));
          break;
 
       case vect_induction_def:
@@ -4139,7 +4301,7 @@ vect_analyze_stmt (gimple stmt, bool *ne
             || vectorizable_call (stmt, NULL, NULL)
             || vectorizable_store (stmt, NULL, NULL, NULL)
             || vectorizable_reduction (stmt, NULL, NULL, NULL)
-            || vectorizable_condition (stmt, NULL, NULL, NULL, 0));
+            || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL, 0)); 
     else
       {
         if (bb_vinfo)
@@ -4280,7 +4442,7 @@ vect_transform_stmt (gimple stmt, gimple
 
     case condition_vec_info_type:
       gcc_assert (!slp_node);
-      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0);
+      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, NULL, 0); 
       gcc_assert (done);
       break;
 
@@ -4418,6 +4580,7 @@ new_stmt_vec_info (gimple stmt, loop_vec
   STMT_VINFO_VEC_STMT (res) = NULL;
   STMT_VINFO_VECTORIZABLE (res) = true;
   STMT_VINFO_IN_PATTERN_P (res) = false;
+  STMT_VINFO_COMPOUND_PATTERN (res) = not_in_pattern;
   STMT_VINFO_RELATED_STMT (res) = NULL;
   STMT_VINFO_DATA_REF (res) = NULL;
 
@@ -4436,6 +4599,7 @@ new_stmt_vec_info (gimple stmt, loop_vec
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
   STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
   STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_REDUC_SCALAR_RES_STMT (res) = NULL;
   STMT_SLP_TYPE (res) = loop_vect;
   DR_GROUP_FIRST_DR (res) = NULL;
   DR_GROUP_NEXT_DR (res) = NULL;
Index: config/rs6000/rs6000-builtin.def
===================================================================
--- config/rs6000/rs6000-builtin.def	(revision 161484)
+++ config/rs6000/rs6000-builtin.def	(working copy)
@@ -73,6 +73,8 @@ RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTSH,
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTUW,		RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTSW,		RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTFP,		RS6000_BTC_FP_PURE)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPLTFP,		RS6000_BTC_FP_PURE)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPLEFP,		RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VEXPTEFP,		RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VLOGEFP,			RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VMADDFP,			RS6000_BTC_FP_PURE)
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 161484)
+++ config/rs6000/rs6000.c	(working copy)
@@ -1056,6 +1056,7 @@ static bool rs6000_builtin_support_vecto
 							machine_mode,
 							const_tree,
 							int, bool);
+static tree rs6000_builtin_vect_compare (unsigned int, tree, tree *);
 
 static void def_builtin (int, const char *, tree, int);
 static bool rs6000_vector_alignment_reachable (const_tree, bool);
@@ -1448,6 +1449,8 @@ static const struct attribute_spec rs600
   rs6000_builtin_support_vector_misalignment
 #undef TARGET_VECTOR_ALIGNMENT_REACHABLE
 #define TARGET_VECTOR_ALIGNMENT_REACHABLE rs6000_vector_alignment_reachable
+#undef TARGET_VECTORIZE_BUILTIN_VEC_CMP
+#define TARGET_VECTORIZE_BUILTIN_VEC_CMP rs6000_builtin_vect_compare
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
@@ -3371,6 +3374,45 @@ rs6000_builtin_vec_perm (tree type, tree
   return d;
 }
 
+/* Implement targetm.vectorize.builtin_vect_compare.  */
+tree
+rs6000_builtin_vect_compare (unsigned int tcode, tree type, tree *return_type)
+{
+  enum tree_code code = (enum tree_code) tcode;
+
+  if (!TARGET_ALTIVEC)
+    return NULL_TREE;
+
+  switch (TYPE_MODE (type))
+    {
+    case V4SFmode:
+      *return_type = V4SF_type_node;
+      switch (code)
+        {
+          case GT_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPGTFP];
+            
+          case LT_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPLTFP];
+
+          case GE_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPGEFP];
+
+          case LE_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPLEFP];
+
+          case EQ_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPEQFP];
+
+          default:
+            return NULL_TREE;
+        }
+
+    default:
+      return NULL_TREE;
+    }
+}
+
 /* Handle generic options of the form -mfoo=yes/no.
    NAME is the option name.
    VALUE is the option value.
@@ -9182,6 +9224,8 @@ static struct builtin_description bdesc_
   { MASK_ALTIVEC, CODE_FOR_vector_gtuv4si, "__builtin_altivec_vcmpgtuw", ALTIVEC_BUILTIN_VCMPGTUW },
   { MASK_ALTIVEC, CODE_FOR_vector_gtv4si, "__builtin_altivec_vcmpgtsw", ALTIVEC_BUILTIN_VCMPGTSW },
   { MASK_ALTIVEC, CODE_FOR_vector_gtv4sf, "__builtin_altivec_vcmpgtfp", ALTIVEC_BUILTIN_VCMPGTFP },
+  { MASK_ALTIVEC, CODE_FOR_altivec_vcmpltfp, "__builtin_altivec_vcmpltfp", ALTIVEC_BUILTIN_VCMPLTFP },
+  { MASK_ALTIVEC, CODE_FOR_altivec_vcmplefp, "__builtin_altivec_vcmplefp", ALTIVEC_BUILTIN_VCMPLEFP },
   { MASK_ALTIVEC, CODE_FOR_altivec_vctsxs, "__builtin_altivec_vctsxs", ALTIVEC_BUILTIN_VCTSXS },
   { MASK_ALTIVEC, CODE_FOR_altivec_vctuxs, "__builtin_altivec_vctuxs", ALTIVEC_BUILTIN_VCTUXS },
   { MASK_ALTIVEC, CODE_FOR_umaxv16qi3, "__builtin_altivec_vmaxub", ALTIVEC_BUILTIN_VMAXUB },
Index: config/rs6000/altivec.md
===================================================================
--- config/rs6000/altivec.md	(revision 161484)
+++ config/rs6000/altivec.md	(working copy)
@@ -144,6 +144,8 @@
    (UNSPEC_VUPKHU_V4SF  326)
    (UNSPEC_VUPKLU_V4SF  327)
    (UNSPEC_VNMSUBFP	328)
+   (UNSPEC_VCMPLTFP     329)
+   (UNSPEC_VCMPLEFP     330)
 ])
 
 (define_constants
@@ -2802,3 +2804,22 @@
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+
+(define_insn "altivec_vcmpltfp"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+                  (match_operand:V4SF 2 "register_operand" "v")]
+                   UNSPEC_VCMPLTFP))]
+  "TARGET_ALTIVEC"
+  "vcmpgtfp %0,%2,%1"
+  [(set_attr "type" "veccmp")])
+
+(define_insn "altivec_vcmplefp"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+                  (match_operand:V4SF 2 "register_operand" "v")]
+                   UNSPEC_VCMPLEFP))]
+  "TARGET_ALTIVEC"
+  "vcmpgefp %0,%2,%1"
+  [(set_attr "type" "veccmp")])
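
Both new patterns reuse the existing GT/GE comparison instructions with the
operands swapped (altivec_vcmpltfp emits "vcmpgtfp %0,%2,%1"). The identity
they rely on, sketched in scalar C:

```c
#include <assert.h>

/* a < b is computed as b > a; a <= b as b >= a.  */
static int lt_via_gt (float a, float b) { return b > a; }
static int le_via_ge (float a, float b) { return b >= a; }
```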
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c	(revision 161484)
+++ tree-vect-slp.c	(working copy)
@@ -146,6 +146,18 @@ vect_get_and_check_slp_defs (loop_vec_in
 	  return false;
 	}
 
+      if (def_stmt && vinfo_for_stmt (def_stmt)
+          && STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt))) 
+        {
+          if (vect_print_dump_info (REPORT_SLP))
+            {
+              fprintf (vect_dump, "Build SLP failed: compound pattern ");
+              print_gimple_stmt (vect_dump, def_stmt, 0, TDF_SLIM);
+            }
+
+          return false;
+        }
+
       /* Check if DEF_STMT is a part of a pattern in LOOP and get the def stmt
          from the pattern. Check that all the stmts of the node are in the
          pattern.  */

[-- Attachment #3: minloc-tests.txt --]
[-- Type: text/plain, Size: 12162 bytes --]

Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = 12;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] > limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = N + 15.8;
+
+  pos = foo ();
+  if (pos != 3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = -5.8;
+
+  pos = foo ();
+  if (pos != 3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c	(revision 0)
@@ -0,0 +1,55 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+short a[N];
+
+/* Loop with multiple types - currently not supported.  */
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min, short x)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    {
+      if (arr[i] < limit)
+        {
+          limit = arr[i];
+          pos = i + 1;
+        }
+
+      a[i] = x;
+    }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min, 6);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c	(revision 0)
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+#define MAX_VALUE N+N
+float arr[N];
+
+/* Not minloc pattern - different conditions.  */
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = MAX_VALUE;
+
+  for (i = 0; i < N; i++)
+    {
+      if (arr[i] < limit)
+        pos = i + 1;
+
+      if (arr[i] > limit)
+        limit = arr[i];
+    }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo ();
+
+  if (pos != N)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+/* Not minloc pattern: position is not induction.  */
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    if (arr[i] < limit)
+      {
+        pos = 5;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 5 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+/* Position and minimum are of types of different sizes - not supported.  */
+__attribute__ ((noinline)) 
+int foo (unsigned short n, float *min)
+{
+  unsigned short pos = 1;
+  unsigned short i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/vect.exp
===================================================================
--- gcc.dg/vect/vect.exp	(revision 161484)
+++ gcc.dg/vect/vect.exp	(working copy)
@@ -158,9 +158,27 @@ dg-runtest [lsort [glob -nocomplain $src
 # -ffast-math tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ffast-math"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-pr*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-slp*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
+# -ffast-math and -fno-tree-pre tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math" "-fno-tree-pre"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-no-pre*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
 # -fno-math-errno tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N][N];
+
+/* Double reduction.  */
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i, j;
+  float limit = N+N;
+
+  for (j = 0; j < N; j++)
+    for (i = 0; i < N; i++)
+      if (arr[i][j] < limit)
+        {
+          pos = i + 1;
+          limit = arr[i][j];
+        }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, j, pos;
+
+  check_vect();
+
+  for (j = 0; j < N; j++)
+    for (i = 0; i < N; i++)
+      arr[j][i] = (float)(i+j+1);
+
+  arr[8][2] = 0;
+  pos = foo ();
+  if (pos != 9)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c
===================================================================
--- gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c	(revision 0)
+++ gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c	(revision 0)
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = 7;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] >= limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = N + 5.8;
+  arr [12] = N + 5.8;
+
+  pos = foo ();
+  if (pos != 13)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp	(revision 161484)
+++ lib/target-supports.exp	(working copy)
@@ -2839,6 +2839,23 @@ proc check_effective_target_vect_strided
     return $et_vect_strided_wide_saved
 }
 
+# Return 1 if the target supports vector comparison, 0 otherwise.
+proc check_effective_target_vect_cmp { } {
+    global et_vect_cmp_saved
+
+    if [info exists et_vect_cmp_saved] {
+        verbose "check_effective_target_vect_cmp: using cached result" 2
+    } else {
+        set et_vect_cmp_saved 0
+        if { [istarget powerpc*-*-*] } {
+           set et_vect_cmp_saved 1
+        }
+    }
+
+    verbose "check_effective_target_vect_cmp: returning $et_vect_cmp_saved" 2
+    return $et_vect_cmp_saved
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-01  8:01 [RFC] [patch] Support vectorization of min/max location pattern Ira Rosen
@ 2010-07-06  7:15 ` Ira Rosen
  2010-07-07 20:43   ` Richard Henderson
  2010-11-19 15:53   ` [RFC] [patch] Support vectorization of min/max location pattern H.J. Lu
  0 siblings, 2 replies; 16+ messages in thread
From: Ira Rosen @ 2010-07-06  7:15 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7472 bytes --]

gcc-patches-owner@gcc.gnu.org wrote on 01/07/2010 11:00:50 AM:

> Hi,
>
> This patch adds vectorization support of min/max location pattern:
>
>   for (i = 0; i < N; i++)
>     if (arr[i] < limit)
>       {
>         pos = i + 1;
>         limit = arr[i];
>       }
>
> The recognized pattern is composed of two statements (and is called
> compound pattern):
>
>   # pos_22 = PHI <pos_1(4), 1(2)>
>   # limit_24 = PHI <limit_4(4), 0(2)>
>   ...
>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;
>
> both statements should be reductions with cond_expr and have the same
> condition part. The min/max statement is expected to be of the form "x op
> y ? x : y" (where op can be >, <, >= or <=), and the location is expected
> to be an induction.
>
> To vectorize min/max location pattern we use a technique described in
> "Multimedia vectorization of floating-point MIN/MAX reductions" by
> A.J.C.Bik, X.Tian and M.B.Girkar,
> http://portal.acm.org/citation.cfm?id=1145765.
>
> Vectorized loop (maxloc, first index):
>      vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
>      vck[0:vl-1:1] = | k |..| k |;  - vector of positions
>      ind[0:vl-1:1] = |vl-1|..| 0 |;
>      inc[0:vl-1:1] = | vl |..| vl |;
>      for (i = 0; i < N; i += vl) {
>        msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
>        vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
>                        (vck[0:vl-1:1] & !msk[0:vl-1:1]);
>        vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
>        ind[0:vl-1:1] += inc[0:vl-1:1];
>      }
>      x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
>      msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
>      vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
>                      (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
>      k = HMIN(vck[0:vl-1:1]);       - first position extraction
>
>
> Vectorization of minloc is supposed to help gas_dyn from Polyhedron as
> discussed in PR 31067.
>
> PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
> bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I
> commit this patch (after I regtest it again).
> Also the case of pos = i; instead of pos = i+1; is not supported since in
> this case the operands are switched, i.e., we get "x op y ? y : x".
>
>
> My main question is the implementation of vector comparisons. I understand
> that different targets can return different types of results. So instead of
> defining new tree codes, I used a target builtin which also returns the type
> of the result.
>
> Other comments are welcome too.
>
> Bootstrapped and tested on powerpc64-suse-linux.
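
For reference, the maxloc scheme quoted above can be emulated in scalar C,
with each "vector" as an array of VL lanes and each mask lane as an
all-ones/all-zero word. This is only an illustrative sketch of the
algorithm (assuming n is a multiple of VL), not the code the vectorizer
generates:

```c
#include <assert.h>

#define VL 4
#define MAXINT 0xFFFFFFFFu

/* Returns the first 0-based position of the maximum of a[0..n-1],
   and the maximum itself in *max.  Lane selection uses
   (a & msk) | (b & ~msk), as in the vectorized loop above.  */
static unsigned int
maxloc_emulated (const float *a, unsigned int n, float *max)
{
  float vcx[VL], x;
  unsigned int vck[VL], ind[VL], msk[VL], i, l, k;

  for (l = 0; l < VL; l++)
    {
      vcx[l] = a[0];   /* vector of max values, |x|..|x| */
      vck[l] = 0;      /* vector of positions,  |k|..|k| */
      ind[l] = l;      /* lane indices, |vl-1|..|0|      */
    }

  for (i = 0; i < n; i += VL)
    for (l = 0; l < VL; l++)
      {
        msk[l] = a[i + l] > vcx[l] ? MAXINT : 0;
        vck[l] = (ind[l] & msk[l]) | (vck[l] & ~msk[l]);
        vcx[l] = a[i + l] > vcx[l] ? a[i + l] : vcx[l];  /* VMAX */
        ind[l] += VL;
      }

  /* Epilogue: HMAX over the values, then HMIN over the matching
     positions (non-matching lanes are forced to MaxInt).  */
  x = vcx[0];
  for (l = 1; l < VL; l++)
    if (vcx[l] > x)
      x = vcx[l];

  k = MAXINT;
  for (l = 0; l < VL; l++)
    {
      msk[l] = vcx[l] == x ? MAXINT : 0;
      vck[l] = (vck[l] & msk[l]) | (MAXINT & ~msk[l]);
      if (vck[l] < k)  /* HMIN */
        k = vck[l];
    }

  *max = x;
  return k;
}
```

When the maximum occurs in several lanes, the final HMIN over the
position vector is what guarantees the *first* occurrence wins.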

Since it looks like nobody objects to the use of target builtins for vector
comparison, I am resubmitting an updated patch (the code) for review of the
non-vectorizer parts.

Thanks,
Ira


ChangeLog:

      * doc/tm.texi: Regenerate.
      * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE):
      Document.
      * target.def (builtin_vect_compare): Add new builtin.
      * tree-vectorizer.h (enum vect_compound_pattern): New.
      (struct _stmt_vec_info): Add new fields compound_pattern and
      reduc_scalar_result_stmt. Add macros to access them.
      (is_pattern_stmt_p): Return true for compound pattern.
      (vectorizable_condition): Add arguments.
      (vect_recog_compound_func_ptr): New function-pointer type.
      (NUM_COMPOUND_PATTERNS): New.
      (vect_compound_pattern_recog): Declare.
      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
      for compound patterns.
      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
      patterns. Update comment.
      (vect_analyze_scalar_cycles): Update comment.
      (destroy_loop_vec_info): Update def stmt for the original pattern
      statement.
      (vect_is_simple_reduction_1): Skip compound pattern statements in
      uses check. Add spaces. Skip commutativity and type checks for
      minimum location statement. Fix printings.
      (vect_model_reduction_cost): Add min/max location pattern cost
      computation.
      (vect_create_epilogue_for_compound_pattern): New function.
      (vect_create_epilog_for_reduction): Don't retrieve the original
      statement for compound pattern. Fix comment accordingly. Store the
      result of vector reduction computation in a variable and use it. Call
      vect_create_epilogue_for_compound_pattern (). Check if optab exists
      before using it. Keep the scalar result computation statement. Use
      either exit phi node result or compound pattern result in scalar
      extraction. Don't expect to find an exit phi node for min/max
      statement.
      (vectorizable_reduction): Skip check for uses in loop for compound
      patterns. Don't retrieve the original statement for compound pattern.
      Call vectorizable_condition () with additional parameters. Skip
      reduction code check for compound patterns. Prepare operands for
      min/max location statement vectorization and pass them to
      vectorizable_condition ().
      (vectorizable_live_operation): Return TRUE for compound patterns.
      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
      (vect_recog_compound_func_ptrs): Likewise.
      (vect_recog_min_max_loc_pattern): New function.
      (vect_compound_pattern_recog): Likewise.
      * tree-vect-stmts.c (process_use): Mark compound pattern statements
      as used by reduction.
      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
      to be used by reduction.
      (vectorize_minmax_location_pattern): New function.
      (vectorizable_condition): Update comment, add arguments. Skip checks
      irrelevant for compound pattern. Check that vector comparisons are
      supported by the target. Prepare operands using new arguments. Call
      vectorize_minmax_location_pattern().
      (vect_analyze_stmt): Allow nested cycle statements to be used by
      reduction. Call vectorizable_condition () with additional arguments.
      (vect_transform_stmt): Call vectorizable_condition () with additional
      arguments.
      (new_stmt_vec_info): Initialize new fields.
      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
      (ALTIVEC_BUILTIN_VCMPLEFP): New.
      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
      (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE): Redefine.
      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
      altivec_vcmplefp.
      * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern.
      (altivec_vcmplefp): Likewise.
      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
      patterns.

(See attached file: minloc-new.txt)

>
> testsuite/ChangeLog:
>
>       * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c.
>       * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>       * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>       * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c: Likewise.
>

[-- Attachment #2: minloc-new.txt --]
[-- Type: text/plain, Size: 68741 bytes --]

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 161862)
+++ doc/tm.texi	(working copy)
@@ -5753,6 +5753,13 @@ the elements in the vectors should be of
 parameter is true if the memory access is defined in a packed struct.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECT_COMPARE (unsigned @var{code}, tree @var{type}, tree *@var{return_type})
+Target builtin that implements vector element-wise comparison.
+The value of @var{code} is one of the enumerators in @code{enum tree_code} and
+specifies the comparison operation; @var{type} specifies the type of the input
+vectors.  The function returns the type of the comparison result in @var{return_type}.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 161862)
+++ doc/tm.texi.in	(working copy)
@@ -5753,6 +5753,13 @@ the elements in the vectors should be of
 parameter is true if the memory access is defined in a packed struct.
 @end deftypefn
 
+@hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE
+Target builtin that implements vector element-wise comparison.
+The value of @var{code} is one of the enumerators in @code{enum tree_code} and
+specifies the comparison operation; @var{type} specifies the type of the input
+vectors.  The function returns the type of the comparison result in @var{return_type}.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
Index: target.def
===================================================================
--- target.def	(revision 161862)
+++ target.def	(working copy)
@@ -829,6 +829,12 @@ DEFHOOK
  (enum machine_mode mode, const_tree type, int misalignment, bool is_packed),
  default_builtin_support_vector_misalignment)
 
+/* Target builtin that implements vector element-wise comparison.  */
+DEFHOOK
+(builtin_vect_compare,
+ "",
+ tree, (unsigned code, tree type, tree *return_type), NULL)
+
 HOOK_VECTOR_END (vectorize)
 
 #undef HOOK_PREFIX
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 161862)
+++ tree-vectorizer.h	(working copy)
@@ -409,6 +409,17 @@ enum slp_vect_type {
   hybrid
 };
 
+/* A compound pattern is a pattern consisting of more than one statement that
+   needs to be vectorized.  Currently the min/max location pattern is the only
+   supported compound pattern.  It has two statements: the first statement
+   calculates the minimum (marked minmax_stmt) and the second one calculates
+   the location (marked minmax_loc_stmt).  */
+enum vect_compound_pattern {
+  not_in_pattern = 0,
+  minmax_stmt,
+  minmax_loc_stmt
+};
+
 
 typedef struct data_reference *dr_p;
 DEF_VEC_P(dr_p);
@@ -425,6 +436,10 @@ typedef struct _stmt_vec_info {
   /* Stmt is part of some pattern (computation idiom)  */
   bool in_pattern_p;
 
+  /* Statement is part of a compound pattern, i.e., a pattern consisting
+     of more than one statement.  */
+  enum vect_compound_pattern compound_pattern;
+
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
@@ -511,6 +526,10 @@ typedef struct _stmt_vec_info {
   /* The bb_vec_info with respect to which STMT is vectorized.  */
   bb_vec_info bb_vinfo;
 
+  /* The scalar result of vectorized reduction computation generated in
+     reduction epilogue.  */
+  gimple reduc_scalar_result_stmt;
+
   /* Is this statement vectorizable or should it be skipped in (partial)
      vectorization.  */
   bool vectorizable;
@@ -535,6 +554,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_DR_ALIGNED_TO(S)        (S)->dr_aligned_to
 
 #define STMT_VINFO_IN_PATTERN_P(S)         (S)->in_pattern_p
+#define STMT_VINFO_COMPOUND_PATTERN(S)     (S)->compound_pattern
 #define STMT_VINFO_RELATED_STMT(S)         (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)      (S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S)             (S)->def_type
@@ -546,6 +566,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt
 #define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 #define STMT_VINFO_STRIDED_ACCESS(S)      ((S)->first_dr != NULL)
+#define STMT_VINFO_REDUC_SCALAR_RES_STMT(S) (S)->reduc_scalar_result_stmt
 
 #define DR_GROUP_FIRST_DR(S)               (S)->first_dr
 #define DR_GROUP_NEXT_DR(S)                (S)->next_dr
@@ -642,7 +663,8 @@ is_pattern_stmt_p (stmt_vec_info stmt_in
   related_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
   if (related_stmt
       && (related_stmt_info = vinfo_for_stmt (related_stmt))
-      && STMT_VINFO_IN_PATTERN_P (related_stmt_info))
+      && (STMT_VINFO_IN_PATTERN_P (related_stmt_info)
+          || STMT_VINFO_COMPOUND_PATTERN (related_stmt_info)))
     return true;
 
   return false;
@@ -764,7 +786,9 @@ extern bool vect_transform_stmt (gimple,
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
-                                    tree, int);
+                                    tree, int, tree, int);
+extern gimple vectorize_minmax_location_pattern (gimple, gimple_stmt_iterator*,
+                                       enum tree_code, tree, tree, tree, tree);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
                                 unsigned int *, unsigned int *);
 extern void vect_get_store_cost (struct data_reference *, int, unsigned int *);
@@ -844,8 +868,11 @@ extern void vect_slp_transform_bb (basic
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
-#define NUM_PATTERNS 4
+typedef bool (* vect_recog_compound_func_ptr) (unsigned int, va_list);
+#define NUM_PATTERNS 4
+#define NUM_COMPOUND_PATTERNS 1
 void vect_pattern_recog (loop_vec_info);
+void vect_compound_pattern_recog (unsigned int, ...);
 
 /* In tree-vectorizer.c.  */
 unsigned vectorize_loops (void);
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 161862)
+++ tree-vect-loop.c	(working copy)
@@ -295,7 +295,8 @@ vect_determine_vectorization_factor (loo
 	  else
 	    {
 	      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)
-			  && !is_pattern_stmt_p (stmt_info));
+			  && (!is_pattern_stmt_p (stmt_info)
+                              || STMT_VINFO_COMPOUND_PATTERN (stmt_info)));
 
 	      scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 	      if (vect_print_dump_info (REPORT_DETAILS))
@@ -444,10 +445,15 @@ static void
 vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
 {
   basic_block bb = loop->header;
-  tree dumy;
+  tree dummy;
   VEC(gimple,heap) *worklist = VEC_alloc (gimple, heap, 64);
   gimple_stmt_iterator gsi;
-  bool double_reduc;
+  bool double_reduc, found, minmax_loc = false;
+  gimple first_cond_stmt = NULL, second_cond_stmt = NULL;
+  gimple first_phi = NULL, second_phi = NULL, phi, use_stmt;
+  int i;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_scalar_cycles ===");
@@ -484,7 +490,8 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
 	}
 
       if (!access_fn
-	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dumy, &dumy))
+	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dummy, 
+                                           &dummy)) 
 	{
 	  VEC_safe_push (gimple, heap, worklist, phi);
 	  continue;
@@ -495,8 +502,56 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
       STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_induction_def;
     }
 
+  /* Detect compound reduction patterns (before reduction detection):
+     we currently support only the min/max location pattern, so we look for
+     two reduction condition statements.  */
+  for (i = 0; VEC_iterate (gimple, worklist, i, phi); i++)
+    {
+      tree def = PHI_RESULT (phi);
+
+      found = false;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
+        {
+          use_stmt = USE_STMT (use_p);
+          if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
+              && vinfo_for_stmt (use_stmt)
+              && is_gimple_assign (use_stmt)
+              && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
+            {
+              found = true;
+              break;
+            }
+        }
+
+      if (!found)
+        continue;
+
+      if (!first_cond_stmt)
+        {
+          first_cond_stmt = use_stmt;
+          first_phi = phi;
+        }
+      else
+        {
+          if (second_cond_stmt)
+            {
+              /* This is the third reduction condition statement in the
+                 loop; that is too confusing, so we bail out.  */
+              minmax_loc = false;
+              break;
+            }
+
+          second_cond_stmt = use_stmt;
+          second_phi = phi;
+          minmax_loc = true;
+        }
+    }
+
+  if (minmax_loc)
+    vect_compound_pattern_recog (4, first_phi, first_cond_stmt, 
+                                 second_phi, second_cond_stmt);
 
-  /* Second - identify all reductions and nested cycles.  */
+  /* Identify all reductions and nested cycles.  */
   while (VEC_length (gimple, worklist) > 0)
     {
       gimple phi = VEC_pop (gimple, worklist);
@@ -595,11 +650,9 @@ vect_analyze_scalar_cycles (loop_vec_inf
   /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
      Reductions in such inner-loop therefore have different properties than
      the reductions in the nest that gets vectorized:
-     1. When vectorized, they are executed in the same order as in the original
-        scalar loop, so we can't change the order of computation when
-        vectorizing them.
-     2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
-        current checks are too strict.  */
+     when vectorized, they are executed in the same order as in the original
+     scalar loop, so we can't change the order of computation when
+     vectorizing them.  */
 
   if (loop->inner)
     vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
@@ -820,7 +873,15 @@ destroy_loop_vec_info (loop_vec_info loo
                   if (orig_stmt_info
                       && STMT_VINFO_IN_PATTERN_P (orig_stmt_info))
                     remove_stmt_p = true;
-                }
+
+		  /* We are removing a statement inserted by the pattern
+		     detection pass.  Make the original statement the
+		     defining statement of its LHS.  */
+                  if (remove_stmt_p && is_gimple_assign (orig_stmt) 
+                      && TREE_CODE (gimple_assign_lhs (orig_stmt)) == SSA_NAME)
+                    SSA_NAME_DEF_STMT (gimple_assign_lhs (orig_stmt)) 
+                      = orig_stmt;
+                 }
 
               /* Free stmt_vec_info.  */
               free_stmt_vec_info (stmt);
@@ -1670,13 +1731,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
-	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
+	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt))
+	  && !STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (use_stmt)))
         nloop_uses++;
+   
       if (nloop_uses > 1)
         {
-          if (vect_print_dump_info (REPORT_DETAILS))
+          if (vect_print_dump_info (REPORT_DETAILS)) 
             fprintf (vect_dump, "reduction used in loop.");
           return NULL;
         }
@@ -1724,10 +1788,12 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
 	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
 	nloop_uses++;
+
       if (nloop_uses > 1)
 	{
 	  if (vect_print_dump_info (REPORT_DETAILS))
@@ -1777,6 +1843,9 @@ vect_is_simple_reduction_1 (loop_vec_inf
     code = PLUS_EXPR;
 
   if (check_reduction
+      && (!vinfo_for_stmt (def_stmt)
+          || STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt))
+                != minmax_loc_stmt)
       && (!commutative_tree_code (code) || !associative_tree_code (code)))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
@@ -1827,14 +1896,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
    }
 
   type = TREE_TYPE (gimple_assign_lhs (def_stmt));
-  if ((TREE_CODE (op1) == SSA_NAME
-       && !types_compatible_p (type,TREE_TYPE (op1)))
-      || (TREE_CODE (op2) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op2)))
-      || (op3 && TREE_CODE (op3) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op3)))
-      || (op4 && TREE_CODE (op4) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op4))))
+  if (STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt)) 
+        != minmax_loc_stmt
+      && ((TREE_CODE (op1) == SSA_NAME 
+           && !types_compatible_p (type, TREE_TYPE (op1)))
+          || (TREE_CODE (op2) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op2)))
+          || (op3 && TREE_CODE (op3) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op3)))
+          || (op4 && TREE_CODE (op4) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op4)))))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         {
@@ -1842,17 +1913,17 @@ vect_is_simple_reduction_1 (loop_vec_inf
           print_generic_expr (vect_dump, type, TDF_SLIM);
           fprintf (vect_dump, ", operands types: ");
           print_generic_expr (vect_dump, TREE_TYPE (op1), TDF_SLIM);
-          fprintf (vect_dump, ",");
+          fprintf (vect_dump, ", ");
           print_generic_expr (vect_dump, TREE_TYPE (op2), TDF_SLIM);
           if (op3)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op3), TDF_SLIM);
             }
 
           if (op4)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op4), TDF_SLIM);
             }
         }
@@ -1960,7 +2031,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
                                == vect_internal_def
 		           && !is_loop_header_bb_p (gimple_bb (def2)))))))
     {
-      if (check_reduction)
+      if (check_reduction && code != COND_EXPR)
         {
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
@@ -2431,7 +2502,6 @@ vect_model_reduction_cost (stmt_vec_info
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-
   /* Cost of reduction op inside loop.  */
   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
     += ncopies * vect_get_cost (vector_stmt);
@@ -2468,11 +2538,15 @@ vect_model_reduction_cost (stmt_vec_info
   mode = TYPE_MODE (vectype);
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
 
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info)) 
     orig_stmt = STMT_VINFO_STMT (stmt_info);
 
   code = gimple_assign_rhs_code (orig_stmt);
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)
+      += ncopies * 5 * vect_get_cost (vector_stmt);
+
   /* Add in cost for initial definition.  */
   outer_cost += vect_get_cost (scalar_to_vec);
 
@@ -2488,28 +2562,34 @@ vect_model_reduction_cost (stmt_vec_info
                       + vect_get_cost (vec_to_scalar); 
       else
 	{
-	  int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-	  tree bitsize =
-	    TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
-	  int element_bitsize = tree_low_cst (bitsize, 1);
-	  int nelements = vec_size_in_bits / element_bitsize;
-
-	  optab = optab_for_tree_code (code, vectype, optab_default);
-
-	  /* We have a whole vector shift available.  */
-	  if (VECTOR_MODE_P (mode)
-	      && optab_handler (optab, mode) != CODE_FOR_nothing
-	      && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
-	    /* Final reduction via vector shifts and the reduction operator. Also
-	       requires scalar extract.  */
-	    outer_cost += ((exact_log2(nelements) * 2) 
-              * vect_get_cost (vector_stmt) 
-  	      + vect_get_cost (vec_to_scalar));
-	  else
-	    /* Use extracts and reduction op for final reduction.  For N elements,
-               we have N extracts and N-1 reduction ops.  */
-	    outer_cost += ((nelements + nelements - 1) 
-              * vect_get_cost (vector_stmt));
+          if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+            outer_cost += 6 * vect_get_cost (vector_stmt)
+                          + vect_get_cost (vec_to_scalar);
+          else
+            {
+	      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+	      tree bitsize =
+		TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
+	      int element_bitsize = tree_low_cst (bitsize, 1);
+	      int nelements = vec_size_in_bits / element_bitsize;
+
+	      optab = optab_for_tree_code (code, vectype, optab_default);
+
+	      /* We have a whole vector shift available.  */
+	      if (VECTOR_MODE_P (mode)
+	          && optab_handler (optab, mode) != CODE_FOR_nothing
+	          && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
+	        /* Final reduction via vector shifts and the reduction 
+		   operator. Also requires scalar extract.  */
+	        outer_cost += ((exact_log2(nelements) * 2) 
+                  * vect_get_cost (vector_stmt) 
+  	          + vect_get_cost (vec_to_scalar));
+	      else
+	        /* Use extracts and reduction op for final reduction.  For N 
+		   elements, we have N extracts and N-1 reduction ops.  */
+		outer_cost += ((nelements + nelements - 1) 
+		  * vect_get_cost (vector_stmt));
+	    }
 	}
     }
 
@@ -3010,6 +3090,127 @@ get_initial_def_for_reduction (gimple st
   return init_def;
 }
 
+/* Create the min/max location epilogue calculation.  We have both the vector
+   and the extracted scalar results of the min/max computation, and a vector
+   of locations that we now need to reduce to a scalar result.
+   We use the technique described in the documentation of
+   vectorize_minmax_location_pattern ().  */
+
+static void
+vect_create_epilogue_for_compound_pattern (gimple stmt, tree vectype, 
+                                           enum tree_code *reduc_code,
+                                           gimple *new_phi, 
+                                           gimple_stmt_iterator *exit_gsi,
+                                           enum tree_code *code)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  tree t = NULL_TREE, minmax_vec, minmax_res = NULL_TREE, orig_cond, val;
+  gimple related, min_max_stmt, related_res;
+  enum machine_mode vec_mode;
+  optab reduc_optab;
+  unsigned int nunits;
+  int i;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  basic_block exit_bb;
+  enum tree_code orig_code;
+
+  if (nested_in_vect_loop_p (loop, stmt))
+    loop = loop->inner;
+
+  exit_bb = single_exit (loop)->dest;
+
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) != minmax_loc_stmt)
+    return;
+
+  related = STMT_VINFO_RELATED_STMT (stmt_info);
+  related_res = STMT_VINFO_REDUC_SCALAR_RES_STMT (vinfo_for_stmt (related));
+  gcc_assert (related_res);
+
+  /* Get a vector result of min/max computation.  */
+  min_max_stmt = STMT_VINFO_VEC_STMT (vinfo_for_stmt (related));
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_assign_lhs (min_max_stmt))
+    if (gimple_bb (USE_STMT (use_p)) == exit_bb
+        && gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
+      minmax_res = PHI_RESULT (USE_STMT (use_p));
+   
+  gcc_assert (minmax_res);
+
+  /* Create vector {min, min,...} or {max, max, ...}.  */
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  for (i = nunits - 1; i >= 0; --i)
+    t = tree_cons (NULL_TREE, gimple_assign_lhs (related_res), t);
+
+  minmax_vec = build_constructor_from_list (TREE_TYPE (minmax_res), t);
+
+  /* To extract the final position value, we need to know whether to take
+     the minimum index (GT_EXPR, LT_EXPR) or the maximum (GE_EXPR, LE_EXPR).  */
+  orig_cond = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+  if (TREE_CODE (orig_cond) == SSA_NAME)
+    {
+      gimple cond_def_stmt = SSA_NAME_DEF_STMT (orig_cond);
+      orig_code = gimple_assign_rhs_code (cond_def_stmt);
+    }
+  else
+    orig_code = TREE_CODE (orig_cond);
+
+  if (orig_code == GT_EXPR || orig_code == LT_EXPR)
+    {
+      val = TYPE_MAX_VALUE (TREE_TYPE (gimple_assign_lhs (stmt)));
+      *code = MIN_EXPR;
+    }
+  else
+    {
+      val = TYPE_MIN_VALUE (TREE_TYPE (gimple_assign_lhs (stmt)));
+      *code = MAX_EXPR;
+    }
+
+  /* Build a vector of maximum or minimum values.  */
+  t = NULL_TREE;
+  for (i = nunits - 1; i >= 0; --i)
+    t = tree_cons (NULL_TREE, val, t); 
+
+  /* Promote GSI to after the min/max result extraction, since we use it
+     in the index calculation.  (We insert the min/max scalar statement
+     before the index calculation statement in
+     vect_recog_min_max_loc_pattern (); therefore its epilogue is created
+     before the epilogue of the index calculation statement.)  */
+  *exit_gsi = gsi_for_stmt (related_res);
+  gsi_next (exit_gsi);
+  minmax_vec = vect_init_vector (stmt, minmax_vec, TREE_TYPE (minmax_res), 
+                                 exit_gsi);
+  *new_phi = vectorize_minmax_location_pattern (stmt, exit_gsi, EQ_EXPR,
+                                                minmax_res, minmax_vec,
+                                                PHI_RESULT (*new_phi),
+                                                build_vector (vectype, t));
+
+  /* Extract minimum or maximum from VECTOR_RESULT to get the first or the last
+     index (using one of the above techniques).  */
+  *reduc_code = ERROR_MARK;
+  if (reduction_code_for_scalar_code (*code, reduc_code))
+    {
+      reduc_optab = optab_for_tree_code (*reduc_code, vectype, optab_default);
+      if (!reduc_optab)
+        {
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "no optab for reduction.");
+
+          *reduc_code = ERROR_MARK;
+        }
+
+        vec_mode = TYPE_MODE (vectype);
+        if (reduc_optab
+            && optab_handler (reduc_optab, vec_mode)  == CODE_FOR_nothing)
+          {
+            if (vect_print_dump_info (REPORT_DETAILS))
+              fprintf (vect_dump, "reduc op not supported by target.");
+
+            *reduc_code = ERROR_MARK;
+          }
+     }
+}
 
 /* Function vect_create_epilog_for_reduction
 
@@ -3112,6 +3313,7 @@ vect_create_epilog_for_reduction (VEC (t
   unsigned int group_size = 1, k, ratio;
   VEC (tree, heap) *vec_initial_defs = NULL;
   VEC (gimple, heap) *phis;
+  tree vec_temp;
 
   if (slp_node)
     group_size = VEC_length (gimple, SLP_TREE_SCALAR_STMTS (slp_node)); 
@@ -3169,9 +3371,9 @@ vect_create_epilog_for_reduction (VEC (t
   else
     {
       vec_initial_defs = VEC_alloc (tree, heap, 1);
-     /* For the case of reduction, vect_get_vec_def_for_operand returns
-        the scalar def before the loop, that defines the initial value
-        of the reduction variable.  */
+      /* For the case of reduction, vect_get_vec_def_for_operand returns
+         the scalar def before the loop, that defines the initial value
+         of the reduction variable.  */
       vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
                                                       &adjustment_def);
       VEC_quick_push (tree, vec_initial_defs, vec_initial_def);
@@ -3271,18 +3473,18 @@ vect_create_epilog_for_reduction (VEC (t
          defined in the loop.  In case STMT is a "pattern-stmt" (i.e. - it
          represents a reduction pattern), the tree-code and scalar-def are
          taken from the original stmt that the pattern-stmt (STMT) replaces.
-         Otherwise (it is a regular reduction) - the tree-code and scalar-def
-         are taken from STMT.  */
+         Otherwise (it is a regular reduction or a compound pattern) - the 
+         tree-code and scalar-def are taken from STMT.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info))  
     {
-      /* Regular reduction  */
+      /* Regular reduction or compound pattern.  */
       orig_stmt = stmt;
     }
   else
     {
-      /* Reduction pattern  */
+      /* Reduction pattern.  */ 
       stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
       gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt);
@@ -3309,6 +3511,16 @@ vect_create_epilog_for_reduction (VEC (t
   if (nested_in_vect_loop && !double_reduc)
     goto vect_finalize_reduction;
 
+  /* Create an epilogue for compound pattern.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+    {
+      /* FORNOW: SLP with compound patterns is not supported.  */
+      new_phi = VEC_index (gimple, new_phis, 0);
+      vect_create_epilogue_for_compound_pattern (stmt, vectype, &reduc_code,
+                                                 &new_phi, &exit_gsi, &code);
+      VEC_replace (gimple, new_phis, 0, new_phi);
+    }
+  
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
@@ -3324,7 +3536,11 @@ vect_create_epilog_for_reduction (VEC (t
 
       vec_dest = vect_create_destination_var (scalar_dest, vectype);
       new_phi = VEC_index (gimple, new_phis, 0);
-      tmp = build1 (reduc_code, vectype,  PHI_RESULT (new_phi));
+      if (gimple_code (new_phi) == GIMPLE_PHI)
+        vec_temp = PHI_RESULT (new_phi);
+      else
+        vec_temp = gimple_assign_lhs (new_phi);
+      tmp = build1 (reduc_code, vectype,  vec_temp);
       epilog_stmt = gimple_build_assign (vec_dest, tmp);
       new_temp = make_ssa_name (vec_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
@@ -3339,7 +3555,6 @@ vect_create_epilog_for_reduction (VEC (t
       int bit_offset;
       int element_bitsize = tree_low_cst (bitsize, 1);
       int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-      tree vec_temp;
 
       if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
         shift_code = VEC_RSHIFT_EXPR;
@@ -3357,7 +3572,7 @@ vect_create_epilog_for_reduction (VEC (t
       else
         {
           optab optab = optab_for_tree_code (code, vectype, optab_default);
-          if (optab_handler (optab, mode) == CODE_FOR_nothing)
+          if (!optab || optab_handler (optab, mode) == CODE_FOR_nothing)
             have_whole_vector_shift = false;
         }
 
@@ -3375,7 +3590,10 @@ vect_create_epilog_for_reduction (VEC (t
 
           vec_dest = vect_create_destination_var (scalar_dest, vectype);
           new_phi = VEC_index (gimple, new_phis, 0);
-          new_temp = PHI_RESULT (new_phi);
+          if (gimple_code (new_phi) == GIMPLE_PHI)
+            new_temp = PHI_RESULT (new_phi);
+          else
+            new_temp = gimple_assign_lhs (new_phi);
           for (bit_offset = vec_size_in_bits/2;
                bit_offset >= element_bitsize;
                bit_offset /= 2)
@@ -3417,7 +3635,10 @@ vect_create_epilog_for_reduction (VEC (t
           vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
           for (i = 0; VEC_iterate (gimple, new_phis, i, new_phi); i++)
             {
-              vec_temp = PHI_RESULT (new_phi);
+              if (gimple_code (new_phi) == GIMPLE_PHI)
+                vec_temp = PHI_RESULT (new_phi);
+              else
+                vec_temp = gimple_assign_lhs (new_phi);
               rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
                             bitsize_zero_node);
               epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
@@ -3487,6 +3708,7 @@ vect_create_epilog_for_reduction (VEC (t
             /* Not SLP - we have one scalar to keep in SCALAR_RESULTS.  */
             VEC_safe_push (tree, heap, scalar_results, new_temp);
 
+          STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
           extract_scalar_result = false;
         }
     }
@@ -3514,6 +3736,7 @@ vect_create_epilog_for_reduction (VEC (t
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
       VEC_safe_push (tree, heap, scalar_results, new_temp);
+      STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
     }
   
 vect_finalize_reduction:
@@ -3529,8 +3752,13 @@ vect_finalize_reduction:
       if (nested_in_vect_loop)
 	{
           new_phi = VEC_index (gimple, new_phis, 0);
+          if (gimple_code (new_phi) == GIMPLE_PHI)
+            vec_temp = PHI_RESULT (new_phi);
+          else
+            vec_temp = gimple_assign_lhs (new_phi);
+
 	  gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
-	  expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
+	  expr = build2 (code, vectype, vec_temp, adjustment_def);
 	  new_dest = vect_create_destination_var (scalar_dest, vectype);
 	}
       else
@@ -3563,6 +3791,7 @@ vect_finalize_reduction:
         VEC_replace (tree, scalar_results, 0, new_temp);
 
       VEC_replace (gimple, new_phis, 0, epilog_stmt);
+      STMT_VINFO_REDUC_SCALAR_RES_STMT (stmt_info) = epilog_stmt;
     }
 
   /* 2.6  Handle the loop-exit phis. Replace the uses of scalar loop-exit
@@ -3632,8 +3861,10 @@ vect_finalize_reduction:
           VEC_safe_push (gimple, heap, phis, USE_STMT (use_p));
 
       /* We expect to have found an exit_phi because of loop-closed-ssa
-         form.  */
-      gcc_assert (!VEC_empty (gimple, phis));
+         form, unless it's a min/max statement of the min/max location
+         pattern, which is inserted by the pattern recognition phase.  */
+      gcc_assert (!VEC_empty (gimple, phis)
+                  || STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_stmt);
 
       for (i = 0; VEC_iterate (gimple, phis, i, exit_phi); i++)
         {
@@ -3714,7 +3945,12 @@ vect_finalize_reduction:
                   add_phi_arg (vect_phi, vect_phi_init,
                                loop_preheader_edge (outer_loop),
                                UNKNOWN_LOCATION);
-                  add_phi_arg (vect_phi, PHI_RESULT (epilog_stmt),
+                  if (gimple_code (epilog_stmt) == GIMPLE_PHI)
+                    vec_temp = PHI_RESULT (epilog_stmt);
+                  else
+                    vec_temp = gimple_assign_lhs (epilog_stmt);
+
+                  add_phi_arg (vect_phi, vec_temp,
                                loop_latch_edge (outer_loop), UNKNOWN_LOCATION);
                   if (vect_print_dump_info (REPORT_DETAILS))
                     {
@@ -3837,11 +4073,11 @@ vectorizable_reduction (gimple stmt, gim
   basic_block def_bb;
   struct loop * def_stmt_loop, *outer_loop = NULL;
   tree def_arg;
-  gimple def_arg_stmt;
+  gimple def_arg_stmt, related;
   VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL, *vect_defs = NULL;
   VEC (gimple, heap) *phis = NULL;
-  int vec_num;
-  tree def0, def1;
+  int vec_num, cond_reduc_index = 0;
+  tree def0, def1, cond_reduc_def = NULL_TREE;
 
   if (nested_in_vect_loop_p (loop, stmt))
     {
@@ -3851,8 +4087,10 @@ vectorizable_reduction (gimple stmt, gim
     }
 
   /* 1. Is vectorizable reduction?  */
-  /* Not supportable if the reduction variable is used in the loop.  */
-  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
+  /* Not supportable if the reduction variable is used in the loop,
+     unless it's a pattern.  */
+  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   /* Reductions that are not used even in an enclosing outer-loop,
@@ -3874,14 +4112,17 @@ vectorizable_reduction (gimple stmt, gim
      the original sequence that constitutes the pattern.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (orig_stmt)
+  if (orig_stmt 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     {
       orig_stmt_info = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
       gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
     }
-
+  else
+    orig_stmt = NULL;
+ 
   /* 3. Check the operands of the operation. The first operands are defined
         inside the loop body. The last operand is the reduction variable,
         which is defined by the loop-header-phi.  */
@@ -3994,12 +4235,13 @@ vectorizable_reduction (gimple stmt, gim
 
   if (code == COND_EXPR)
     {
-      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0))
+      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, 
+                                   cond_reduc_def, cond_reduc_index)) 
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "unsupported condition in reduction");
 
-            return false;
+          return false;
         }
     }
   else
@@ -4132,7 +4374,12 @@ vectorizable_reduction (gimple stmt, gim
     }
   else
     {
-      if (!nested_cycle || double_reduc)
+      /* There is no need for a reduction epilogue in case of a nested cycle,
+         unless it is a double reduction.  For a reduction pattern, we assume
+         that we know how to create an epilogue even if there is no reduction
+         code for it.  */
+      if ((!nested_cycle || double_reduc) 
+           && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "no reduc code for scalar code.");
@@ -4152,8 +4399,9 @@ vectorizable_reduction (gimple stmt, gim
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
-      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies))
+      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies)) 
         return false;
+
       return true;
     }
 
@@ -4204,6 +4452,32 @@ vectorizable_reduction (gimple stmt, gim
   else
     epilog_copies = ncopies;
 
+  /* Prepare vector operands for min/max location.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      tree cond_op;
+      gimple cond_def_stmt;
+
+      related = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt));
+      cond_op = TREE_OPERAND (ops[0], 0);
+      cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+      if (gimple_code (cond_def_stmt) == GIMPLE_PHI)
+        {
+          cond_reduc_index = 1;
+          cond_reduc_def = gimple_assign_rhs1 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+      else
+        {
+          cond_op = TREE_OPERAND (ops[0], 1);
+          cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+          gcc_assert (gimple_code (cond_def_stmt) == GIMPLE_PHI);
+          cond_reduc_index = 2;
+          cond_reduc_def = gimple_assign_rhs2 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+    }
+
   prev_stmt_info = NULL;
   prev_phi_info = NULL;
   if (slp_node)
@@ -4247,7 +4521,8 @@ vectorizable_reduction (gimple stmt, gim
           gcc_assert (!slp_node);
           vectorizable_condition (stmt, gsi, vec_stmt, 
                                   PHI_RESULT (VEC_index (gimple, phis, 0)), 
-                                  reduc_index);
+                                  reduc_index, cond_reduc_def, 
+                                  cond_reduc_index);
           /* Multiple types are not supported for condition.  */
           break;
         }
@@ -4483,6 +4758,9 @@ vectorizable_live_operation (gimple stmt
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+    return true;
+
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
     return false;
 
Index: tree-vect-patterns.c
===================================================================
--- tree-vect-patterns.c	(revision 161862)
+++ tree-vect-patterns.c	(working copy)
@@ -53,6 +53,10 @@ static vect_recog_func_ptr vect_vect_rec
 	vect_recog_widen_sum_pattern,
 	vect_recog_dot_prod_pattern,
 	vect_recog_pow_pattern};
+static bool vect_recog_min_max_loc_pattern (unsigned int, va_list);
+static vect_recog_compound_func_ptr 
+   vect_recog_compound_func_ptrs[NUM_COMPOUND_PATTERNS] = {
+        vect_recog_min_max_loc_pattern};
 
 
 /* Function widened_name_p
@@ -846,3 +850,286 @@ vect_pattern_recog (loop_vec_info loop_v
         }
     }
 }
+
+
+/* Detect min/max location pattern. 
+   Given two reduction condition statements and their phi nodes, we check
+   if one of the statements calculates minimum or maximum, and the other one
+   records its location. If the pattern is detected, we replace the min/max 
+   condition statement with MIN_EXPR or MAX_EXPR, and mark the old statement 
+   as pattern statement.
+
+   The pattern we are looking for:
+
+   s1: min = [cond_expr] a < min ? a : min
+   s2: index = [cond_expr] a < min ? new_index : index
+
+   We add MIN_EXPR statement before the index calculation statement:
+
+   s1:  min = [cond_expr] a < min ? a : min
+   s1': min = [min_expr] <a, min>
+   s2:  index = [cond_expr] a < min ? new_index : index
+
+   s1 is marked as pattern statement
+   s1' points to s1 via related_stmt field
+   s1 points to s1' via related_stmt field
+   s2 points to s1' via related_stmt field.  
+   s1' and s2 are marked as compound pattern min/max and min/max location
+   statements.  */
+
+static bool
+vect_recog_min_max_loc_pattern (unsigned int nargs, va_list args)
+{
+  gimple first_phi, first_stmt, second_phi, second_stmt, loop_op_def_stmt;
+  stmt_vec_info stmt_vinfo, new_stmt_info, minmax_stmt_info, pos_stmt_info;
+  loop_vec_info loop_info;
+  struct loop *loop;
+  enum tree_code code, first_code, second_code;
+  gimple first_cond_def_stmt = NULL, second_cond_def_stmt = NULL;
+  tree first_cond_op0, first_cond_op1, second_cond_op0, second_cond_op1;
+  tree first_stmt_oprnd0, first_stmt_oprnd1, second_stmt_oprnd0;
+  tree second_stmt_oprnd1, first_cond, second_cond;
+  int phi_def_index;
+  tree first_loop_op, second_loop_op, pos_stmt_loop_op, def, result;
+  gimple pos_stmt, min_max_stmt, new_stmt, def_stmt;
+  gimple_stmt_iterator gsi;
+
+  if (nargs < 4)
+    return false;
+
+  first_phi = va_arg (args, gimple);
+  first_stmt = va_arg (args, gimple);
+  second_phi = va_arg (args, gimple);
+  second_stmt = va_arg (args, gimple);
+
+  stmt_vinfo = vinfo_for_stmt (first_stmt);
+  loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+  loop = LOOP_VINFO_LOOP (loop_info);
+
+  /* Check that the condition is the same and is GT or LT.  */
+  first_cond = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 0);
+  if (TREE_CODE (first_cond) == SSA_NAME)
+    {
+      first_cond_def_stmt = SSA_NAME_DEF_STMT (first_cond);
+      first_code = gimple_assign_rhs_code (first_cond_def_stmt);
+      first_cond_op0 = gimple_assign_rhs1 (first_cond_def_stmt);
+      first_cond_op1 = gimple_assign_rhs2 (first_cond_def_stmt);
+    }
+  else
+    {
+      first_code = TREE_CODE (first_cond);
+      first_cond_op0 = TREE_OPERAND (first_cond, 0);
+      first_cond_op1 = TREE_OPERAND (first_cond, 1);
+    }
+
+  if (first_code != GT_EXPR && first_code != LT_EXPR
+      && first_code != GE_EXPR && first_code != LE_EXPR)
+    return false;
+
+  second_cond = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 0);
+  if (TREE_CODE (second_cond) == SSA_NAME)
+    {
+      second_cond_def_stmt = SSA_NAME_DEF_STMT (second_cond);
+      second_code = gimple_assign_rhs_code (second_cond_def_stmt);
+      second_cond_op0 = gimple_assign_rhs1 (second_cond_def_stmt);
+      second_cond_op1 = gimple_assign_rhs2 (second_cond_def_stmt);
+    }
+  else
+    {
+      second_code = TREE_CODE (second_cond);
+      second_cond_op0 = TREE_OPERAND (second_cond, 0);
+      second_cond_op1 = TREE_OPERAND (second_cond, 1);
+    }
+
+  if (first_code != second_code)
+    return false;
+
+  if (first_cond_def_stmt
+      && (!second_cond_def_stmt
+          || first_cond_def_stmt != second_cond_def_stmt
+          || !operand_equal_p (first_cond_op0, second_cond_op0, 0)
+          || !operand_equal_p (first_cond_op1, second_cond_op1, 0)))
+   return false;
+
+  /* Both statements have the same condition.  */
+
+  first_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 1);
+  first_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 2);
+
+  second_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 1);
+  second_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 2);
+
+  if (TREE_CODE (first_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (first_stmt_oprnd1) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd1) != SSA_NAME)
+    return false;
+
+  if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd0, 0)
+      && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd0, 0))
+    {
+      phi_def_index = 0;
+      first_loop_op = first_stmt_oprnd1;
+      second_loop_op = second_stmt_oprnd1;
+    }
+  else
+    {
+      if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd1, 0)
+          && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd1, 0))
+        {
+          phi_def_index = 1;
+          first_loop_op = first_stmt_oprnd0;
+          second_loop_op = second_stmt_oprnd0;
+        }
+      else
+        return false;
+    }
+
+  /* Now we know which operand is defined by the phi node.  Analyze the
+     second one.  */
+
+  /* The min/max stmt must be x < y ? x : y.  */
+  if (operand_equal_p (first_cond_op0, first_stmt_oprnd0, 0)
+      && operand_equal_p (first_cond_op1, first_stmt_oprnd1, 0))
+    {
+      pos_stmt = second_stmt;
+      min_max_stmt = first_stmt;
+      pos_stmt_loop_op = second_loop_op;
+    }
+  else
+    {
+      if (operand_equal_p (second_cond_op0, second_stmt_oprnd0, 0)
+          && operand_equal_p (second_cond_op1, second_stmt_oprnd1, 0))
+        {
+          pos_stmt = first_stmt;
+          min_max_stmt = second_stmt;
+          pos_stmt_loop_op = first_loop_op;
+        }
+      else
+        return false;
+    }
+
+  /* Analyze the position stmt. We expect it to be either induction or
+     induction plus constant.  */
+  loop_op_def_stmt = SSA_NAME_DEF_STMT (pos_stmt_loop_op);
+
+  if (!flow_bb_inside_loop_p (loop, gimple_bb (loop_op_def_stmt)))
+    return false;
+
+  if (gimple_code (loop_op_def_stmt) == GIMPLE_PHI)
+    {
+      if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (loop_op_def_stmt))
+          != vect_induction_def)
+        return false;
+    }
+  else
+    {
+      if (!is_gimple_assign (loop_op_def_stmt))
+        return false;
+
+      if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+           == GIMPLE_UNARY_RHS)
+        def = gimple_assign_rhs1 (loop_op_def_stmt);
+      else
+        {
+          tree op1, op2;
+
+          if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+               != GIMPLE_BINARY_RHS
+              || gimple_assign_rhs_code (loop_op_def_stmt) != PLUS_EXPR)
+            return false;
+
+          op1 = gimple_assign_rhs1 (loop_op_def_stmt);
+          op2 = gimple_assign_rhs2 (loop_op_def_stmt);
+
+          if (TREE_CONSTANT (op1))
+            def = op2;
+          else
+            {
+              if (TREE_CONSTANT (op2))
+                def = op1;
+              else
+                return false;
+            }
+        }
+
+      if (TREE_CODE (def) != SSA_NAME)
+        return false;
+
+      def_stmt = SSA_NAME_DEF_STMT (def);
+      if (!flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
+          || gimple_code (def_stmt) != GIMPLE_PHI
+          || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def_stmt))
+              != vect_induction_def)
+         return false;
+    }
+
+  /* Pattern detected. Create a stmt to be used to replace the pattern.  */
+  if (first_code == GT_EXPR || first_code == GE_EXPR)
+    code = phi_def_index ? MAX_EXPR : MIN_EXPR;
+  else
+    code = phi_def_index ? MIN_EXPR : MAX_EXPR;
+
+  result = gimple_assign_lhs (min_max_stmt);
+  new_stmt = gimple_build_assign_with_ops (code, result,
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 1),
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 2));
+  gsi = gsi_for_stmt (pos_stmt);
+  gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+  SSA_NAME_DEF_STMT (result) = new_stmt;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Detected min/max location pattern:\nmin/max stmt ");
+      print_gimple_stmt (vect_dump, min_max_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nlocation stmt ");
+      print_gimple_stmt (vect_dump, pos_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nCreated stmt: ");
+      print_gimple_stmt (vect_dump, new_stmt, 0, TDF_SLIM);
+    }
+
+  /* Mark the stmts that are involved in the pattern. */
+  set_vinfo_for_stmt (new_stmt,
+                      new_stmt_vec_info (new_stmt, loop_info, NULL));
+  new_stmt_info = vinfo_for_stmt (new_stmt);
+
+  pos_stmt_info = vinfo_for_stmt (pos_stmt);
+  minmax_stmt_info = vinfo_for_stmt (min_max_stmt);
+
+  STMT_VINFO_DEF_TYPE (new_stmt_info) = STMT_VINFO_DEF_TYPE (minmax_stmt_info);
+  STMT_VINFO_VECTYPE (new_stmt_info) = STMT_VINFO_VECTYPE (minmax_stmt_info);
+
+  STMT_VINFO_IN_PATTERN_P (minmax_stmt_info) = true;
+  STMT_VINFO_COMPOUND_PATTERN (new_stmt_info) = minmax_stmt;
+  STMT_VINFO_COMPOUND_PATTERN (pos_stmt_info) = minmax_loc_stmt;
+  STMT_VINFO_RELATED_STMT (new_stmt_info) = min_max_stmt;
+  STMT_VINFO_RELATED_STMT (minmax_stmt_info) = new_stmt;
+  STMT_VINFO_RELATED_STMT (pos_stmt_info) = new_stmt;
+
+  return true;
+}
+
+/* Detect patterns consisting of two or more statements to be vectorized.
+   Currently the only supported pattern is min/max location.  */
+
+void
+vect_compound_pattern_recog (unsigned int nargs, ...)
+{
+  unsigned int j;
+  va_list args;
+  bool detected = false;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_compound_pattern_recog ===");
+
+  /* Scan over all generic vect_recog_compound_xxx_pattern functions.  */
+  for (j = 0; j < NUM_COMPOUND_PATTERNS; j++)
+    {
+      va_start (args, nargs);
+      detected = (* vect_recog_compound_func_ptrs[j]) (nargs, args);
+      va_end (args);
+      if (detected)
+        break;
+    }
+}
+
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 161862)
+++ tree-vect-stmts.c	(working copy)
@@ -271,8 +271,10 @@ process_use (gimple stmt, tree use, loop
   /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
      DEF_STMT must have already been processed, because this should be the
      only way that STMT, which is a reduction-phi, was put in the worklist,
-     as there should be no other uses for DEF_STMT in the loop.  So we just
-     check that everything is as expected, and we are done.  */
+     as there should be no other uses for DEF_STMT in the loop, unless it is
+     the min/max location pattern.  So we just check that everything is as
+     expected, and mark the min/max stmt of the location pattern as used
+     by reduction (it is used by the reduction of the location).  */
   dstmt_vinfo = vinfo_for_stmt (def_stmt);
   bb = gimple_bb (stmt);
   if (gimple_code (stmt) == GIMPLE_PHI
@@ -283,11 +285,22 @@ process_use (gimple stmt, tree use, loop
     {
       if (vect_print_dump_info (REPORT_DETAILS))
 	fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
+
+      /* Compound reduction pattern: is used by reduction.  */
+      if (STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo))
+        {
+          relevant = vect_used_by_reduction;
+          vect_mark_relevant (worklist, def_stmt, relevant, live_p);
+          return true;
+        }
+
       if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
 	dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
+
       gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
-      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo)
-		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope);
+      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo) 
+		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope
+		  || STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo));
       return true;
     }
 
@@ -481,7 +494,8 @@ vect_mark_stmts_to_be_vectorized (loop_v
 	          break;
 
 	        case vect_used_by_reduction:
-	          if (gimple_code (stmt) == GIMPLE_PHI)
+	          if (gimple_code (stmt) == GIMPLE_PHI
+                      || STMT_VINFO_COMPOUND_PATTERN (stmt_vinfo))
                     break;
   	          /* fall through */
 
@@ -3975,6 +3989,106 @@ vect_is_simple_cond (tree cond, loop_vec
   return true;
 }
 
+/* Create a sequence of statements that vectorizes min/max location pattern
+   either inside the loop body, or in reduction epilogue. The technique used
+   here was taken from "Multimedia vectorization of floating-point MIN/MAX 
+   reductions" by A.J.C.Bik, X.Tian and M.B.Girkar, 
+   http://portal.acm.org/citation.cfm?id=1145765. 
+   Vectorized loop (maxloc, first index):
+     vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
+     vck[0:vl-1:1] = | k |..| k |;  - vector of positions
+     ind[0:vl-1:1] = |vl-1|..| 0 |; 
+     inc[0:vl-1:1] = | vl |..| vl |; 
+     for (i = 0; i < N; i += vl) { 
+       msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]); 
+       vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) | 
+                       (vck[0:vl-1:1] & !msk[0:vl-1:1]); 
+       vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]); 
+       ind[0:vl-1:1] += inc[0:vl-1:1]; 
+     } 
+     x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
+     msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|); 
+     vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) | 
+                     (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]); 
+     k = HMIN(vck[0:vl-1:1]);       - first position extraction
+
+   In this function we generate:
+    MASK = CODE (COMPARE_OPRND1, COMPARE_OPRND2)
+    VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK)  
+
+   When called from vectorizable_condition(), the loop body code is generated.
+   When called from vect_create_epilog_for_reduction(), the function generates
+   the code for scalar extraction in the reduction epilogue. 
+
+   The return value is the last statement in the above sequence.  */
+
+gimple
+vectorize_minmax_location_pattern (gimple stmt, gimple_stmt_iterator *gsi,
+                                   enum tree_code code,
+                                   tree compare_oprnd1, tree compare_oprnd2,
+                                   tree vec_oprnd1, tree vec_oprnd2)
+{
+  tree mask_type, builtin_decl, vec_dest, new_temp, vect_mask;
+  tree and_res1, and_res2, and_dest1, and_dest2, tmp, not_mask, mask, tmp_mask;
+  gimple mask_stmt, new_stmt;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree scalar_dest = gimple_assign_lhs (stmt);
+  gimple related = STMT_VINFO_RELATED_STMT (stmt_info);
+  tree related_lhs = gimple_assign_lhs (related);
+  tree comparison_type = get_vectype_for_scalar_type (TREE_TYPE (related_lhs));
+
+  /* Create mask: MASK = CODE (COMPARE_OPRND1, COMPARE_OPRND2).  */ 
+  builtin_decl = targetm.vectorize.builtin_vect_compare (code,
+                                                  comparison_type, &mask_type);
+  vect_mask = vect_create_destination_var (related_lhs, mask_type);
+  mask_stmt = gimple_build_call (builtin_decl, 2, compare_oprnd1, 
+                                 compare_oprnd2);
+  tmp_mask = make_ssa_name (vect_mask, mask_stmt);
+  gimple_call_set_lhs (mask_stmt, tmp_mask);
+  vect_finish_stmt_generation (stmt, mask_stmt, gsi);
+
+  /* Convert the mask to VECTYPE.  */
+  vect_mask = vect_create_destination_var (scalar_dest, vectype);
+  mask_stmt = gimple_build_assign (vect_mask, fold_build1 (VIEW_CONVERT_EXPR, 
+                                                           vectype, tmp_mask));
+  mask = make_ssa_name (vect_mask, mask_stmt);
+  gimple_assign_set_lhs (mask_stmt, mask);
+  vect_finish_stmt_generation (stmt, mask_stmt, gsi);
+
+  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */ 
+  and_dest1 = vect_create_destination_var (scalar_dest, vectype);
+  and_dest2 = vect_create_destination_var (scalar_dest, vectype);
+  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+
+  tmp = build2 (BIT_AND_EXPR, vectype, vec_oprnd1, mask);
+  new_stmt = gimple_build_assign (and_dest1, tmp);
+  and_res1 = make_ssa_name (and_dest1, new_stmt);
+  gimple_assign_set_lhs (new_stmt, and_res1);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  tmp = build1 (BIT_NOT_EXPR, vectype, mask);
+  new_stmt = gimple_build_assign (vec_dest, tmp);
+  not_mask = make_ssa_name (vec_dest, new_stmt);
+  gimple_assign_set_lhs (new_stmt, not_mask);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  tmp = build2 (BIT_AND_EXPR, vectype, vec_oprnd2, not_mask);
+  new_stmt = gimple_build_assign (and_dest2, tmp);
+  and_res2 = make_ssa_name (and_dest2, new_stmt);
+  gimple_assign_set_lhs (new_stmt, and_res2);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+  tmp = build2 (BIT_IOR_EXPR, vectype, and_res1, and_res2);
+  new_stmt = gimple_build_assign (vec_dest, tmp);
+  new_temp = make_ssa_name (vec_dest, new_stmt);
+  gimple_assign_set_lhs (new_stmt, new_temp);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+  return new_stmt;
+}
+
 /* vectorizable_condition.
 
    Check if STMT is conditional modify expression that can be vectorized.
@@ -3986,11 +4100,16 @@ vect_is_simple_cond (tree cond, loop_vec
    to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in
   else clause if it is 2).
 
+   In min/max location pattern, reduction defs are used in both condition part
+   and then/else clause. In that case COND_REDUC_DEF contains such vector def,
+   and COND_REDUC_INDEX specifies its place in the condition.
+
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
 vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
-			gimple *vec_stmt, tree reduc_def, int reduc_index)
+			gimple *vec_stmt, tree reduc_def, int reduc_index,
+                        tree cond_reduc_def, int cond_reduc_index) 
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
@@ -4008,6 +4127,7 @@ vectorizable_condition (gimple stmt, gim
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
   enum tree_code code;
+  tree comparison_type, mask_type;
 
   /* FORNOW: unsupported in basic block SLP.  */
   gcc_assert (loop_vinfo);
@@ -4016,20 +4136,23 @@ vectorizable_condition (gimple stmt, gim
   if (ncopies > 1)
     return false; /* FORNOW */
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+  if (!STMT_VINFO_RELEVANT_P (stmt_info)
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+      && !((STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+            || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
            && reduc_def))
-    return false;
+    return false;  
 
   /* FORNOW: SLP not supported.  */
   if (STMT_SLP_TYPE (stmt_info))
     return false;
 
   /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
+  if (STMT_VINFO_LIVE_P (stmt_info) 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info)) 
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "value used after loop.");
@@ -4057,7 +4180,10 @@ vectorizable_condition (gimple stmt, gim
   /* We do not handle two different vector types for the condition
      and the values.  */
   if (!types_compatible_p (TREE_TYPE (TREE_OPERAND (cond_expr, 0)),
-			   TREE_TYPE (vectype)))
+			   TREE_TYPE (vectype))
+      && !(STMT_VINFO_COMPOUND_PATTERN (stmt_info)
+           && TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (cond_expr, 0)))
+               == TYPE_SIZE_UNIT (TREE_TYPE (vectype))))
     return false;
 
   if (TREE_CODE (then_clause) == SSA_NAME)
@@ -4087,42 +4213,77 @@ vectorizable_condition (gimple stmt, gim
 
   vec_mode = TYPE_MODE (vectype);
 
-  if (!vec_stmt)
+  comparison_type = 
+         get_vectype_for_scalar_type (TREE_TYPE (TREE_OPERAND (cond_expr, 0)));
+
+  /* Check that min/max location pattern is supported, i.e., the relevant 
+     vector comparisons exist (including EQ_EXPR for reduction epilogue).  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt
+      && (!targetm.vectorize.builtin_vect_compare
+          || !targetm.vectorize.builtin_vect_compare (TREE_CODE (cond_expr),
+                                                   comparison_type, &mask_type)
+          || !targetm.vectorize.builtin_vect_compare (EQ_EXPR, comparison_type,
+                                                      &mask_type)))
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "unsupported comparison");
+
+      return false;
+    }
+
+  if (!vec_stmt) 
     {
       STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
       return expand_vec_cond_expr_p (TREE_TYPE (op), vec_mode);
     }
 
-  /* Transform */
+  /* Transform.  */
 
   /* Handle def.  */
   scalar_dest = gimple_assign_lhs (stmt);
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
   /* Handle cond expr.  */
-  vec_cond_lhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
-  vec_cond_rhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+  if (cond_reduc_index == 1)
+    vec_cond_lhs = cond_reduc_def;
+  else
+    vec_cond_lhs = 
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
+
+  if (cond_reduc_index == 2)
+    vec_cond_rhs = cond_reduc_def;
+  else
+    vec_cond_rhs = 
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+
   if (reduc_index == 1)
     vec_then_clause = reduc_def;
   else
     vec_then_clause = vect_get_vec_def_for_operand (then_clause, stmt, NULL);
+
   if (reduc_index == 2)
     vec_else_clause = reduc_def;
   else
     vec_else_clause = vect_get_vec_def_for_operand (else_clause, stmt, NULL);
 
   /* Arguments are ready. Create the new vector stmt.  */
-  vec_compare = build2 (TREE_CODE (cond_expr), vectype,
-			vec_cond_lhs, vec_cond_rhs);
-  vec_cond_expr = build3 (VEC_COND_EXPR, vectype,
-			  vec_compare, vec_then_clause, vec_else_clause);
-
-  *vec_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
-  new_temp = make_ssa_name (vec_dest, *vec_stmt);
-  gimple_assign_set_lhs (*vec_stmt, new_temp);
-  vect_finish_stmt_generation (stmt, *vec_stmt, gsi);
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      *vec_stmt = vectorize_minmax_location_pattern (stmt, gsi,  
+                             TREE_CODE (cond_expr), vec_cond_lhs, vec_cond_rhs,
+                             vec_then_clause, vec_else_clause);
+    }
+  else
+    {
+      vec_compare = build2 (TREE_CODE (cond_expr), vectype, vec_cond_lhs, 
+                            vec_cond_rhs);
+      vec_cond_expr = build3 (VEC_COND_EXPR, vectype, vec_compare, 
+    			      vec_then_clause, vec_else_clause);
+      *vec_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
+      new_temp = make_ssa_name (vec_dest, *vec_stmt);
+      gimple_assign_set_lhs (*vec_stmt, new_temp);
+      vect_finish_stmt_generation (stmt, *vec_stmt, gsi);
+    }
 
   return true;
 }
@@ -4179,7 +4340,8 @@ vect_analyze_stmt (gimple stmt, bool *ne
       case vect_nested_cycle:
          gcc_assert (!bb_vinfo && (relevance == vect_used_in_outer
                      || relevance == vect_used_in_outer_by_reduction
-                     || relevance == vect_unused_in_scope));
+                     || relevance == vect_unused_in_scope
+                     || relevance == vect_used_by_reduction));
          break;
 
       case vect_induction_def:
@@ -4241,7 +4403,7 @@ vect_analyze_stmt (gimple stmt, bool *ne
             || vectorizable_call (stmt, NULL, NULL)
             || vectorizable_store (stmt, NULL, NULL, NULL)
             || vectorizable_reduction (stmt, NULL, NULL, NULL)
-            || vectorizable_condition (stmt, NULL, NULL, NULL, 0));
+            || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL, 0)); 
     else
       {
         if (bb_vinfo)
@@ -4382,7 +4544,7 @@ vect_transform_stmt (gimple stmt, gimple
 
     case condition_vec_info_type:
       gcc_assert (!slp_node);
-      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0);
+      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, NULL, 0); 
       gcc_assert (done);
       break;
 
@@ -4520,6 +4682,7 @@ new_stmt_vec_info (gimple stmt, loop_vec
   STMT_VINFO_VEC_STMT (res) = NULL;
   STMT_VINFO_VECTORIZABLE (res) = true;
   STMT_VINFO_IN_PATTERN_P (res) = false;
+  STMT_VINFO_COMPOUND_PATTERN (res) = not_in_pattern;
   STMT_VINFO_RELATED_STMT (res) = NULL;
   STMT_VINFO_DATA_REF (res) = NULL;
 
@@ -4538,6 +4701,7 @@ new_stmt_vec_info (gimple stmt, loop_vec
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
   STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
   STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_REDUC_SCALAR_RES_STMT (res) = NULL;
   STMT_SLP_TYPE (res) = loop_vect;
   DR_GROUP_FIRST_DR (res) = NULL;
   DR_GROUP_NEXT_DR (res) = NULL;
Index: config/rs6000/rs6000-builtin.def
===================================================================
--- config/rs6000/rs6000-builtin.def	(revision 161862)
+++ config/rs6000/rs6000-builtin.def	(working copy)
@@ -73,6 +73,8 @@ RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTSH,
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTUW,		RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTSW,		RS6000_BTC_CONST)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPGTFP,		RS6000_BTC_FP_PURE)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPLTFP,		RS6000_BTC_FP_PURE)
+RS6000_BUILTIN(ALTIVEC_BUILTIN_VCMPLEFP,		RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VEXPTEFP,		RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VLOGEFP,			RS6000_BTC_FP_PURE)
 RS6000_BUILTIN(ALTIVEC_BUILTIN_VMADDFP,			RS6000_BTC_FP_PURE)
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 161862)
+++ config/rs6000/rs6000.c	(working copy)
@@ -1077,6 +1077,7 @@ static bool rs6000_builtin_support_vecto
 							int, bool);
 static int rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt,
                                               tree, int);
+static tree rs6000_builtin_vect_compare (unsigned int, tree, tree *);
 
 static void def_builtin (int, const char *, tree, int);
 static bool rs6000_vector_alignment_reachable (const_tree, bool);
@@ -1472,6 +1473,8 @@ static const struct attribute_spec rs600
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   rs6000_builtin_vectorization_cost
+#undef TARGET_VECTORIZE_BUILTIN_VECT_CMP
+#define TARGET_VECTORIZE_BUILTIN_VECT_CMP rs6000_builtin_vect_compare
 
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
@@ -3515,6 +3518,45 @@ rs6000_builtin_vectorization_cost (enum 
     }
 }
 
+/* Implement targetm.vectorize.builtin_vect_compare.  */
+tree
+rs6000_builtin_vect_compare (unsigned int tcode, tree type, tree *return_type)
+{
+  enum tree_code code = (enum tree_code) tcode;
+
+  if (!TARGET_ALTIVEC)
+    return NULL_TREE;
+
+  switch (TYPE_MODE (type))
+    {
+    case V4SFmode:
+      *return_type = V4SF_type_node;
+      switch (code)
+        {
+          case GT_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPGTFP];
+            
+          case LT_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPLTFP];
+
+          case GE_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPGEFP];
+
+          case LE_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPLEFP];
+
+          case EQ_EXPR:
+            return rs6000_builtin_decls[ALTIVEC_BUILTIN_VCMPEQFP];
+
+          default:
+            return NULL_TREE;
+        }
+
+    default:
+      return NULL_TREE;
+    }
+}
+
 /* Handle generic options of the form -mfoo=yes/no.
    NAME is the option name.
    VALUE is the option value.
@@ -9337,6 +9379,8 @@ static struct builtin_description bdesc_
   { MASK_ALTIVEC, CODE_FOR_vector_gtuv4si, "__builtin_altivec_vcmpgtuw", ALTIVEC_BUILTIN_VCMPGTUW },
   { MASK_ALTIVEC, CODE_FOR_vector_gtv4si, "__builtin_altivec_vcmpgtsw", ALTIVEC_BUILTIN_VCMPGTSW },
   { MASK_ALTIVEC, CODE_FOR_vector_gtv4sf, "__builtin_altivec_vcmpgtfp", ALTIVEC_BUILTIN_VCMPGTFP },
+  { MASK_ALTIVEC, CODE_FOR_altivec_vcmpltfp, "__builtin_altivec_vcmpltfp", ALTIVEC_BUILTIN_VCMPLTFP },
+  { MASK_ALTIVEC, CODE_FOR_altivec_vcmplefp, "__builtin_altivec_vcmplefp", ALTIVEC_BUILTIN_VCMPLEFP },
   { MASK_ALTIVEC, CODE_FOR_altivec_vctsxs, "__builtin_altivec_vctsxs", ALTIVEC_BUILTIN_VCTSXS },
   { MASK_ALTIVEC, CODE_FOR_altivec_vctuxs, "__builtin_altivec_vctuxs", ALTIVEC_BUILTIN_VCTUXS },
   { MASK_ALTIVEC, CODE_FOR_umaxv16qi3, "__builtin_altivec_vmaxub", ALTIVEC_BUILTIN_VMAXUB },
Index: config/rs6000/altivec.md
===================================================================
--- config/rs6000/altivec.md	(revision 161862)
+++ config/rs6000/altivec.md	(working copy)
@@ -144,6 +144,8 @@
    (UNSPEC_VUPKHU_V4SF  326)
    (UNSPEC_VUPKLU_V4SF  327)
    (UNSPEC_VNMSUBFP	328)
+   (UNSPEC_VCMPLTFP     329)
+   (UNSPEC_VCMPLEFP     330)
 ])
 
 (define_constants
@@ -2802,3 +2804,22 @@
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+
+(define_insn "altivec_vcmpltfp"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+                  (match_operand:V4SF 2 "register_operand" "v")]
+                   UNSPEC_VCMPLTFP))]
+  "TARGET_ALTIVEC"
+  "vcmpgtfp %0,%2,%1"
+  [(set_attr "type" "veccmp")])
+
+(define_insn "altivec_vcmplefp"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")
+                  (match_operand:V4SF 2 "register_operand" "v")]
+                   UNSPEC_VCMPLEFP))]
+  "TARGET_ALTIVEC"
+  "vcmpgefp %0,%2,%1"
+  [(set_attr "type" "veccmp")])
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c	(revision 161862)
+++ tree-vect-slp.c	(working copy)
@@ -146,6 +146,18 @@ vect_get_and_check_slp_defs (loop_vec_in
 	  return false;
 	}
 
+      if (def_stmt && vinfo_for_stmt (def_stmt)
+          && STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt))) 
+        {
+          if (vect_print_dump_info (REPORT_SLP))
+            {
+              fprintf (vect_dump, "Build SLP failed: compound pattern ");
+              print_gimple_stmt (vect_dump, def_stmt, 0, TDF_SLIM);
+            }
+
+          return false;
+        }
+
       /* Check if DEF_STMT is a part of a pattern in LOOP and get the def stmt
          from the pattern. Check that all the stmts of the node are in the
          pattern.  */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-06  7:15 ` Ira Rosen
@ 2010-07-07 20:43   ` Richard Henderson
  2010-07-08  7:34     ` Ira Rosen
  2010-11-19 15:53   ` [RFC] [patch] Support vectorization of min/max location pattern H.J. Lu
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2010-07-07 20:43 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

> +@hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE 
> +Target builtin that implements vector element-wise comparison.
> +The value of @var{code} is one of the enumerators in @code{enum tree_code} and
> +specifies comparison operation, @var{type} specifies the type of input vectors.
> +The function returns the type of the comparison result in @var{result_type}.
> +@end deftypefn
...
> +/* Target builtin that implements vector element-wise comparison.  */
> +DEFHOOK
> +(builtin_vect_compare,
> + "",
> + tree, (unsigned code, tree type, tree *return_type), NULL)

(1) The documentation should go into the DEFHOOK.
(2) result_type != return_type
(3) Missing articles before "comparison operation", "input vectors".

> +  /* The min/max stmt must be x < y ? x : y.  */

Why hasn't the cond_expr been simplified to min_expr already?
Why does this need to be done inside the vectorizer?  This seems
like a major conceptual problem to me.

What has BUILTIN_VECT_COMPARE really got to do with MIN/MAX?
We support MIN/MAX_EXPR with vector arguments, don't we?  And we
have direct support for min/max in optabs.  So I really don't see
why you need to be fiddling with builtins at all.


r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-07 20:43   ` Richard Henderson
@ 2010-07-08  7:34     ` Ira Rosen
  2010-07-08  9:21       ` Richard Guenther
  2010-07-08 17:15       ` Richard Henderson
  0 siblings, 2 replies; 16+ messages in thread
From: Ira Rosen @ 2010-07-08  7:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches



Richard Henderson <rth@redhat.com> wrote on 07/07/2010 11:42:55 PM:

> > +@hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE
> > +Target builtin that implements vector element-wise comparison.
> > +The value of @var{code} is one of the enumerators in @code{enum tree_code} and
> > +specifies comparison operation, @var{type} specifies the type of input vectors.
> > +The function returns the type of the comparison result in @var{result_type}.
> > +@end deftypefn
> ...
> > +/* Target builtin that implements vector element-wise comparison.  */
> > +DEFHOOK
> > +(builtin_vect_compare,
> > + "",
> > + tree, (unsigned code, tree type, tree *return_type), NULL)
>
> (1) The documentation should go into the DEFHOOK.

I am sorry, but I don't understand what you mean.

> (2) result_type != return_type
> (3) Missing articles before "comparison operation", "input vectors".

I'll fix that.

>
> > +  /* The min/max stmt must be x < y ? x : y.  */
>
> Why hasn't the cond_expr been simplified to min_expr already?
> Why does this need to be done inside the vectorizer?  This seems
> like a major conceptual problem to me.

I guess it happens because of location computation.

>
> What has BUILTIN_VECT_COMPARE really got to do with MIN/MAX?
> We support MIN/MAX_EXPR with vector arguments, don't we?  And we
> have direct support for min/max in optabs.  So I really don't see
> why you need to be fiddling with builtins at all.

BUILTIN_VECT_COMPARE is used for location and not for MIN/MAX. There are
two statements to vectorize: min/max and location computation. And min/max
is vectorized as you described. Location has different types in condition
(float) and then/else (integer).

Thanks,
Ira


>
>
> r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-08  7:34     ` Ira Rosen
@ 2010-07-08  9:21       ` Richard Guenther
  2010-07-08 17:15       ` Richard Henderson
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Guenther @ 2010-07-08  9:21 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Richard Henderson, gcc-patches

On Thu, Jul 8, 2010 at 9:33 AM, Ira Rosen <IRAR@il.ibm.com> wrote:
>
>
> Richard Henderson <rth@redhat.com> wrote on 07/07/2010 11:42:55 PM:
>
>> > +@hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE
>> > +Target builtin that implements vector element-wise comparison.
>> > +The value of @var{code} is one of the enumerators in @code{enum tree_code} and
>> > +specifies comparison operation, @var{type} specifies the type of input vectors.
>> > +The function returns the type of the comparison result in @var{result_type}.
>> > +@end deftypefn
>> ...
>> > +/* Target builtin that implements vector element-wise comparison.  */
>> > +DEFHOOK
>> > +(builtin_vect_compare,
>> > + "",
>> > + tree, (unsigned code, tree type, tree *return_type), NULL)
>>
>> (1) The documentation should go into the DEFHOOK.
>
> I am sorry, but I don't understand what you mean.
>
>> (2) result_type != return_type
>> (3) Missing articles before "comparison operation", "input vectors".
>
> I'll fix that.
>
>>
>> > +  /* The min/max stmt must be x < y ? x : y.  */
>>
>> Why hasn't the cond_expr been simplified to min_expr already?
>> Why does this need to be done inside the vectorizer?  This seems
>> like a major conceptual problem to me.
>
> I guess it happens because of location computation.

I suppose it is if-conversion creating the above.  If so it should be
fixed to fold the COND_EXPRs it creates.

      /* Build new RHS using selected condition and arguments.  */
      rhs = build3 (COND_EXPR, TREE_TYPE (PHI_RESULT (phi)),
                    unshare_expr (cond), arg_0, arg_1);

Richard.

>> What has BUILTIN_VECT_COMPARE really got to do with MIN/MAX?
>> We support MIN/MAX_EXPR with vector arguments, don't we?  And we
>> have direct support for min/max in optabs.  So I really don't see
>> why you need to be fiddling with builtins at all.
>
> BUILTIN_VECT_COMPARE is used for location and not for MIN/MAX. There are
> two statements to vectorize: min/max and location computation. And min/max
> is vectorized as you described. Location has different types in condition
> (float) and then/else (integer).
>
> Thanks,
> Ira
>
>
>>
>>
>> r~
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-08  7:34     ` Ira Rosen
  2010-07-08  9:21       ` Richard Guenther
@ 2010-07-08 17:15       ` Richard Henderson
  2010-07-08 18:20         ` Ira Rosen
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2010-07-08 17:15 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

On 07/08/2010 12:33 AM, Ira Rosen wrote:
>> (1) The documentation should go into the DEFHOOK.
> 
> I am sorry, but I don't understand what you mean.

DEFHOOK
(builtin_vect_compare,
 "This hook returns a target builtin..."
 tree, (unsigned code, tree type, tree *return_type), NULL)

For new code, the only thing that goes in tm.texi.in is

@hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE

> BUILTIN_VECT_COMPARE is used for location and not for MIN/MAX. There
> are two statements to vectorize: min/max and location computation.
> And min/max is vectorized as you described. Location has different
> types in condition (float) and then/else (integer).

Ah, well it seems I didn't really know what I was reviewing.
What is "location computation" in this context?


r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-08 17:15       ` Richard Henderson
@ 2010-07-08 18:20         ` Ira Rosen
  2010-07-08 20:10           ` Richard Henderson
  0 siblings, 1 reply; 16+ messages in thread
From: Ira Rosen @ 2010-07-08 18:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches



Richard Henderson <rth@redhat.com> wrote on 08/07/2010 08:15:25 PM:

> On 07/08/2010 12:33 AM, Ira Rosen wrote:
> >> (1) The documentation should go into the DEFHOOK.
> >
> > I am sorry, but I don't understand what you mean.
>
> DEFHOOK
> (builtin_vect_compare,
>  "This hook returns a target builtin..."
>  tree, (unsigned code, tree type, tree *return_type), NULL)
>
> For new code, the only thing that goes in tm.texi.in is
>
> @hook TARGET_VECTORIZE_BUILTIN_VECT_COMPARE

Thanks, I'll fix this.

>
> > BUILTIN_VECT_COMPARE is used for location and not for MIN/MAX. There
> > are two statements to vectorize: min/max and location computation.
> > And min/max is vectorized as you described. Location has different
> > types in condition (float) and then/else (integer).
>
> Ah, well it seems I didn't really know what I was reviewing.
> What is "location computation" in this context?

It's the minloc pattern, i.e., a loop that finds the location of the minimum:

  float  arr[N];

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

Vectorizer's input code:

  # pos_22 = PHI <pos_1(4), 1(2)>
  # limit_24 = PHI <limit_4(4), 0(2)>
  ...
  pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;       // location
  limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;  // min


Please see my original mail for some more details
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00018.html.

Ira

>
>
> r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-08 18:20         ` Ira Rosen
@ 2010-07-08 20:10           ` Richard Henderson
  2010-08-09  7:55             ` [patch] Support vectorization of min/max location pattern - take 2 Ira Rosen
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2010-07-08 20:10 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

On 07/08/2010 11:19 AM, Ira Rosen wrote:
> It's minloc pattern, i.e., a loop that finds the location of the minimum:
> 
>   float  arr[N];
> 
>   for (i = 0; i < N; i++)
>     if (arr[i] < limit)
>       {
>         pos = i + 1;
>         limit = arr[i];
>       }
> 
> Vectorizer's input code:
> 
>   # pos_22 = PHI <pos_1(4), 1(2)>
>   # limit_24 = PHI <limit_4(4), 0(2)>
>   ...
>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;       //
> location
>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;  // min

Ok, I get it now.

So your thinking was that you needed the builtin to replace the
comparison portion of the VEC_COND_EXPR?  Or, looking again I see
that you don't actually use VEC_COND_EXPR, you use ...

> +  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */ 

... explicit masking.  I.e. you assume that the return value of
the builtin is a bit mask of the full width, and that there's no
better way to implement the VEC_COND.

I wonder if it wouldn't be better to extend the definition
of VEC_COND_EXPR so that the comparison values can be of a 
different type than the data operands (with the caveat that the
number of elements should be the same -- i.e. 4-wide compare must
match 4-wide data movement).

I can think of 2 portability problems with your current solution:

(1) SSE4.1 would prefer to use BLEND instructions, which perform
    that entire (X & M) | (Y & ~M) operation in one insn.

(2) The mips C.cond.PS instruction does *not* produce a bitmask
    like altivec or sse do.  Instead it sets multiple condition
    codes.  One then uses MOV[TF].PS to merge the elements based
    on the individual condition codes.  While there's no direct
    corresponding instruction that will operate on integers, I
    don't think it would be too difficult to use MOV[TF].G or
    BC1AND2[FT] instructions to emulate it.  In any case, this 
    is again a case where you don't want to expose any part of
    the VEC_COND at the gimple level.


r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [patch] Support vectorization of min/max location pattern - take 2
  2010-07-08 20:10           ` Richard Henderson
@ 2010-08-09  7:55             ` Ira Rosen
  2010-08-09 10:05               ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Ira Rosen @ 2010-08-09  7:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7907 bytes --]

Richard Henderson <rth@redhat.com> wrote on 08/07/2010 11:10:37 PM:

> On 07/08/2010 11:19 AM, Ira Rosen wrote:
> > It's minloc pattern, i.e., a loop that finds the location of the minimum:
> >
> >   float  arr[N];
> >
> >   for (i = 0; i < N; i++)
> >     if (arr[i] < limit)
> >       {
> >         pos = i + 1;
> >         limit = arr[i];
> >       }
> >
> > Vectorizer's input code:
> >
> >   # pos_22 = PHI <pos_1(4), 1(2)>
> >   # limit_24 = PHI <limit_4(4), 0(2)>
> >   ...
> >   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;       // location
> >   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;  // min
>
> Ok, I get it now.
>
> So your thinking was that you needed the builtin to replace the
> comparison portion of the VEC_COND_EXPR?  Or, looking again I see
> that you don't actually use VEC_COND_EXPR, you use ...
>
> > +  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */
>
> ... explicit masking.  I.e. you assume that the return value of
> the builtin is a bit mask of the full width, and that there's no
> better way to implement the VEC_COND.
>
> I wonder if it wouldn't be better to extend the definition
> of VEC_COND_EXPR so that the comparison values can be of a
> different type than the data operands (with the caveat that the
> number of elements should be the same -- i.e. 4-wide compare must
> match 4-wide data movement).

I implemented VEC_COND_EXPR extension in the attached patch.

For reduction epilogue I defined new tree codes
REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.

Bootstrapped and tested on powerpc64-suse-linux.
OK for mainline?

Thanks,
Ira

ChangeLog:

	* tree-pretty-print.c (dump_generic_node): Handle new codes.
	* optabs.c (optab_for_tree_code): Likewise.
	(init_optabs): Initialize new optabs.
	(get_vcond_icode): Handle vector condition with different types
	of comparison and then/else operands.
	(expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
	(get_vec_reduc_minloc_expr_icode): New function.
	(expand_vec_reduc_minloc_expr): New function.
	* optabs.h (enum convert_optab_index): Add new optabs.
	(vcondc_optab): Define.
	(vcondcu_optab, reduc_min_first_loc_optab, reduc_min_last_loc_optab,
	reduc_max_last_loc_optab): Likewise.
	(expand_vec_cond_expr_p): Add arguments.
	(get_vec_reduc_minloc_expr_code): Declare.
	(expand_vec_reduc_minloc_expr): Declare.
	* genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
	reduc_min_first_loc_optab, reduc_min_last_loc_optab,
	reduc_max_last_loc_optab.
	* rtl.def (GEF): New rtx.
	(GTF, LEF, LTF, EQF, NEQF): Likewise.
	* jump.c (reverse_condition): Handle new rtx.
	(swap_condition): Likewise.
	* expr.c (expand_expr_real_2): Expand new reduction tree codes.
	* gimple-pretty-print.c (dump_binary_rhs): Print new codes.
	* tree-vectorizer.h (enum vect_compound_pattern): New.
	(struct _stmt_vec_info): Add new field compound_pattern. Add macro
	to access it.
	(is_pattern_stmt_p): Return true for compound pattern.
	(get_minloc_reduc_epilogue_code): New.
	(vectorizable_condition): Add arguments.
	(vect_recog_compound_func_ptr): New function-pointer type.
	(NUM_COMPOUND_PATTERNS): New.
	(vect_compound_pattern_recog): Declare.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
	for compound patterns.
	(vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
	patterns. Update comment.
	(vect_analyze_scalar_cycles): Update comment.
	(destroy_loop_vec_info): Update def stmt for the original pattern
	statement.
	(vect_is_simple_reduction_1): Skip compound pattern statements in
	uses check. Add spaces. Skip commutativity and type checks for
	minimum location statement. Fix printings.
	(vect_model_reduction_cost): Add min/max location pattern cost
	computation.
	(vect_create_epilog_for_reduction): Don't retrieve the original
	statement for compound pattern. Fix comment accordingly. Get tree
	code for reduction epilogue of min/max location computation
	according to the comparison operation. Don't expect to find an
	exit phi node for min/max statement.
	(vectorizable_reduction): Skip check for uses in loop for compound
	patterns. Don't retrieve the original statement for compound pattern.
	Call vectorizable_condition () with additional parameters. Skip
	reduction code check for compound patterns. Prepare operands for
	min/max location statement vectorization and pass them to
	vectorizable_condition ().
	(vectorizable_live_operation): Return TRUE for compound patterns.
	* tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
	(REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
	REDUC_MAX_LAST_LOC_EXPR): Likewise.
	* cfgexpand.c (expand_debug_expr): Handle new tree codes.
	* tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
	(vect_recog_compound_func_ptrs): Likewise.
	(vect_recog_min_max_loc_pattern): New function.
	(vect_compound_pattern_recog): Likewise.
	* tree-vect-stmts.c (process_use): Mark compound pattern statements
	as used by reduction.
	(vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
	to be used by reduction.
	(vectorizable_condition): Update comment, add arguments. Skip checks
	irrelevant for compound pattern. Check that if comparison and
	then/else operands are of different types, the size of the types is
	equal. Check that reduction epilogue, if needed, is supported.
	Prepare operands using new arguments.
	(vect_analyze_stmt): Allow nested cycle statements to be used by
	reduction. Call vectorizable_condition () with additional arguments.
	(vect_transform_stmt): Call vectorizable_condition () with additional
	arguments.
	(new_stmt_vec_info): Initialize new fields.
	* tree-inline.c (estimate_operator_cost): Handle new tree codes.
	* tree-vect-generic.c (expand_vector_operations_1): Likewise.
	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
	* config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
	argument. Handle new rtx.
	(rs6000_emit_vector_compare): Handle the case of result type
	different from the operands, update calls to
	rs6000_emit_vector_compare_inner ().
	(rs6000_emit_vector_cond_expr): Use new codes in case of different
	types.
	* config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
	(altivec_gefv4sf): New pattern.
	(altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
	reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
	reduc_max_last_loc_v4sfv4si): Likewise.
	* tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
	patterns.

testsuite/ChangeLog:

	* gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
	* lib/target-supports.exp (check_effective_target_vect_cmp): New.
	* gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
	* gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
	gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.


(See attached file: minloc.txt)

>
> I can think of 2 portability problems with your current solution:
>
> (1) SSE4.1 would prefer to use BLEND instructions, which perform
>     that entire (X & M) | (Y & ~M) operation in one insn.
>
> (2) The mips C.cond.PS instruction does *not* produce a bitmask
>     like altivec or sse do.  Instead it sets multiple condition
>     codes.  One then uses MOV[TF].PS to merge the elements based
>     on the individual condition codes.  While there's no direct
>     corresponding instruction that will operate on integers, I
>     don't think it would be too difficult to use MOV[TF].G or
>     BC1AND2[FT] instructions to emulate it.  In any case, this
>     is again a case where you don't want to expose any part of
>     the VEC_COND at the gimple level.
>
>
> r~

[-- Attachment #2: minloc.txt --]
[-- Type: text/plain, Size: 94371 bytes --]

Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 162994)
+++ tree-pretty-print.c	(working copy)
@@ -2182,6 +2182,38 @@ dump_generic_node (pretty_printer *buffe
       pp_string (buffer, " > ");
       break;
 
+    case REDUC_MIN_FIRST_LOC_EXPR:
+      pp_string (buffer, " REDUC_MIN_FIRST_LOC_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case REDUC_MIN_LAST_LOC_EXPR:
+      pp_string (buffer, " REDUC_MIN_LAST_LOC_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case REDUC_MAX_FIRST_LOC_EXPR:
+      pp_string (buffer, " REDUC_MAX_FIRST_LOC_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case REDUC_MAX_LAST_LOC_EXPR:
+      pp_string (buffer, " REDUC_MAX_LAST_LOC_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
     case VEC_WIDEN_MULT_HI_EXPR:
       pp_string (buffer, " VEC_WIDEN_MULT_HI_EXPR < ");
       dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
Index: optabs.c
===================================================================
--- optabs.c	(revision 162994)
+++ optabs.c	(working copy)
@@ -383,6 +383,30 @@ optab_for_tree_code (enum tree_code code
     case REDUC_PLUS_EXPR:
       return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
 
+    case REDUC_MIN_FIRST_LOC_EXPR:
+      if (VECTOR_FLOAT_TYPE_P (type))
+        return (optab) reduc_min_first_loc_optab;      
+      else
+        return NULL;
+
+    case REDUC_MIN_LAST_LOC_EXPR:
+      if (VECTOR_FLOAT_TYPE_P (type))
+        return (optab) reduc_min_last_loc_optab;
+      else
+        return NULL;
+
+    case REDUC_MAX_FIRST_LOC_EXPR:
+      if (VECTOR_FLOAT_TYPE_P (type))
+        return (optab) reduc_max_first_loc_optab;
+      else
+        return NULL;
+
+    case REDUC_MAX_LAST_LOC_EXPR:
+      if (VECTOR_FLOAT_TYPE_P (type))
+        return (optab) reduc_max_last_loc_optab;
+      else
+        return NULL;
+
     case VEC_LSHIFT_EXPR:
       return vec_shl_optab;
 
@@ -6314,6 +6338,13 @@ init_optabs (void)
   init_convert_optab (satfract_optab, SAT_FRACT);
   init_convert_optab (satfractuns_optab, UNSIGNED_SAT_FRACT);
 
+  init_convert_optab (vcondc_optab, UNKNOWN);
+  init_convert_optab (vcondcu_optab, UNKNOWN);
+  init_convert_optab (reduc_min_first_loc_optab, UNKNOWN);
+  init_convert_optab (reduc_min_last_loc_optab, UNKNOWN);
+  init_convert_optab (reduc_max_first_loc_optab, UNKNOWN);
+  init_convert_optab (reduc_max_last_loc_optab, UNKNOWN);
+
   /* Fill in the optabs with the insns we support.  */
   init_all_optabs ();
 
@@ -6762,14 +6793,25 @@ vector_compare_rtx (tree cond, bool unsi
 /* Return insn code for TYPE, the type of a VEC_COND_EXPR.  */
 
 static inline enum insn_code
-get_vcond_icode (tree type, enum machine_mode mode)
+get_vcond_icode (tree type, enum machine_mode mode, tree vec_cmp_type,
+                 enum machine_mode cmp_mode)
 {
   enum insn_code icode = CODE_FOR_nothing;
 
-  if (TYPE_UNSIGNED (type))
-    icode = direct_optab_handler (vcondu_optab, mode);
+  if (type != vec_cmp_type)
+    {  
+      if (TYPE_UNSIGNED (type))
+        icode = convert_optab_handler (vcondcu_optab, mode, cmp_mode);
+      else
+        icode = convert_optab_handler (vcondc_optab, mode, cmp_mode);
+    }
   else
-    icode = direct_optab_handler (vcond_optab, mode);
+    {
+      if (TYPE_UNSIGNED (type))
+        icode = direct_optab_handler (vcondu_optab, mode);
+      else
+        icode = direct_optab_handler (vcond_optab, mode);
+    }
   return icode;
 }
 
@@ -6777,9 +6819,11 @@ get_vcond_icode (tree type, enum machine
    for vector cond expr with type TYPE in VMODE mode.  */
 
 bool
-expand_vec_cond_expr_p (tree type, enum machine_mode vmode)
+expand_vec_cond_expr_p (tree type, enum machine_mode vmode, 
+                        tree vec_cmp_type, enum machine_mode cmp_mode)
 {
-  if (get_vcond_icode (type, vmode) == CODE_FOR_nothing)
+  if (get_vcond_icode (type, vmode, vec_cmp_type, cmp_mode) 
+      == CODE_FOR_nothing)
     return false;
   return true;
 }
@@ -6794,13 +6838,15 @@ expand_vec_cond_expr (tree vec_cond_type
   enum insn_code icode;
   rtx comparison, rtx_op1, rtx_op2, cc_op0, cc_op1;
   enum machine_mode mode = TYPE_MODE (vec_cond_type);
-  bool unsignedp = TYPE_UNSIGNED (vec_cond_type);
+  tree vec_cmp_type = TREE_TYPE (op0);
+  enum machine_mode cmp_mode = TYPE_MODE (vec_cmp_type);
+  bool unsignedp = TYPE_UNSIGNED (vec_cmp_type);
 
-  icode = get_vcond_icode (vec_cond_type, mode);
+  icode = get_vcond_icode (vec_cond_type, mode, vec_cmp_type, cmp_mode);
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  if (!target || !insn_data[icode].operand[0].predicate (target, mode))
+  if (!target || !insn_data[icode].operand[0].predicate (target, cmp_mode))
     target = gen_reg_rtx (mode);
 
   /* Get comparison rtx.  First expand both cond expr operands.  */
@@ -6826,6 +6872,55 @@ expand_vec_cond_expr (tree vec_cond_type
   return target;
 }
 
+/* Return the insn code for the vector reduction epilogue for CODE.  */
+enum insn_code
+get_vec_reduc_minloc_expr_code (enum tree_code code, tree type0, tree type1)
+{
+  enum machine_mode mode0 = TYPE_MODE (type0);
+  enum machine_mode mode1 = TYPE_MODE (type1);
+  convert_optab this_optab;
+
+  this_optab = (convert_optab) optab_for_tree_code (code, type0, 
+                                                    optab_default);
+  return convert_optab_handler (this_optab, mode1, mode0);
+}
+
+/* Expand vector reduction epilogue for min/max location.  */ 
+rtx
+expand_vec_reduc_minloc_expr (enum tree_code code, tree op0, tree op1, 
+                              rtx target)
+{
+  enum insn_code icode;
+  rtx rtx_op0, rtx_op1;
+  enum machine_mode mode0 = TYPE_MODE (TREE_TYPE (op0));
+  enum machine_mode mode1 = TYPE_MODE (TREE_TYPE (op1));
+
+  icode = get_vec_reduc_minloc_expr_code (code, TREE_TYPE (op0), 
+                                          TREE_TYPE (op1)); 
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  if (!target || !insn_data[icode].operand[0].predicate (target, mode1))
+    target = gen_reg_rtx (mode1);
+
+  /* Expand both operands and force them in reg, if required.  */
+  rtx_op0 = expand_normal (op0);
+  if (!insn_data[icode].operand[1].predicate (rtx_op0, mode0)
+      && mode0 != VOIDmode)
+    rtx_op0 = force_reg (mode0, rtx_op0);
+
+  rtx_op1 = expand_normal (op1);
+  if (!insn_data[icode].operand[2].predicate (rtx_op1, mode1)
+      && mode1 != VOIDmode)
+    rtx_op1 = force_reg (mode1, rtx_op1);
+
+  /* Emit instruction! */
+  emit_insn (GEN_FCN (icode) (target, rtx_op0, rtx_op1));
+
+  return target;
+}
+
+
 \f
 /* This is an internal subroutine of the other compare_and_swap expanders.
    MEM, OLD_VAL and NEW_VAL are as you'd expect for a compare-and-swap
Index: optabs.h
===================================================================
--- optabs.h	(revision 162994)
+++ optabs.h	(working copy)
@@ -569,6 +569,15 @@ enum convert_optab_index
   COI_satfract,
   COI_satfractuns,
 
+  COI_vcondc,
+  COI_vcondcu,
+
+  COI_reduc_min_first_loc,
+  COI_reduc_min_last_loc,
+  COI_reduc_max_first_loc,
+  COI_reduc_max_last_loc,
+
+
   COI_MAX
 };
 
@@ -589,6 +598,13 @@ enum convert_optab_index
 #define fractuns_optab (&convert_optab_table[COI_fractuns])
 #define satfract_optab (&convert_optab_table[COI_satfract])
 #define satfractuns_optab (&convert_optab_table[COI_satfractuns])
+#define vcondc_optab (&convert_optab_table[COI_vcondc])
+#define vcondcu_optab (&convert_optab_table[COI_vcondcu])
+#define reduc_min_first_loc_optab (&convert_optab_table[COI_reduc_min_first_loc])
+#define reduc_min_last_loc_optab (&convert_optab_table[COI_reduc_min_last_loc])
+#define reduc_max_first_loc_optab (&convert_optab_table[COI_reduc_max_first_loc])
+#define reduc_max_last_loc_optab (&convert_optab_table[COI_reduc_max_last_loc])
+
 
 /* Contains the optab used for each rtx code.  */
 extern optab code_to_optab[NUM_RTX_CODE + 1];
@@ -842,14 +858,20 @@ extern bool expand_sfix_optab (rtx, rtx,
 /* Generate code for a widening multiply.  */
 extern rtx expand_widening_mult (enum machine_mode, rtx, rtx, rtx, int, optab);
 
-/* Return tree if target supports vector operations for COND_EXPR.  */
-bool expand_vec_cond_expr_p (tree, enum machine_mode);
+/* Return true if target supports vector operations for COND_EXPR.  */
+bool expand_vec_cond_expr_p (tree, enum machine_mode, tree, enum machine_mode);
 
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
 /* Generate code for VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR.  */
 extern rtx expand_vec_shift_expr (sepops, rtx);
 
+/* Return the insn code for REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.  */
+enum insn_code get_vec_reduc_minloc_expr_code (enum tree_code, tree, tree);
+/* Generate code for REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.  */
+extern rtx expand_vec_reduc_minloc_expr (enum tree_code, tree, tree, rtx);
+
+
 /* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing
    if the target does not have such an insn.  */
 
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 162994)
+++ genopinit.c	(working copy)
@@ -246,6 +246,8 @@ static const char * const optabs[] =
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vcond_optab, $A, CODE_FOR_$(vcond$a$))",
   "set_direct_optab_handler (vcondu_optab, $A, CODE_FOR_$(vcondu$a$))",
+  "set_convert_optab_handler (vcondc_optab, $B, $A, CODE_FOR_$(vcondc$F$a$I$b$))",
+  "set_convert_optab_handler (vcondcu_optab, $B, $A, CODE_FOR_$(vcondcu$F$a$I$b$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
   "set_optab_handler (usum_widen_optab, $A, CODE_FOR_$(widen_usum$I$a3$))",
   "set_optab_handler (udot_prod_optab, $A, CODE_FOR_$(udot_prod$I$a$))",
@@ -256,6 +258,10 @@ static const char * const optabs[] =
   "set_optab_handler (reduc_umin_optab, $A, CODE_FOR_$(reduc_umin_$a$))",
   "set_optab_handler (reduc_splus_optab, $A, CODE_FOR_$(reduc_splus_$a$))" ,
   "set_optab_handler (reduc_uplus_optab, $A, CODE_FOR_$(reduc_uplus_$a$))",
+  "set_convert_optab_handler (reduc_min_first_loc_optab, $B, $A, CODE_FOR_$(reduc_min_first_loc_$F$a$I$b$))",
+  "set_convert_optab_handler (reduc_min_last_loc_optab, $B, $A, CODE_FOR_$(reduc_min_last_loc_$F$a$I$b$))",
+  "set_convert_optab_handler (reduc_max_first_loc_optab, $B, $A, CODE_FOR_$(reduc_max_first_loc_$F$a$I$b$))",
+  "set_convert_optab_handler (reduc_max_last_loc_optab, $B, $A, CODE_FOR_$(reduc_max_last_loc_$F$a$I$b$))",
   "set_optab_handler (vec_widen_umult_hi_optab, $A, CODE_FOR_$(vec_widen_umult_hi_$a$))",
   "set_optab_handler (vec_widen_umult_lo_optab, $A, CODE_FOR_$(vec_widen_umult_lo_$a$))",
   "set_optab_handler (vec_widen_smult_hi_optab, $A, CODE_FOR_$(vec_widen_smult_hi_$a$))",
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = 12;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] > limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = N + 15.8;
+
+  pos = foo ();
+  if (pos != 3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
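For reference, the loops in these tests are vectorized with the masked technique from the Bik et al. paper cited above. The following scalar C model of the maxloc (first occurrence) variant is illustrative only and not part of the patch; `maxloc_first` and the VL-wide arrays emulating vector lanes are hypothetical names:

```c
#include <assert.h>

#define VL 4

/* Emulate the vectorized maxloc (first occurrence) loop with VL-wide
   "vector" arrays.  N is assumed to be a multiple of VL.  */
static int
maxloc_first (const float *a, int n)
{
  float vcx[VL];          /* per-lane running maxima */
  int vck[VL], ind[VL];   /* per-lane best positions and current indices */
  int i, l;

  for (l = 0; l < VL; l++)
    {
      vcx[l] = a[0];      /* seed every lane with the first element */
      vck[l] = 0;
      ind[l] = l;         /* lane L starts at index L */
    }

  for (i = 0; i < n; i += VL)
    for (l = 0; l < VL; l++)
      {
        int msk = a[i + l] > vcx[l];        /* per-lane compare */
        vck[l] = msk ? ind[l] : vck[l];     /* masked position update */
        vcx[l] = msk ? a[i + l] : vcx[l];   /* per-lane VMAX */
        ind[l] += VL;
      }

  /* Epilogue: horizontal max, then the smallest matching position.  */
  float x = vcx[0];
  for (l = 1; l < VL; l++)
    if (vcx[l] > x)
      x = vcx[l];

  int k = n;              /* larger than any valid position */
  for (l = 0; l < VL; l++)
    if (vcx[l] == x && vck[l] < k)
      k = vck[l];
  return k;
}
```

On ties across lanes the epilogue picks the smallest recorded position, which is what makes this the "first location" flavor; the last-location flavors would use a non-strict comparison and prefer the larger position.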
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = -5.8;
+
+  pos = foo ();
+  if (pos != 3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c	(revision 0)
@@ -0,0 +1,55 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+short a[N];
+
+/* Loop with multiple types - currently not supported.  */
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min, short x)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    {
+      if (arr[i] < limit)
+        {
+          limit = arr[i];
+          pos = i + 1;
+        }
+
+      a[i] = x;
+    }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min, 6);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] <= limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = -5.8;
+
+  pos = foo ();
+  if (pos != 3)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c	(revision 0)
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+#define MAX_VALUE N+N
+float arr[N];
+
+/* Not minloc pattern - different conditions.  */
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = MAX_VALUE;
+
+  for (i = 0; i < N; i++)
+    {
+      if (arr[i] < limit)
+        pos = i + 1;
+
+      if (arr[i] > limit)
+        limit = arr[i];
+    }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo ();
+
+  if (pos != N)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+/* Not minloc pattern: the position is not an induction.  */
+__attribute__ ((noinline)) 
+int foo (unsigned int n, float *min)
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    if (arr[i] < limit)
+      {
+        pos = 5;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 5 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+/* Position and minimum are of types of different sizes - not supported.  */
+__attribute__ ((noinline)) 
+int foo (unsigned short n, float *min)
+{
+  unsigned short pos = 1;
+  unsigned short i;
+  float limit = N+N;
+
+  for (i = 0; i < n; i++)
+    if (arr[i] < limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  *min = limit;
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+  float min;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr[2] = -5.8;
+
+  pos = foo (N, &min);
+  if (pos != 3 || min != arr[2])
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect.exp
===================================================================
--- testsuite/gcc.dg/vect/vect.exp	(revision 162994)
+++ testsuite/gcc.dg/vect/vect.exp	(working copy)
@@ -159,9 +159,27 @@ dg-runtest [lsort [glob -nocomplain $src
 # -ffast-math tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ffast-math"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-pr*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
+# -ffast-math SLP tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-slp*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
+# -ffast-math vectorizer tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
+# -ffast-math and -fno-tree-pre tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math" "-fno-tree-pre"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-no-pre*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
 # -fno-math-errno tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N][N];
+
+/* Double reduction.  */
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i, j;
+  float limit = N+N;
+
+  for (j = 0; j < N; j++)
+    for (i = 0; i < N; i++)
+      if (arr[i][j] < limit)
+        {
+          pos = i + 1;
+          limit = arr[i][j];
+        }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, j, pos;
+
+  check_vect();
+
+  for (j = 0; j < N; j++)
+    for (i = 0; i < N; i++)
+      arr[j][i] = (float)(i+j+1);
+
+  arr[8][2] = 0;
+  pos = foo ();
+  if (pos != 9)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c
===================================================================
--- testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c	(revision 0)
+++ testsuite/gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c	(revision 0)
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 64
+float arr[N];
+
+__attribute__ ((noinline)) 
+int foo ()
+{
+  unsigned int pos = 1;
+  unsigned int i;
+  float limit = 7;
+
+  for (i = 0; i < N; i++)
+    if (arr[i] >= limit)
+      {
+        pos = i + 1;
+        limit = arr[i];
+      }
+
+  return pos;
+}
+
+int main (void)
+{
+  int i, pos;
+
+  check_vect();
+
+  for (i = 0; i < N; i++)
+   arr[i] = (float)(i);
+
+  arr [2] = N + 5.8;
+  arr [12] = N + 5.8;
+
+  pos = foo ();
+  if (pos != 13)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_cmp } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
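The tests above differ only in the comparison operator: a strict comparison (`<` or `>`) keeps the first matching position, while a non-strict one (`<=` or `>=`, as in this test) keeps the last, since ties keep updating the position. A minimal scalar illustration (the `maxloc` helper is a hypothetical name, not part of the patch):

```c
#include <assert.h>

/* With a strict comparison the position of the FIRST maximum is kept;
   with a non-strict one, ties keep updating, so the LAST one wins.  */
static int
maxloc (const float *a, int n, int strict)
{
  int pos = 0, i;
  float limit = a[0];

  for (i = 1; i < n; i++)
    if (strict ? a[i] > limit : a[i] >= limit)
      {
        pos = i;
        limit = a[i];
      }
  return pos;
}
```

This is why the condition code alone (LT/LE/GT/GE) is enough to select among the four reduction epilogue flavors.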
Index: testsuite/lib/target-supports.exp
===================================================================
--- testsuite/lib/target-supports.exp	(revision 162994)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -2966,6 +2966,23 @@ proc check_effective_target_vect_strided
     return $et_vect_strided_wide_saved
 }
 
+# Return 1 if the target supports vector comparison, 0 otherwise.
+proc check_effective_target_vect_cmp { } {
+    global et_vect_cmp_saved
+
+    if [info exists et_vect_cmp_saved] {
+        verbose "check_effective_target_vect_cmp: using cached result" 2
+    } else {
+        set et_vect_cmp_saved 0
+        if { [istarget powerpc*-*-*] } {
+           set et_vect_cmp_saved 1
+        }
+    }
+
+    verbose "check_effective_target_vect_cmp: returning $et_vect_cmp_saved" 2
+    return $et_vect_cmp_saved
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {
Index: rtl.def
===================================================================
--- rtl.def	(revision 162994)
+++ rtl.def	(working copy)
@@ -519,6 +519,12 @@ DEF_RTL_EXPR(GEU, "geu", "ee", RTX_COMPA
 DEF_RTL_EXPR(GTU, "gtu", "ee", RTX_COMPARE)
 DEF_RTL_EXPR(LEU, "leu", "ee", RTX_COMPARE)
 DEF_RTL_EXPR(LTU, "ltu", "ee", RTX_COMPARE)
+DEF_RTL_EXPR(GEF, "gef", "ee", RTX_COMPARE)
+DEF_RTL_EXPR(GTF, "gtf", "ee", RTX_COMPARE)
+DEF_RTL_EXPR(LEF, "lef", "ee", RTX_COMPARE)
+DEF_RTL_EXPR(LTF, "ltf", "ee", RTX_COMPARE)
+DEF_RTL_EXPR(EQF, "eqf", "ee", RTX_COMM_COMPARE)
+DEF_RTL_EXPR(NEQF, "neqf", "ee", RTX_COMM_COMPARE)
 
 /* Additional floating point unordered comparison flavors.  */
 DEF_RTL_EXPR(UNORDERED, "unordered", "ee", RTX_COMM_COMPARE)
Index: jump.c
===================================================================
--- jump.c	(revision 162994)
+++ jump.c	(working copy)
@@ -460,6 +460,19 @@ reverse_condition (enum rtx_code code)
       return GEU;
     case LEU:
       return GTU;
+    case GTF:
+      return LEF;
+    case GEF:
+      return LTF;
+    case LTF:
+      return GEF;
+    case LEF:
+      return GTF;
+    case EQF:
+      return NEQF;
+    case NEQF:
+      return EQF;
+
     case UNORDERED:
       return ORDERED;
     case ORDERED:
@@ -535,6 +548,8 @@ swap_condition (enum rtx_code code)
     case ORDERED:
     case UNEQ:
     case LTGT:
+    case EQF:
+    case NEQF:
       return code;
 
     case GT:
@@ -561,6 +576,14 @@ swap_condition (enum rtx_code code)
       return UNLT;
     case UNGE:
       return UNLE;
+    case GTF:
+      return LTF;
+    case GEF:
+      return LEF;
+    case LTF:
+      return GTF;
+    case LEF:
+      return GEF;
 
     default:
       gcc_unreachable ();
Index: expr.c
===================================================================
--- expr.c	(revision 162994)
+++ expr.c	(working copy)
@@ -8110,6 +8110,15 @@ expand_expr_real_2 (sepops ops, rtx targ
         return temp;
       }
 
+    case REDUC_MIN_FIRST_LOC_EXPR:
+    case REDUC_MIN_LAST_LOC_EXPR:
+    case REDUC_MAX_FIRST_LOC_EXPR:
+    case REDUC_MAX_LAST_LOC_EXPR:
+      {
+        target = expand_vec_reduc_minloc_expr (code, treeop0, treeop1, target);
+        return target;
+      }
+
     case VEC_EXTRACT_EVEN_EXPR:
     case VEC_EXTRACT_ODD_EXPR:
       {
Index: gimple-pretty-print.c
===================================================================
--- gimple-pretty-print.c	(revision 162994)
+++ gimple-pretty-print.c	(working copy)
@@ -343,6 +343,10 @@ dump_binary_rhs (pretty_printer *buffer,
     case VEC_EXTRACT_ODD_EXPR:
     case VEC_INTERLEAVE_HIGH_EXPR:
     case VEC_INTERLEAVE_LOW_EXPR:
+    case REDUC_MIN_FIRST_LOC_EXPR:
+    case REDUC_MIN_LAST_LOC_EXPR:
+    case REDUC_MAX_FIRST_LOC_EXPR:
+    case REDUC_MAX_LAST_LOC_EXPR:
       for (p = tree_code_name [(int) code]; *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
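The four REDUC_*_LOC_EXPR codes handled above reduce a vector of per-lane extrema together with a vector of their positions to a single scalar position. A scalar C model of the MIN_FIRST flavor, assuming VL lanes (an illustrative sketch with hypothetical names, not the actual expansion):

```c
#include <assert.h>

#define VL 4

/* Scalar model of REDUC_MIN_FIRST_LOC_EXPR: reduce per-lane minima VALS
   and their positions POS to the position of the overall minimum, taking
   the smallest position on ties (hence "FIRST").  */
static int
reduc_min_first_loc (const float *vals, const int *pos)
{
  float min = vals[0];
  int loc = pos[0], l;

  for (l = 1; l < VL; l++)
    if (vals[l] < min || (vals[l] == min && pos[l] < loc))
      {
        min = vals[l];
        loc = pos[l];
      }
  return loc;
}
```

The MIN_LAST/MAX_LAST flavors would prefer the larger position on ties, matching the LE/GE source comparisons mapped by get_minloc_reduc_epilogue_code.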
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 162994)
+++ tree-vectorizer.h	(working copy)
@@ -409,6 +409,17 @@ enum slp_vect_type {
   hybrid
 };
 
+/* A compound pattern is a pattern consisting of more than one statement
+   that needs to be vectorized.  Currently the min/max location pattern is
+   the only supported compound pattern.  It has two statements: the first
+   calculates the minimum/maximum (marked minmax_stmt) and the second
+   calculates the location (marked minmax_loc_stmt).  */
+enum vect_compound_pattern {
+  not_in_pattern = 0,
+  minmax_stmt,
+  minmax_loc_stmt
+};
+
 
 typedef struct data_reference *dr_p;
 DEF_VEC_P(dr_p);
@@ -425,6 +436,10 @@ typedef struct _stmt_vec_info {
   /* Stmt is part of some pattern (computation idiom)  */
   bool in_pattern_p;
 
+  /* Statement is part of a compound pattern, i.e., a pattern consisting
+     of more than one statement.  */
+  enum vect_compound_pattern compound_pattern;
+
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
@@ -535,6 +550,7 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_DR_ALIGNED_TO(S)        (S)->dr_aligned_to
 
 #define STMT_VINFO_IN_PATTERN_P(S)         (S)->in_pattern_p
+#define STMT_VINFO_COMPOUND_PATTERN(S)     (S)->compound_pattern
 #define STMT_VINFO_RELATED_STMT(S)         (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)      (S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S)             (S)->def_type
@@ -642,7 +658,8 @@ is_pattern_stmt_p (stmt_vec_info stmt_in
   related_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
   if (related_stmt
       && (related_stmt_info = vinfo_for_stmt (related_stmt))
-      && STMT_VINFO_IN_PATTERN_P (related_stmt_info))
+      && (STMT_VINFO_IN_PATTERN_P (related_stmt_info)
+          || STMT_VINFO_COMPOUND_PATTERN (related_stmt_info)))
     return true;
 
   return false;
@@ -709,6 +726,29 @@ known_alignment_for_access_p (struct dat
   return (DR_MISALIGNMENT (data_ref_info) != -1);
 }
 
+
+static inline enum tree_code
+get_minloc_reduc_epilogue_code (enum tree_code code)
+{
+  switch (code)
+    {
+      case LT_EXPR:
+        return REDUC_MIN_FIRST_LOC_EXPR;
+
+      case LE_EXPR:
+        return REDUC_MIN_LAST_LOC_EXPR;
+
+      case GT_EXPR:
+        return REDUC_MAX_FIRST_LOC_EXPR;
+
+      case GE_EXPR:
+        return REDUC_MAX_LAST_LOC_EXPR;
+
+      default:
+        return ERROR_MARK;
+    }
+}
+
 /* vect_dump will be set to stderr or dump_file if exist.  */
 extern FILE *vect_dump;
 extern LOC vect_loop_location;
@@ -764,7 +804,7 @@ extern bool vect_transform_stmt (gimple,
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
-                                    tree, int);
+                                    tree, int, tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
                                 unsigned int *, unsigned int *);
 extern void vect_get_store_cost (struct data_reference *, int, unsigned int *);
@@ -844,8 +884,11 @@ extern void vect_slp_transform_bb (basic
    Additional pattern recognition functions can (and will) be added
    in the future.  */
 typedef gimple (* vect_recog_func_ptr) (gimple, tree *, tree *);
-#define NUM_PATTERNS 4
+typedef bool (* vect_recog_compound_func_ptr) (unsigned int, va_list);
+#define NUM_PATTERNS 4 
+#define NUM_COMPOUND_PATTERNS 1  
 void vect_pattern_recog (loop_vec_info);
+void vect_compound_pattern_recog (unsigned int, ...);
 
 /* In tree-vectorizer.c.  */
 unsigned vectorize_loops (void);
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 162994)
+++ tree-vect-loop.c	(working copy)
@@ -296,7 +296,8 @@ vect_determine_vectorization_factor (loo
 	  else
 	    {
 	      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info)
-			  && !is_pattern_stmt_p (stmt_info));
+			  && (!is_pattern_stmt_p (stmt_info)
+                              || STMT_VINFO_COMPOUND_PATTERN (stmt_info)));
 
 	      scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
 	      if (vect_print_dump_info (REPORT_DETAILS))
@@ -445,10 +446,15 @@ static void
 vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
 {
   basic_block bb = loop->header;
-  tree dumy;
+  tree dummy;
   VEC(gimple,heap) *worklist = VEC_alloc (gimple, heap, 64);
   gimple_stmt_iterator gsi;
-  bool double_reduc;
+  bool double_reduc, found, minmax_loc = false;
+  gimple first_cond_stmt = NULL, second_cond_stmt = NULL;
+  gimple first_phi = NULL, second_phi = NULL, phi, use_stmt;
+  int i;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_scalar_cycles ===");
@@ -485,7 +491,8 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
 	}
 
       if (!access_fn
-	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dumy, &dumy))
+	  || !vect_is_simple_iv_evolution (loop->num, access_fn, &dummy, 
+                                           &dummy)) 
 	{
 	  VEC_safe_push (gimple, heap, worklist, phi);
 	  continue;
@@ -496,8 +503,56 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
       STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_induction_def;
     }
 
+  /* Detect compound reduction patterns (before reduction detection):
+     we currently support only the min/max location pattern, so we look
+     for two reduction condition statements.  */
+  for (i = 0; VEC_iterate (gimple, worklist, i, phi); i++)
+    {
+      tree def = PHI_RESULT (phi);
+
+      found = false;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, def)
+        {
+          use_stmt = USE_STMT (use_p);
+          if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
+              && vinfo_for_stmt (use_stmt)
+              && is_gimple_assign (use_stmt)
+              && gimple_assign_rhs_code (use_stmt) == COND_EXPR)
+            {
+              found = true;
+              break;
+            }
+        }
+
+      if (!found)
+        continue;
+
+      if (!first_cond_stmt)
+        {
+          first_cond_stmt = use_stmt;
+          first_phi = phi;
+        }
+      else
+        {
+          if (second_cond_stmt)
+            {
+              /* This is the third reduction condition statement in the
+                 loop: more than the pattern supports, so bail out.  */
+              minmax_loc = false;
+              break;
+            }
+
+          second_cond_stmt = use_stmt;
+          second_phi = phi;
+          minmax_loc = true;
+        }
+    }
+
+  if (minmax_loc)
+    vect_compound_pattern_recog (4, first_phi, first_cond_stmt, 
+                                 second_phi, second_cond_stmt);
 
-  /* Second - identify all reductions and nested cycles.  */
+  /* Identify all reductions and nested cycles.  */
   while (VEC_length (gimple, worklist) > 0)
     {
       gimple phi = VEC_pop (gimple, worklist);
@@ -596,11 +651,9 @@ vect_analyze_scalar_cycles (loop_vec_inf
   /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
      Reductions in such inner-loop therefore have different properties than
      the reductions in the nest that gets vectorized:
-     1. When vectorized, they are executed in the same order as in the original
-        scalar loop, so we can't change the order of computation when
-        vectorizing them.
-     2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
-        current checks are too strict.  */
+     when vectorized, they are executed in the same order as in the original
+     scalar loop, so we can't change the order of computation when
+     vectorizing them.  */
 
   if (loop->inner)
     vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
@@ -821,7 +874,15 @@ destroy_loop_vec_info (loop_vec_info loo
                   if (orig_stmt_info
                       && STMT_VINFO_IN_PATTERN_P (orig_stmt_info))
                     remove_stmt_p = true;
-                }
+               
+		  /* We are removing a statement inserted by the pattern
+		     detection pass.  Restore the original statement as the
+		     def stmt of its LHS.  */
+                  if (remove_stmt_p && is_gimple_assign (orig_stmt) 
+                      && TREE_CODE (gimple_assign_lhs (orig_stmt)) == SSA_NAME)
+                    SSA_NAME_DEF_STMT (gimple_assign_lhs (orig_stmt)) 
+                      = orig_stmt;
+                 }
 
               /* Free stmt_vec_info.  */
               free_stmt_vec_info (stmt);
@@ -1671,13 +1732,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
-	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
+	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt))
+	  && !STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (use_stmt)))
         nloop_uses++;
+   
       if (nloop_uses > 1)
         {
-          if (vect_print_dump_info (REPORT_DETAILS))
+          if (vect_print_dump_info (REPORT_DETAILS)) 
             fprintf (vect_dump, "reduction used in loop.");
           return NULL;
         }
@@ -1725,10 +1789,12 @@ vect_is_simple_reduction_1 (loop_vec_inf
       gimple use_stmt = USE_STMT (use_p);
       if (is_gimple_debug (use_stmt))
 	continue;
+
       if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
 	  && vinfo_for_stmt (use_stmt)
 	  && !is_pattern_stmt_p (vinfo_for_stmt (use_stmt)))
 	nloop_uses++;
+
       if (nloop_uses > 1)
 	{
 	  if (vect_print_dump_info (REPORT_DETAILS))
@@ -1778,6 +1844,9 @@ vect_is_simple_reduction_1 (loop_vec_inf
     code = PLUS_EXPR;
 
   if (check_reduction
+      && (!vinfo_for_stmt (def_stmt)
+          || STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt))
+                != minmax_loc_stmt)
       && (!commutative_tree_code (code) || !associative_tree_code (code)))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
@@ -1828,14 +1897,16 @@ vect_is_simple_reduction_1 (loop_vec_inf
    }
 
   type = TREE_TYPE (gimple_assign_lhs (def_stmt));
-  if ((TREE_CODE (op1) == SSA_NAME
-       && !types_compatible_p (type,TREE_TYPE (op1)))
-      || (TREE_CODE (op2) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op2)))
-      || (op3 && TREE_CODE (op3) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op3)))
-      || (op4 && TREE_CODE (op4) == SSA_NAME
-          && !types_compatible_p (type, TREE_TYPE (op4))))
+  if (STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt)) 
+        != minmax_loc_stmt
+      && ((TREE_CODE (op1) == SSA_NAME 
+           && !types_compatible_p (type, TREE_TYPE (op1)))
+          || (TREE_CODE (op2) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op2)))
+          || (op3 && TREE_CODE (op3) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op3)))
+          || (op4 && TREE_CODE (op4) == SSA_NAME
+           && !types_compatible_p (type, TREE_TYPE (op4)))))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         {
@@ -1843,17 +1914,17 @@ vect_is_simple_reduction_1 (loop_vec_inf
           print_generic_expr (vect_dump, type, TDF_SLIM);
           fprintf (vect_dump, ", operands types: ");
           print_generic_expr (vect_dump, TREE_TYPE (op1), TDF_SLIM);
-          fprintf (vect_dump, ",");
+          fprintf (vect_dump, ", ");
           print_generic_expr (vect_dump, TREE_TYPE (op2), TDF_SLIM);
           if (op3)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op3), TDF_SLIM);
             }
 
           if (op4)
             {
-              fprintf (vect_dump, ",");
+              fprintf (vect_dump, ", ");
               print_generic_expr (vect_dump, TREE_TYPE (op4), TDF_SLIM);
             }
         }
@@ -1961,7 +2032,7 @@ vect_is_simple_reduction_1 (loop_vec_inf
                                == vect_internal_def
 		           && !is_loop_header_bb_p (gimple_bb (def2)))))))
     {
-      if (check_reduction)
+      if (check_reduction && code != COND_EXPR)
         {
           /* Swap operands (just for simplicity - so that the rest of the code
 	     can assume that the reduction variable is always the last (second)
@@ -2432,7 +2503,6 @@ vect_model_reduction_cost (stmt_vec_info
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
 
-
   /* Cost of reduction op inside loop.  */
   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) 
     += ncopies * vect_get_cost (vector_stmt);
@@ -2469,11 +2539,15 @@ vect_model_reduction_cost (stmt_vec_info
   mode = TYPE_MODE (vectype);
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
 
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info)) 
     orig_stmt = STMT_VINFO_STMT (stmt_info);
 
   code = gimple_assign_rhs_code (orig_stmt);
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)
+      += ncopies * 5 * vect_get_cost (vector_stmt);
+
   /* Add in cost for initial definition.  */
   outer_cost += vect_get_cost (scalar_to_vec);
 
@@ -2489,28 +2563,34 @@ vect_model_reduction_cost (stmt_vec_info
                       + vect_get_cost (vec_to_scalar); 
       else
 	{
-	  int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-	  tree bitsize =
-	    TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
-	  int element_bitsize = tree_low_cst (bitsize, 1);
-	  int nelements = vec_size_in_bits / element_bitsize;
-
-	  optab = optab_for_tree_code (code, vectype, optab_default);
-
-	  /* We have a whole vector shift available.  */
-	  if (VECTOR_MODE_P (mode)
-	      && optab_handler (optab, mode) != CODE_FOR_nothing
-	      && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
-	    /* Final reduction via vector shifts and the reduction operator. Also
-	       requires scalar extract.  */
-	    outer_cost += ((exact_log2(nelements) * 2) 
-              * vect_get_cost (vector_stmt) 
-  	      + vect_get_cost (vec_to_scalar));
-	  else
-	    /* Use extracts and reduction op for final reduction.  For N elements,
-               we have N extracts and N-1 reduction ops.  */
-	    outer_cost += ((nelements + nelements - 1) 
-              * vect_get_cost (vector_stmt));
+          if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+            outer_cost += 6 * vect_get_cost (vector_stmt)
+                          + vect_get_cost (vec_to_scalar);
+          else
+            {
+	      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+	      tree bitsize =
+		TYPE_SIZE (TREE_TYPE (gimple_assign_lhs (orig_stmt)));
+	      int element_bitsize = tree_low_cst (bitsize, 1);
+	      int nelements = vec_size_in_bits / element_bitsize;
+
+	      optab = optab_for_tree_code (code, vectype, optab_default);
+
+	      /* We have a whole vector shift available.  */
+	      if (VECTOR_MODE_P (mode)
+	          && optab_handler (optab, mode) != CODE_FOR_nothing
+	          && optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
+	        /* Final reduction via vector shifts and the reduction 
+		   operator. Also requires scalar extract.  */
+	        outer_cost += ((exact_log2(nelements) * 2) 
+                  * vect_get_cost (vector_stmt) 
+  	          + vect_get_cost (vec_to_scalar));
+	      else
+	        /* Use extracts and reduction op for final reduction.  For N 
+		   elements, we have N extracts and N-1 reduction ops.  */
+		outer_cost += ((nelements + nelements - 1) 
+		  * vect_get_cost (vector_stmt));
+	    }
 	}
     }
 
@@ -3113,6 +3193,8 @@ vect_create_epilog_for_reduction (VEC (t
   unsigned int group_size = 1, k, ratio;
   VEC (tree, heap) *vec_initial_defs = NULL;
   VEC (gimple, heap) *phis;
+  tree vec_temp;
+  tree min_max_res = NULL_TREE;
 
   if (slp_node)
     group_size = VEC_length (gimple, SLP_TREE_SCALAR_STMTS (slp_node)); 
@@ -3170,9 +3252,9 @@ vect_create_epilog_for_reduction (VEC (t
   else
     {
       vec_initial_defs = VEC_alloc (tree, heap, 1);
-     /* For the case of reduction, vect_get_vec_def_for_operand returns
-        the scalar def before the loop, that defines the initial value
-        of the reduction variable.  */
+      /* For the case of reduction, vect_get_vec_def_for_operand returns
+         the scalar def before the loop, that defines the initial value
+         of the reduction variable.  */
       vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
                                                       &adjustment_def);
       VEC_quick_push (tree, vec_initial_defs, vec_initial_def);
@@ -3280,18 +3362,18 @@ vect_create_epilog_for_reduction (VEC (t
          defined in the loop.  In case STMT is a "pattern-stmt" (i.e. - it
          represents a reduction pattern), the tree-code and scalar-def are
          taken from the original stmt that the pattern-stmt (STMT) replaces.
-         Otherwise (it is a regular reduction) - the tree-code and scalar-def
-         are taken from STMT.  */
+         Otherwise (it is a regular reduction or a compound pattern) - the 
+         tree-code and scalar-def are taken from STMT.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (!orig_stmt)
+  if (!orig_stmt || STMT_VINFO_COMPOUND_PATTERN (stmt_info))  
     {
-      /* Regular reduction  */
+      /* Regular reduction or compound pattern.  */
       orig_stmt = stmt;
     }
   else
     {
-      /* Reduction pattern  */
+      /* Reduction pattern.  */ 
       stmt_vec_info stmt_vinfo = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
       gcc_assert (STMT_VINFO_RELATED_STMT (stmt_vinfo) == stmt);
@@ -3318,6 +3400,27 @@ vect_create_epilog_for_reduction (VEC (t
   if (nested_in_vect_loop && !double_reduc)
     goto vect_finalize_reduction;
 
+  /* Get reduction code for compound pattern.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      gimple related = STMT_VINFO_RELATED_STMT (stmt_info);     
+      gimple min_max_stmt = STMT_VINFO_VEC_STMT (vinfo_for_stmt (related));
+
+      tree cond = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+      enum tree_code cond_code;
+
+      min_max_res = gimple_assign_lhs (min_max_stmt);
+      if (TREE_CODE (cond) == SSA_NAME)
+        {
+          gimple cond_def_stmt = SSA_NAME_DEF_STMT (cond);
+          cond_code = gimple_assign_rhs_code (cond_def_stmt);
+        }
+      else
+        cond_code = TREE_CODE (cond);
+
+      reduc_code = get_minloc_reduc_epilogue_code (cond_code); 
+    }
+  
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
@@ -3333,7 +3436,12 @@ vect_create_epilog_for_reduction (VEC (t
 
       vec_dest = vect_create_destination_var (scalar_dest, vectype);
       new_phi = VEC_index (gimple, new_phis, 0);
-      tmp = build1 (reduc_code, vectype,  PHI_RESULT (new_phi));
+      vec_temp = PHI_RESULT (new_phi);
+      if (min_max_res)
+        tmp = build2 (reduc_code, vectype, min_max_res, vec_temp);
+      else
+        tmp = build1 (reduc_code, vectype, vec_temp);
+
       epilog_stmt = gimple_build_assign (vec_dest, tmp);
       new_temp = make_ssa_name (vec_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
@@ -3348,7 +3456,6 @@ vect_create_epilog_for_reduction (VEC (t
       int bit_offset;
       int element_bitsize = tree_low_cst (bitsize, 1);
       int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
-      tree vec_temp;
 
       if (optab_handler (vec_shr_optab, mode) != CODE_FOR_nothing)
         shift_code = VEC_RSHIFT_EXPR;
@@ -3366,7 +3473,7 @@ vect_create_epilog_for_reduction (VEC (t
       else
         {
           optab optab = optab_for_tree_code (code, vectype, optab_default);
-          if (optab_handler (optab, mode) == CODE_FOR_nothing)
+          if (!optab || optab_handler (optab, mode) == CODE_FOR_nothing)
             have_whole_vector_shift = false;
         }
 
@@ -3644,8 +3751,10 @@ vect_finalize_reduction:
           VEC_safe_push (gimple, heap, phis, USE_STMT (use_p));
 
       /* We expect to have found an exit_phi because of loop-closed-ssa
-         form.  */
-      gcc_assert (!VEC_empty (gimple, phis));
+         form, unless it's the min/max statement of a min/max location
+         pattern, which is inserted by the pattern recognition phase.  */
+      gcc_assert (!VEC_empty (gimple, phis)
+                  || STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_stmt);
 
       for (i = 0; VEC_iterate (gimple, phis, i, exit_phi); i++)
         {
@@ -3887,11 +3996,11 @@ vectorizable_reduction (gimple stmt, gim
   basic_block def_bb;
   struct loop * def_stmt_loop, *outer_loop = NULL;
   tree def_arg;
-  gimple def_arg_stmt;
+  gimple def_arg_stmt, related;
   VEC (tree, heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL, *vect_defs = NULL;
   VEC (gimple, heap) *phis = NULL;
-  int vec_num;
-  tree def0, def1;
+  int vec_num, cond_reduc_index = 0;
+  tree def0, def1, cond_reduc_def = NULL_TREE;
 
   if (nested_in_vect_loop_p (loop, stmt))
     {
@@ -3901,8 +4010,10 @@ vectorizable_reduction (gimple stmt, gim
     }
 
   /* 1. Is vectorizable reduction?  */
-  /* Not supportable if the reduction variable is used in the loop.  */
-  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
+  /* Not supportable if the reduction variable is used in the loop,
+     unless it's a pattern.  */
+  if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   /* Reductions that are not used even in an enclosing outer-loop,
@@ -3924,14 +4035,17 @@ vectorizable_reduction (gimple stmt, gim
      the original sequence that constitutes the pattern.  */
 
   orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
-  if (orig_stmt)
+  if (orig_stmt 
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     {
       orig_stmt_info = vinfo_for_stmt (orig_stmt);
       gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
       gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
       gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
     }
-
+  else
+    orig_stmt = NULL;
+ 
   /* 3. Check the operands of the operation. The first operands are defined
         inside the loop body. The last operand is the reduction variable,
         which is defined by the loop-header-phi.  */
@@ -4044,12 +4158,13 @@ vectorizable_reduction (gimple stmt, gim
 
   if (code == COND_EXPR)
     {
-      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0))
+      if (!vectorizable_condition (stmt, gsi, NULL, ops[reduc_index], 0, 
+                                   cond_reduc_def, cond_reduc_index)) 
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "unsupported condition in reduction");
 
-            return false;
+          return false;
         }
     }
   else
@@ -4182,7 +4297,12 @@ vectorizable_reduction (gimple stmt, gim
     }
   else
     {
-      if (!nested_cycle || double_reduc)
+      /* There is no need for a reduction epilogue in case of a nested
+         cycle, unless it is a double reduction.  For a reduction pattern,
+         we assume that we know how to create an epilogue even if there is
+         no reduction code for it.  */
+      if ((!nested_cycle || double_reduc) 
+           && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
         {
           if (vect_print_dump_info (REPORT_DETAILS))
             fprintf (vect_dump, "no reduc code for scalar code.");
@@ -4202,8 +4322,9 @@ vectorizable_reduction (gimple stmt, gim
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
-      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies))
+      if (!vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies)) 
         return false;
+
       return true;
     }
 
@@ -4254,6 +4375,32 @@ vectorizable_reduction (gimple stmt, gim
   else
     epilog_copies = ncopies;
 
+  /* Prepare vector operands for min/max location.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      tree cond_op;
+      gimple cond_def_stmt;
+
+      related = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt));
+      cond_op = TREE_OPERAND (ops[0], 0);
+      cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+      if (gimple_code (cond_def_stmt) == GIMPLE_PHI)
+        {
+          cond_reduc_index = 1;
+          cond_reduc_def = gimple_assign_rhs1 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+      else
+        {
+          cond_op = TREE_OPERAND (ops[0], 1);
+          cond_def_stmt = SSA_NAME_DEF_STMT (cond_op);
+          gcc_assert (gimple_code (cond_def_stmt) == GIMPLE_PHI);
+          cond_reduc_index = 2;
+          cond_reduc_def = gimple_assign_rhs2 (STMT_VINFO_VEC_STMT (
+                                                    vinfo_for_stmt (related)));
+        }
+    }
+
   prev_stmt_info = NULL;
   prev_phi_info = NULL;
   if (slp_node)
@@ -4297,7 +4444,8 @@ vectorizable_reduction (gimple stmt, gim
           gcc_assert (!slp_node);
           vectorizable_condition (stmt, gsi, vec_stmt, 
                                   PHI_RESULT (VEC_index (gimple, phis, 0)), 
-                                  reduc_index);
+                                  reduc_index, cond_reduc_def, 
+                                  cond_reduc_index);
           /* Multiple types are not supported for condition.  */
           break;
         }
@@ -4533,6 +4681,9 @@ vectorizable_live_operation (gimple stmt
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info))
+    return true;
+
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
     return false;
 
Index: tree.def
===================================================================
--- tree.def	(revision 162994)
+++ tree.def	(working copy)
@@ -1054,10 +1054,14 @@ DEFTREECODE (OMP_CLAUSE, "omp_clause", t
    result (e.g. summing the elements of the vector, finding the minimum over
    the vector elements, etc).
    Operand 0 is a vector; the first element in the vector has the result.
-   Operand 1 is a vector.  */
+   Operands 1 and 2 are vectors.  */
 DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
+DEFTREECODE (REDUC_MIN_FIRST_LOC_EXPR, "reduc_min_first_loc_expr", tcc_binary, 2)
+DEFTREECODE (REDUC_MIN_LAST_LOC_EXPR, "reduc_min_last_loc_expr", tcc_binary, 2)
+DEFTREECODE (REDUC_MAX_FIRST_LOC_EXPR, "reduc_max_first_loc_expr", tcc_binary, 2)
+DEFTREECODE (REDUC_MAX_LAST_LOC_EXPR, "reduc_max_last_loc_expr", tcc_binary, 2)
 
 /* Widening dot-product.
    The first two arguments are of type t1.
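As a note for reviewers (illustration only, not part of the patch): the intended scalar semantics of the new binary reduction codes can be sketched in plain C. The helper name `reduc_min_first_loc` is hypothetical; the `_LAST_` variant would take the largest matching position instead, and the MAX variants use a horizontal maximum.

```c
#include <assert.h>
#include <limits.h>

#define VL 4

/* Hypothetical scalar model of REDUC_MIN_FIRST_LOC_EXPR: operand 0
   (vcx) holds the per-lane minima, operand 1 (vck) the per-lane
   positions.  Compute the horizontal minimum of the values, then take
   the smallest ("first") position among the lanes holding it.  */
static int
reduc_min_first_loc (const int vcx[VL], const int vck[VL])
{
  int x = vcx[0];
  int pos = INT_MAX;
  int k;

  for (k = 1; k < VL; k++)      /* horizontal min of the value vector */
    if (vcx[k] < x)
      x = vcx[k];

  for (k = 0; k < VL; k++)      /* first position among matching lanes */
    if (vcx[k] == x && vck[k] < pos)
      pos = vck[k];

  return pos;
}
```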
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 162994)
+++ cfgexpand.c	(working copy)
@@ -2991,6 +2991,10 @@ expand_debug_expr (tree exp)
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_MIN_FIRST_LOC_EXPR:
+    case REDUC_MIN_LAST_LOC_EXPR:
+    case REDUC_MAX_FIRST_LOC_EXPR:
+    case REDUC_MAX_LAST_LOC_EXPR:
     case VEC_COND_EXPR:
     case VEC_EXTRACT_EVEN_EXPR:
     case VEC_EXTRACT_ODD_EXPR:
Index: tree-vect-patterns.c
===================================================================
--- tree-vect-patterns.c	(revision 162994)
+++ tree-vect-patterns.c	(working copy)
@@ -54,6 +54,10 @@ static vect_recog_func_ptr vect_vect_rec
 	vect_recog_widen_sum_pattern,
 	vect_recog_dot_prod_pattern,
 	vect_recog_pow_pattern};
+static bool vect_recog_min_max_loc_pattern (unsigned int, va_list);
+static vect_recog_compound_func_ptr 
+   vect_recog_compound_func_ptrs[NUM_COMPOUND_PATTERNS] = {
+        vect_recog_min_max_loc_pattern};
 
 
 /* Function widened_name_p
@@ -847,3 +851,286 @@ vect_pattern_recog (loop_vec_info loop_v
         }
     }
 }
+
+
+/* Detect the min/max location pattern.
+   Given two reduction condition statements and their phi nodes, we check
+   whether one of the statements calculates the minimum or maximum, and the
+   other one records its location.  If the pattern is detected, we replace
+   the min/max condition statement with a MIN_EXPR or MAX_EXPR, and mark
+   the old statement as a pattern statement.
+
+   The pattern we are looking for:
+
+   s1: min = [cond_expr] a < min ? a : min
+   s2: index = [cond_expr] a < min ? new_index : index
+
+   We add a MIN_EXPR statement before the index calculation statement:
+
+   s1:  min = [cond_expr] a < min ? a : min
+   s1': min = [min_expr] <a, min>
+   s2:  index = [cond_expr] a < min ? new_index : index
+
+   s1 is marked as pattern statement
+   s1' points to s1 via related_stmt field
+   s1 points to s1' via related_stmt field
+   s2 points to s1' via related_stmt field.  
+   s1' and s2 are marked as compound pattern min/max and min/max location
+   statements.  */
+
+static bool
+vect_recog_min_max_loc_pattern (unsigned int nargs, va_list args)
+{
+  gimple first_phi, first_stmt, second_phi, second_stmt, loop_op_def_stmt;
+  stmt_vec_info stmt_vinfo, new_stmt_info, minmax_stmt_info, pos_stmt_info;
+  loop_vec_info loop_info;
+  struct loop *loop;
+  enum tree_code code, first_code, second_code;
+  gimple first_cond_def_stmt = NULL, second_cond_def_stmt = NULL;
+  tree first_cond_op0, first_cond_op1, second_cond_op0, second_cond_op1;
+  tree first_stmt_oprnd0, first_stmt_oprnd1, second_stmt_oprnd0;
+  tree second_stmt_oprnd1, first_cond, second_cond;
+  int phi_def_index;
+  tree first_loop_op, second_loop_op, pos_stmt_loop_op, def, result;
+  gimple pos_stmt, min_max_stmt, new_stmt, def_stmt;
+  gimple_stmt_iterator gsi;
+
+  if (nargs < 4)
+    return false;
+
+  first_phi = va_arg (args, gimple);
+  first_stmt = va_arg (args, gimple);
+  second_phi = va_arg (args, gimple);
+  second_stmt = va_arg (args, gimple);
+
+  stmt_vinfo = vinfo_for_stmt (first_stmt);
+  loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+  loop = LOOP_VINFO_LOOP (loop_info);
+
+  /* Check that the condition is the same in both statements and is a GT,
+     LT, GE or LE comparison.  */
+  first_cond = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 0);
+  if (TREE_CODE (first_cond) == SSA_NAME)
+    {
+      first_cond_def_stmt = SSA_NAME_DEF_STMT (first_cond);
+      first_code = gimple_assign_rhs_code (first_cond_def_stmt);
+      first_cond_op0 = gimple_assign_rhs1 (first_cond_def_stmt);
+      first_cond_op1 = gimple_assign_rhs2 (first_cond_def_stmt);
+    }
+  else
+    {
+      first_code = TREE_CODE (first_cond);
+      first_cond_op0 = TREE_OPERAND (first_cond, 0);
+      first_cond_op1 = TREE_OPERAND (first_cond, 1);
+    }
+
+  if (first_code != GT_EXPR && first_code != LT_EXPR
+      && first_code != GE_EXPR && first_code != LE_EXPR)
+    return false;
+
+  second_cond = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 0);
+  if (TREE_CODE (second_cond) == SSA_NAME)
+    {
+      second_cond_def_stmt = SSA_NAME_DEF_STMT (second_cond);
+      second_code = gimple_assign_rhs_code (second_cond_def_stmt);
+      second_cond_op0 = gimple_assign_rhs1 (second_cond_def_stmt);
+      second_cond_op1 = gimple_assign_rhs2 (second_cond_def_stmt);
+    }
+  else
+    {
+      second_code = TREE_CODE (second_cond);
+      second_cond_op0 = TREE_OPERAND (second_cond, 0);
+      second_cond_op1 = TREE_OPERAND (second_cond, 1);
+    }
+
+  if (first_code != second_code)
+    return false;
+
+  if (first_cond_def_stmt
+      && (!second_cond_def_stmt
+          || first_cond_def_stmt != second_cond_def_stmt
+          || !operand_equal_p (first_cond_op0, second_cond_op0, 0)
+          || !operand_equal_p (first_cond_op1, second_cond_op1, 0)))
+   return false;
+
+  /* Both statements have the same condition.  */
+
+  first_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 1);
+  first_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (first_stmt), 2);
+
+  second_stmt_oprnd0 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 1);
+  second_stmt_oprnd1 = TREE_OPERAND (gimple_assign_rhs1 (second_stmt), 2);
+
+  if (TREE_CODE (first_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (first_stmt_oprnd1) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd0) != SSA_NAME
+      || TREE_CODE (second_stmt_oprnd1) != SSA_NAME)
+    return false;
+
+  if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd0, 0)
+      && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd0, 0))
+    {
+      phi_def_index = 0;
+      first_loop_op = first_stmt_oprnd1;
+      second_loop_op = second_stmt_oprnd1;
+    }
+  else
+    {
+      if (operand_equal_p (PHI_RESULT (first_phi), first_stmt_oprnd1, 0)
+          && operand_equal_p (PHI_RESULT (second_phi), second_stmt_oprnd1, 0))
+        {
+          phi_def_index = 1;
+          first_loop_op = first_stmt_oprnd0;
+          second_loop_op = second_stmt_oprnd0;
+        }
+      else
+        return false;
+    }
+
+  /* Now we know which operand is defined by the phi node.  Analyze the
+     second one.  */
+
+  /* The min/max stmt must be x < y ? x : y.  */
+  if (operand_equal_p (first_cond_op0, first_stmt_oprnd0, 0)
+      && operand_equal_p (first_cond_op1, first_stmt_oprnd1, 0))
+    {
+      pos_stmt = second_stmt;
+      min_max_stmt = first_stmt;
+      pos_stmt_loop_op = second_loop_op;
+    }
+  else
+    {
+      if (operand_equal_p (second_cond_op0, second_stmt_oprnd0, 0)
+          && operand_equal_p (second_cond_op1, second_stmt_oprnd1, 0))
+        {
+          pos_stmt = first_stmt;
+          min_max_stmt = second_stmt;
+          pos_stmt_loop_op = first_loop_op;
+        }
+      else
+        return false;
+    }
+
+  /* Analyze the position stmt.  We expect it to be either an induction or
+     an induction plus a constant.  */
+  loop_op_def_stmt = SSA_NAME_DEF_STMT (pos_stmt_loop_op);
+
+  if (!flow_bb_inside_loop_p (loop, gimple_bb (loop_op_def_stmt)))
+    return false;
+
+  if (gimple_code (loop_op_def_stmt) == GIMPLE_PHI)
+    {
+      if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (loop_op_def_stmt))
+          != vect_induction_def)
+        return false;
+    }
+  else
+    {
+      if (!is_gimple_assign (loop_op_def_stmt))
+        return false;
+
+      if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+           == GIMPLE_UNARY_RHS)
+        def = gimple_assign_rhs1 (loop_op_def_stmt);
+      else
+        {
+          tree op1, op2;
+
+          if (get_gimple_rhs_class (gimple_assign_rhs_code (loop_op_def_stmt))
+               != GIMPLE_BINARY_RHS
+              || gimple_assign_rhs_code (loop_op_def_stmt) != PLUS_EXPR)
+            return false;
+
+          op1 = gimple_assign_rhs1 (loop_op_def_stmt);
+          op2 = gimple_assign_rhs2 (loop_op_def_stmt);
+
+          if (TREE_CONSTANT (op1))
+            def = op2;
+          else
+            {
+              if (TREE_CONSTANT (op2))
+                def = op1;
+              else
+                return false;
+            }
+        }
+
+      if (TREE_CODE (def) != SSA_NAME)
+        return false;
+
+      def_stmt = SSA_NAME_DEF_STMT (def);
+      if (!flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
+          || gimple_code (def_stmt) != GIMPLE_PHI
+          || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def_stmt))
+              != vect_induction_def)
+         return false;
+    }
+
+  /* Pattern detected. Create a stmt to be used to replace the pattern.  */
+  if (first_code == GT_EXPR || first_code == GE_EXPR)
+    code = phi_def_index ? MAX_EXPR : MIN_EXPR;
+  else
+    code = phi_def_index ? MIN_EXPR : MAX_EXPR;
+
+  result = gimple_assign_lhs (min_max_stmt);
+  new_stmt = gimple_build_assign_with_ops (code, result,
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 1),
+                          TREE_OPERAND (gimple_assign_rhs1 (min_max_stmt), 2));
+  gsi = gsi_for_stmt (pos_stmt);
+  gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+  SSA_NAME_DEF_STMT (result) = new_stmt;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Detected min/max location pattern:\nmin/max stmt ");
+      print_gimple_stmt (vect_dump, min_max_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nlocation stmt ");
+      print_gimple_stmt (vect_dump, pos_stmt, 0, TDF_SLIM);
+      fprintf (vect_dump, "\nCreated stmt: ");
+      print_gimple_stmt (vect_dump, new_stmt, 0, TDF_SLIM);
+    }
+
+  /* Mark the stmts that are involved in the pattern.  */
+  set_vinfo_for_stmt (new_stmt,
+                      new_stmt_vec_info (new_stmt, loop_info, NULL));
+  new_stmt_info = vinfo_for_stmt (new_stmt);
+
+  pos_stmt_info = vinfo_for_stmt (pos_stmt);
+  minmax_stmt_info = vinfo_for_stmt (min_max_stmt);
+
+  STMT_VINFO_DEF_TYPE (new_stmt_info) = STMT_VINFO_DEF_TYPE (minmax_stmt_info);
+  STMT_VINFO_VECTYPE (new_stmt_info) = STMT_VINFO_VECTYPE (minmax_stmt_info);
+
+  STMT_VINFO_IN_PATTERN_P (minmax_stmt_info) = true;
+  STMT_VINFO_COMPOUND_PATTERN (new_stmt_info) = minmax_stmt;
+  STMT_VINFO_COMPOUND_PATTERN (pos_stmt_info) = minmax_loc_stmt;
+  STMT_VINFO_RELATED_STMT (new_stmt_info) = min_max_stmt;
+  STMT_VINFO_RELATED_STMT (minmax_stmt_info) = new_stmt;
+  STMT_VINFO_RELATED_STMT (pos_stmt_info) = new_stmt;
+
+  return true;
+}
+
+/* Detect patterns consisting of two or more statements to be vectorized.
+   Currently the only supported pattern is min/max location.  */
+
+void
+vect_compound_pattern_recog (unsigned int nargs, ...)
+{
+  unsigned int j;
+  va_list args;
+  bool detected = false;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_compound_pattern_recog ===");
+
+  /* Scan over all generic vect_recog_compound_xxx_pattern functions.  */
+  for (j = 0; j < NUM_COMPOUND_PATTERNS; j++)
+    {
+      va_start (args, nargs);
+      detected = (* vect_recog_compound_func_ptrs[j]) (nargs, args);
+      va_end (args);
+      if (detected)
+        break;
+    }
+}
+
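For reference, the scalar shape the new recognizer matches can be sketched as follows (a C illustration under the assumptions above, not part of the patch; `minloc` is a hypothetical name). Note that the location statement s2 must see the value of `limit` from before s1 updates it, which is why both COND_EXPRs have to share one comparison:

```c
#include <assert.h>

/* Scalar sketch of the min/max location pattern: two conditional
   reductions guarded by the same comparison, one tracking the minimum
   and one tracking its location (derived from the loop induction).  */
static int
minloc (const int *a, int n, int *min_out)
{
  int limit = a[0];
  int pos = 0;
  int i;

  for (i = 1; i < n; i++)
    {
      /* s2: pos   = a[i] < limit ? i    : pos;    (location)   */
      pos = a[i] < limit ? i : pos;
      /* s1: limit = a[i] < limit ? a[i] : limit;  (min value)  */
      limit = a[i] < limit ? a[i] : limit;
    }

  *min_out = limit;
  return pos;
}
```

Because the comparison is strict (`<`), ties keep the first position, matching the "first index" variant described above.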
Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c	(revision 162994)
+++ tree-vect-stmts.c	(working copy)
@@ -272,8 +272,10 @@ process_use (gimple stmt, tree use, loop
   /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
      DEF_STMT must have already been processed, because this should be the
      only way that STMT, which is a reduction-phi, was put in the worklist,
-     as there should be no other uses for DEF_STMT in the loop.  So we just
-     check that everything is as expected, and we are done.  */
+     as there should be no other uses for DEF_STMT in the loop, unless it
+     is a min/max location pattern.  So we just check that everything is
+     as expected, and mark the min/max stmt of the location pattern as
+     used by reduction (it is used by the reduction of the location).  */
   dstmt_vinfo = vinfo_for_stmt (def_stmt);
   bb = gimple_bb (stmt);
   if (gimple_code (stmt) == GIMPLE_PHI
@@ -284,11 +286,22 @@ process_use (gimple stmt, tree use, loop
     {
       if (vect_print_dump_info (REPORT_DETAILS))
 	fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
+
+      /* Compound reduction pattern: the stmt is used by a reduction.  */
+      if (STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo))
+        {
+          relevant = vect_used_by_reduction;
+          vect_mark_relevant (worklist, def_stmt, relevant, live_p);
+          return true;
+        }
+
       if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
 	dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
+
       gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
-      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo)
-		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope);
+      gcc_assert (STMT_VINFO_LIVE_P (dstmt_vinfo) 
+		  || STMT_VINFO_RELEVANT (dstmt_vinfo) > vect_unused_in_scope
+		  || STMT_VINFO_COMPOUND_PATTERN (dstmt_vinfo));
       return true;
     }
 
@@ -482,7 +495,8 @@ vect_mark_stmts_to_be_vectorized (loop_v
 	          break;
 
 	        case vect_used_by_reduction:
-	          if (gimple_code (stmt) == GIMPLE_PHI)
+	          if (gimple_code (stmt) == GIMPLE_PHI
+                      || STMT_VINFO_COMPOUND_PATTERN (stmt_vinfo))
                     break;
   	          /* fall through */
 
@@ -3993,11 +4007,16 @@ vect_is_simple_cond (tree cond, loop_vec
    to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in
    else caluse if it is 2).
 
+   In the min/max location pattern, reduction defs are used in both the
+   condition and the then/else clauses.  In that case COND_REDUC_DEF
+   contains such a vector def, and COND_REDUC_INDEX specifies its place in
+   the condition.
+
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
 vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
-			gimple *vec_stmt, tree reduc_def, int reduc_index)
+			gimple *vec_stmt, tree reduc_def, int reduc_index,
+                        tree cond_reduc_def, int cond_reduc_index)
 {
   tree scalar_dest = NULL_TREE;
   tree vec_dest = NULL_TREE;
@@ -4015,6 +4034,7 @@ vectorizable_condition (gimple stmt, gim
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
   enum tree_code code;
+  tree comparison_type;
 
   /* FORNOW: unsupported in basic block SLP.  */
   gcc_assert (loop_vinfo);
@@ -4023,20 +4043,23 @@ vectorizable_condition (gimple stmt, gim
   if (ncopies > 1)
     return false; /* FORNOW */
 
-  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+  if (!STMT_VINFO_RELEVANT_P (stmt_info)
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     return false;
 
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
-      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+      && !((STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+            || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
            && reduc_def))
     return false;
 
   /* FORNOW: SLP not supported.  */
   if (STMT_SLP_TYPE (stmt_info))
     return false;
 
   /* FORNOW: not yet supported.  */
-  if (STMT_VINFO_LIVE_P (stmt_info))
+  if (STMT_VINFO_LIVE_P (stmt_info)
+      && !STMT_VINFO_COMPOUND_PATTERN (stmt_info))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "value used after loop.");
@@ -4062,10 +4085,13 @@ vectorizable_condition (gimple stmt, gim
     return false;
 
   /* We do not handle two different vector types for the condition
      and the values.  */
   if (!types_compatible_p (TREE_TYPE (TREE_OPERAND (cond_expr, 0)),
-			   TREE_TYPE (vectype)))
-    return false;
+			   TREE_TYPE (vectype))
+      && !(STMT_VINFO_COMPOUND_PATTERN (stmt_info)
+           && TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (cond_expr, 0)))
+               == TYPE_SIZE_UNIT (TREE_TYPE (then_clause))))
+    return false;
 
   if (TREE_CODE (then_clause) == SSA_NAME)
     {
@@ -4094,43 +4120,75 @@ vectorizable_condition (gimple stmt, gim
 
   vec_mode = TYPE_MODE (vectype);
 
-  if (!vec_stmt)
+  comparison_type =
+         get_vectype_for_scalar_type (TREE_TYPE (TREE_OPERAND (cond_expr, 0)));
+
+  /* Check that the min/max location pattern is supported.  Here we check
+     whether the reduction epilogue is supported; the condition itself is
+     checked in expand_vec_cond_expr_p () below.  */
+  if (STMT_VINFO_COMPOUND_PATTERN (stmt_info) == minmax_loc_stmt)
+    {
+      enum tree_code reduc_code;
+      enum insn_code icode;
+
+      reduc_code = get_minloc_reduc_epilogue_code (TREE_CODE (cond_expr));
+      icode = get_vec_reduc_minloc_expr_code (reduc_code, comparison_type,
+                                              vectype);
+      if (icode == CODE_FOR_nothing)
+        {
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "unsupported reduction epilogue.");
+
+          return false;
+        }
+    }
+
+  if (!vec_stmt)
     {
       STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
-      return expand_vec_cond_expr_p (TREE_TYPE (op), vec_mode);
+      return expand_vec_cond_expr_p (TREE_TYPE (op), vec_mode,
+               TREE_TYPE (TREE_OPERAND (cond_expr, 0)),
+               TYPE_MODE (comparison_type));
     }
 
-  /* Transform */
+  /* Transform.  */
 
   /* Handle def.  */
   scalar_dest = gimple_assign_lhs (stmt);
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
   /* Handle cond expr.  */
-  vec_cond_lhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
-  vec_cond_rhs =
-    vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+  if (cond_reduc_index == 1)
+    vec_cond_lhs = cond_reduc_def;
+  else
+    vec_cond_lhs =
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), stmt, NULL);
+
+  if (cond_reduc_index == 2)
+    vec_cond_rhs = cond_reduc_def;
+  else
+    vec_cond_rhs =
+      vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), stmt, NULL);
+
   if (reduc_index == 1)
     vec_then_clause = reduc_def;
   else
     vec_then_clause = vect_get_vec_def_for_operand (then_clause, stmt, NULL);
+
   if (reduc_index == 2)
     vec_else_clause = reduc_def;
   else
     vec_else_clause = vect_get_vec_def_for_operand (else_clause, stmt, NULL);
 
   /* Arguments are ready. Create the new vector stmt.  */
-  vec_compare = build2 (TREE_CODE (cond_expr), vectype,
-			vec_cond_lhs, vec_cond_rhs);
-  vec_cond_expr = build3 (VEC_COND_EXPR, vectype,
-			  vec_compare, vec_then_clause, vec_else_clause);
-
+  vec_compare = build2 (TREE_CODE (cond_expr), comparison_type, vec_cond_lhs,
+                        vec_cond_rhs);
+  vec_cond_expr = build3 (VEC_COND_EXPR, vectype, vec_compare,
+			  vec_then_clause, vec_else_clause);
   *vec_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
   new_temp = make_ssa_name (vec_dest, *vec_stmt);
   gimple_assign_set_lhs (*vec_stmt, new_temp);
   vect_finish_stmt_generation (stmt, *vec_stmt, gsi);
-
   return true;
 }
 
@@ -4186,7 +4244,8 @@ vect_analyze_stmt (gimple stmt, bool *ne
       case vect_nested_cycle:
          gcc_assert (!bb_vinfo && (relevance == vect_used_in_outer
                      || relevance == vect_used_in_outer_by_reduction
-                     || relevance == vect_unused_in_scope));
+                     || relevance == vect_unused_in_scope
+                     || relevance == vect_used_by_reduction));
          break;
 
       case vect_induction_def:
@@ -4248,7 +4307,7 @@ vect_analyze_stmt (gimple stmt, bool *ne
             || vectorizable_call (stmt, NULL, NULL)
             || vectorizable_store (stmt, NULL, NULL, NULL)
             || vectorizable_reduction (stmt, NULL, NULL, NULL)
-            || vectorizable_condition (stmt, NULL, NULL, NULL, 0));
+            || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL, 0));
     else
       {
         if (bb_vinfo)
@@ -4389,7 +4448,7 @@ vect_transform_stmt (gimple stmt, gimple
 
     case condition_vec_info_type:
       gcc_assert (!slp_node);
-      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0);
+      done = vectorizable_condition (stmt, gsi, &vec_stmt, NULL, 0, NULL, 0);
       gcc_assert (done);
       break;
 
@@ -4527,6 +4586,7 @@ new_stmt_vec_info (gimple stmt, loop_vec
   STMT_VINFO_VEC_STMT (res) = NULL;
   STMT_VINFO_VECTORIZABLE (res) = true;
   STMT_VINFO_IN_PATTERN_P (res) = false;
+  STMT_VINFO_COMPOUND_PATTERN (res) = not_in_pattern;
   STMT_VINFO_RELATED_STMT (res) = NULL;
   STMT_VINFO_DATA_REF (res) = NULL;
 
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 162994)
+++ tree-inline.c	(working copy)
@@ -3336,6 +3336,10 @@ estimate_operator_cost (enum tree_code c
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_MIN_FIRST_LOC_EXPR:
+    case REDUC_MIN_LAST_LOC_EXPR:
+    case REDUC_MAX_FIRST_LOC_EXPR:
+    case REDUC_MAX_LAST_LOC_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
Index: tree-vect-generic.c
===================================================================
--- tree-vect-generic.c	(revision 162994)
+++ tree-vect-generic.c	(working copy)
@@ -395,6 +395,7 @@ expand_vector_operations_1 (gimple_stmt_
   optab op;
   enum gimple_rhs_class rhs_class;
   tree new_rhs;
+  enum insn_code icode = CODE_FOR_nothing;
 
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
     return;
@@ -471,6 +472,16 @@ expand_vector_operations_1 (gimple_stmt_
       && INTEGRAL_TYPE_P (TREE_TYPE (type)))
     op = optab_for_tree_code (MINUS_EXPR, type, optab_default);
 
+  if (code == REDUC_MIN_FIRST_LOC_EXPR
+      || code == REDUC_MIN_LAST_LOC_EXPR
+      || code == REDUC_MAX_FIRST_LOC_EXPR
+      || code == REDUC_MAX_LAST_LOC_EXPR)
+    {
+      type = TREE_TYPE (rhs1);
+      icode = get_vec_reduc_minloc_expr_code (code, TREE_TYPE (rhs1),
+                                              TREE_TYPE (rhs2));
+    }
+
   /* For very wide vectors, try using a smaller vector mode.  */
   compute_type = type;
   if (TYPE_MODE (type) == BLKmode && op)
@@ -496,8 +507,9 @@ expand_vector_operations_1 (gimple_stmt_
 	   || GET_MODE_CLASS (compute_mode) == MODE_VECTOR_UFRACT
 	   || GET_MODE_CLASS (compute_mode) == MODE_VECTOR_ACCUM
 	   || GET_MODE_CLASS (compute_mode) == MODE_VECTOR_UACCUM)
-          && op != NULL
-	  && optab_handler (op, compute_mode) != CODE_FOR_nothing)
+          && ((op != NULL
+	       && optab_handler (op, compute_mode) != CODE_FOR_nothing)
+              || icode != CODE_FOR_nothing))
 	return;
       else
 	/* There is no operation in hardware, so fall back to scalars.  */
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 162994)
+++ tree-cfg.c	(working copy)
@@ -3574,6 +3574,22 @@ do_pointer_plus_expr_check:
       /* Continue with generic binary expression handling.  */
       break;
 
+    case REDUC_MIN_FIRST_LOC_EXPR:
+    case REDUC_MIN_LAST_LOC_EXPR:
+    case REDUC_MAX_FIRST_LOC_EXPR:
+    case REDUC_MAX_LAST_LOC_EXPR:
+      if (!useless_type_conversion_p (lhs_type, rhs2_type)
+          || TYPE_VECTOR_SUBPARTS (lhs_type)
+              != TYPE_VECTOR_SUBPARTS (rhs1_type))
+        {
+          error ("type mismatch in binary expression");
+          debug_generic_stmt (lhs_type);
+          debug_generic_stmt (rhs2_type);
+          return true;
+        }
+
+      return false;
+
     default:
       gcc_unreachable ();
     }
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 162994)
+++ config/rs6000/rs6000.c	(working copy)
@@ -1187,7 +1187,8 @@ static tree rs6000_gimplify_va_arg (tree
 static bool rs6000_must_pass_in_stack (enum machine_mode, const_tree);
 static bool rs6000_scalar_mode_supported_p (enum machine_mode);
 static bool rs6000_vector_mode_supported_p (enum machine_mode);
-static rtx rs6000_emit_vector_compare_inner (enum rtx_code, rtx, rtx);
+static rtx rs6000_emit_vector_compare_inner (enum rtx_code, rtx, rtx,
+                                             enum machine_mode);
 static rtx rs6000_emit_vector_compare (enum rtx_code, rtx, rtx,
 				       enum machine_mode);
 static tree rs6000_stack_protect_fail (void);
@@ -16389,7 +16390,8 @@ output_e500_flip_gt_bit (rtx dst, rtx sr
 /* Return insn for VSX or Altivec comparisons.  */
 
 static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
+rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1,
+                                  enum machine_mode mask_mode)
 {
   rtx mask;
   enum machine_mode mode = GET_MODE (op0);
@@ -16400,9 +16402,15 @@ rs6000_emit_vector_compare_inner (enum r
       break;
 
     case GE:
+    case GEF:
       if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
 	return NULL_RTX;
 
+    case EQF:
+    case GTF:
+      mode = mask_mode;
+      /* Fall through.  */
+
     case EQ:
     case GT:
     case GTU:
@@ -16432,7 +16440,7 @@ rs6000_emit_vector_compare (enum rtx_cod
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
 
   /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
+  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1, dmode);
   if (mask)
     return mask;
 
@@ -16448,6 +16456,11 @@ rs6000_emit_vector_compare (enum rtx_cod
       swap_operands = true;
       try_again = true;
       break;
+    case LTF:
+      rcode = GTF;
+      swap_operands = true;
+      try_again = true;
+      break;
     case NE:
     case UNLE:
     case UNLT:
@@ -16481,11 +16494,12 @@ rs6000_emit_vector_compare (enum rtx_cod
     case GEU:
     case LE:
     case LEU:
+    case LEF:
       /* Try GT/GTU/LT/LTU OR EQ */
       {
 	rtx c_rtx, eq_rtx;
 	enum insn_code ior_code;
-	enum rtx_code new_code;
+	enum rtx_code new_code, eq_code = EQ;
 
 	switch (rcode)
 	  {
@@ -16501,6 +16515,11 @@ rs6000_emit_vector_compare (enum rtx_cod
 	    new_code = LT;
 	    break;
 
+          case LEF:
+            new_code = LTF;
+            eq_code = EQF;
+            break;
+
 	  case LEU:
 	    new_code = LTU;
 	    break;
@@ -16517,7 +16536,7 @@ rs6000_emit_vector_compare (enum rtx_cod
 	if (!c_rtx)
 	  return NULL_RTX;
 
-	eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
+	eq_rtx = rs6000_emit_vector_compare (eq_code, op0, op1, dmode);
 	if (!eq_rtx)
 	  return NULL_RTX;
 
@@ -16540,7 +16559,7 @@ rs6000_emit_vector_compare (enum rtx_cod
 	  op1 = tmp;
 	}
 
-      mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
+      mask = rs6000_emit_vector_compare_inner (rcode, op0, op1, dmode);
       if (mask)
 	return mask;
     }
@@ -16558,6 +16577,7 @@ rs6000_emit_vector_cond_expr (rtx dest, 
 			      rtx cond, rtx cc_op0, rtx cc_op1)
 {
   enum machine_mode dest_mode = GET_MODE (dest);
+  enum machine_mode cond_mode = GET_MODE (cc_op0);
   enum rtx_code rcode = GET_CODE (cond);
   enum machine_mode cc_mode = CCmode;
   rtx mask;
@@ -16595,8 +16615,34 @@ rs6000_emit_vector_cond_expr (rtx dest, 
 
     default:
       break;
     }
+
 
+  if (!FLOAT_MODE_P (dest_mode) && FLOAT_MODE_P (cond_mode))
+    {
+      switch (rcode)
+        {
+          case GE:
+            rcode = GEF;
+            break;
+
+          case GT:
+            rcode = GTF;
+            break;
+
+          case LE:
+            rcode = LEF;
+            break;
+
+          case LT:
+            rcode = LTF;
+            break;
+
+          default:
+            break;
+        }
+    }
+
   /* Get the vector mask for the given relational operations.  */
   mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, dest_mode);
 
Index: config/rs6000/altivec.md
===================================================================
--- config/rs6000/altivec.md	(revision 162994)
+++ config/rs6000/altivec.md	(working copy)
@@ -144,6 +144,7 @@
    (UNSPEC_VUPKHU_V4SF  326)
    (UNSPEC_VUPKLU_V4SF  327)
    (UNSPEC_VNMSUBFP	328)
+   (UNSPEC_REDUC_MINLOC 329)
 ])
 
 (define_constants
@@ -475,6 +476,14 @@
   "vcmpeqfp %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
+(define_insn "altivec_eqfv4sf"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (eqf:V4SI (match_operand:V4SF 1 "altivec_register_operand" "v")
+                  (match_operand:V4SF 2 "altivec_register_operand" "v")))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vcmpeqfp %0,%1,%2"
+  [(set_attr "type" "veccmp")])
+
 (define_insn "*altivec_gtv4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(gt:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -483,6 +492,14 @@
   "vcmpgtfp %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
+(define_insn "*altivec_gtfv4sf"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (gtf:V4SI (match_operand:V4SF 1 "altivec_register_operand" "v")
+                  (match_operand:V4SF 2 "altivec_register_operand" "v")))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vcmpgtfp %0,%1,%2"
+  [(set_attr "type" "veccmp")])
+
 (define_insn "*altivec_gev4sf"
   [(set (match_operand:V4SF 0 "altivec_register_operand" "=v")
 	(ge:V4SF (match_operand:V4SF 1 "altivec_register_operand" "v")
@@ -491,6 +508,14 @@
   "vcmpgefp %0,%1,%2"
   [(set_attr "type" "veccmp")])
 
+(define_insn "*altivec_gefv4sf"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (gef:V4SI (match_operand:V4SF 1 "altivec_register_operand" "v")
+                  (match_operand:V4SF 2 "altivec_register_operand" "v")))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vcmpgefp %0,%1,%2"
+  [(set_attr "type" "veccmp")])
+
 (define_insn "*altivec_vsel<mode>"
   [(set (match_operand:VM 0 "altivec_register_operand" "=v")
 	(if_then_else:VM
@@ -1942,6 +1967,172 @@
   DONE;
 }")
 
+(define_expand "reduc_min_first_loc_v4sfv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "")
+        (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "")
+                    (match_operand:V4SI 2 "register_operand" "")]
+                                UNSPEC_REDUC_MINLOC))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx shift1 = gen_rtx_CONST_INT (QImode, 8);
+  rtx shift2 = gen_rtx_CONST_INT (QImode, 4);
+  rtx vr1 = gen_reg_rtx (V4SFmode);
+  rtx vr2 = gen_reg_rtx (V4SFmode);
+  rtx vr3 = gen_reg_rtx (V4SFmode);
+  rtx mask = gen_reg_rtx (V4SImode);
+  rtx not_mask = gen_reg_rtx (V4SImode);
+  rtx vr4 = gen_reg_rtx (V4SImode);
+  rtx vr5 = gen_reg_rtx (V4SImode);
+  rtx vr6 = gen_reg_rtx (V4SImode);
+  rtx max_val = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_altivec_vsldoi_v4sf (vr1, operands[1], operands[1], shift1));
+  emit_insn (gen_sminv4sf3 (vr2, vr1, operands[1]));
+  emit_insn (gen_altivec_vsldoi_v4sf (vr3, vr2, vr2, shift2));
+  emit_insn (gen_sminv4sf3 (vr1, vr2, vr3));
+
+  emit_insn (gen_altivec_eqfv4sf (mask, operands[1], vr1));
+
+  emit_insn (gen_andv4si3 (vr4, mask, operands[2]));
+  emit_insn (gen_norv4si3 (not_mask, mask, mask));
+  emit_insn (gen_altivec_vspltisw (max_val, constm1_rtx));
+  emit_insn (gen_andv4si3 (vr5, not_mask, max_val));
+  emit_insn (gen_iorv4si3 (vr6, vr4, vr5));
+
+  emit_insn (gen_altivec_vsldoi_v4si (vr4, vr6, vr6, shift1));
+  emit_insn (gen_uminv4si3 (vr5, vr4, vr6));
+  emit_insn (gen_altivec_vsldoi_v4si (vr6, vr5, vr5, shift2));
+  emit_insn (gen_uminv4si3 (operands[0], vr5, vr6));
+
+  DONE;
+}")
+
+(define_expand "reduc_min_last_loc_v4sfv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "")
+        (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "")
+                    (match_operand:V4SI 2 "register_operand" "")]
+                                UNSPEC_REDUC_MINLOC))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx shift1 = gen_rtx_CONST_INT (QImode, 8);
+  rtx shift2 = gen_rtx_CONST_INT (QImode, 4);
+  rtx vr1 = gen_reg_rtx (V4SFmode);
+  rtx vr2 = gen_reg_rtx (V4SFmode);
+  rtx vr3 = gen_reg_rtx (V4SFmode);
+  rtx mask = gen_reg_rtx (V4SImode);
+  rtx not_mask = gen_reg_rtx (V4SImode);
+  rtx vr4 = gen_reg_rtx (V4SImode);
+  rtx vr5 = gen_reg_rtx (V4SImode);
+  rtx vr6 = gen_reg_rtx (V4SImode);
+  rtx min_val = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_altivec_vsldoi_v4sf (vr1, operands[1], operands[1], shift1));
+  emit_insn (gen_sminv4sf3 (vr2, vr1, operands[1]));
+  emit_insn (gen_altivec_vsldoi_v4sf (vr3, vr2, vr2, shift2));
+  emit_insn (gen_sminv4sf3 (vr1, vr2, vr3));
+
+  emit_insn (gen_altivec_eqfv4sf (mask, operands[1], vr1));
+
+  emit_insn (gen_andv4si3 (vr4, mask, operands[2]));
+  emit_insn (gen_norv4si3 (not_mask, mask, mask));
+  emit_insn (gen_altivec_vspltisw (min_val, const0_rtx));
+  emit_insn (gen_andv4si3 (vr5, not_mask, min_val));
+  emit_insn (gen_iorv4si3 (vr6, vr4, vr5));
+
+  emit_insn (gen_altivec_vsldoi_v4si (vr4, vr6, vr6, shift1));
+  emit_insn (gen_umaxv4si3 (vr5, vr4, vr6));
+  emit_insn (gen_altivec_vsldoi_v4si (vr6, vr5, vr5, shift2));
+  emit_insn (gen_umaxv4si3 (operands[0], vr5, vr6));
+
+  DONE;
+}")
+
+
+(define_expand "reduc_max_first_loc_v4sfv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "")
+        (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "")
+                    (match_operand:V4SI 2 "register_operand" "")]
+                                UNSPEC_REDUC_MINLOC))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx shift1 = gen_rtx_CONST_INT (QImode, 8);
+  rtx shift2 = gen_rtx_CONST_INT (QImode, 4);
+  rtx vr1 = gen_reg_rtx (V4SFmode);
+  rtx vr2 = gen_reg_rtx (V4SFmode);
+  rtx vr3 = gen_reg_rtx (V4SFmode);
+  rtx mask = gen_reg_rtx (V4SImode);
+  rtx not_mask = gen_reg_rtx (V4SImode);
+  rtx vr4 = gen_reg_rtx (V4SImode);
+  rtx vr5 = gen_reg_rtx (V4SImode);
+  rtx vr6 = gen_reg_rtx (V4SImode);
+  rtx max_val = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_altivec_vsldoi_v4sf (vr1, operands[1], operands[1], shift1));
+  emit_insn (gen_smaxv4sf3 (vr2, vr1, operands[1]));
+  emit_insn (gen_altivec_vsldoi_v4sf (vr3, vr2, vr2, shift2));
+  emit_insn (gen_smaxv4sf3 (vr1, vr2, vr3));
+
+  emit_insn (gen_altivec_eqfv4sf (mask, operands[1], vr1));
+
+  emit_insn (gen_andv4si3 (vr4, mask, operands[2]));
+  emit_insn (gen_norv4si3 (not_mask, mask, mask));
+  emit_insn (gen_altivec_vspltisw (max_val, constm1_rtx));
+  emit_insn (gen_andv4si3 (vr5, not_mask, max_val));
+  emit_insn (gen_iorv4si3 (vr6, vr4, vr5));
+
+  emit_insn (gen_altivec_vsldoi_v4si (vr4, vr6, vr6, shift1));
+  emit_insn (gen_uminv4si3 (vr5, vr4, vr6));
+  emit_insn (gen_altivec_vsldoi_v4si (vr6, vr5, vr5, shift2));
+  emit_insn (gen_uminv4si3 (operands[0], vr5, vr6));
+
+  DONE;
+}")
+
+(define_expand "reduc_max_last_loc_v4sfv4si"
+ [(set (match_operand:V4SI 0 "register_operand" "")
+        (unspec:V4SI [(match_operand:V4SF 1 "register_operand" "")
+                    (match_operand:V4SI 2 "register_operand" "")]
+                                UNSPEC_REDUC_MINLOC))]
+  "TARGET_ALTIVEC"
+  "
+{
+  rtx shift1 = gen_rtx_CONST_INT (QImode, 8);
+  rtx shift2 = gen_rtx_CONST_INT (QImode, 4);
+  rtx vr1 = gen_reg_rtx (V4SFmode);
+  rtx vr2 = gen_reg_rtx (V4SFmode);
+  rtx vr3 = gen_reg_rtx (V4SFmode);
+  rtx mask = gen_reg_rtx (V4SImode);
+  rtx not_mask = gen_reg_rtx (V4SImode);
+  rtx vr4 = gen_reg_rtx (V4SImode);
+  rtx vr5 = gen_reg_rtx (V4SImode);
+  rtx vr6 = gen_reg_rtx (V4SImode);
+  rtx min_val = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_altivec_vsldoi_v4sf (vr1, operands[1], operands[1], shift1));
+  emit_insn (gen_smaxv4sf3 (vr2, vr1, operands[1]));
+  emit_insn (gen_altivec_vsldoi_v4sf (vr3, vr2, vr2, shift2));
+  emit_insn (gen_smaxv4sf3 (vr1, vr2, vr3));
+
+  emit_insn (gen_altivec_eqfv4sf (mask, operands[1], vr1));
+
+  emit_insn (gen_andv4si3 (vr4, mask, operands[2]));
+  emit_insn (gen_norv4si3 (not_mask, mask, mask));
+  emit_insn (gen_altivec_vspltisw (min_val, const0_rtx));
+  emit_insn (gen_andv4si3 (vr5, not_mask, min_val));
+  emit_insn (gen_iorv4si3 (vr6, vr4, vr5));
+
+  emit_insn (gen_altivec_vsldoi_v4si (vr4, vr6, vr6, shift1));
+  emit_insn (gen_umaxv4si3 (vr5, vr4, vr6));
+  emit_insn (gen_altivec_vsldoi_v4si (vr6, vr5, vr5, shift2));
+  emit_insn (gen_umaxv4si3 (operands[0], vr5, vr6));
+
+  DONE;
+}")
+
+
 (define_expand "neg<mode>2"
   [(use (match_operand:VI 0 "register_operand" ""))
    (use (match_operand:VI 1 "register_operand" ""))]
@@ -2802,3 +2993,39 @@
   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
   DONE;
 }")
+
+(define_expand "vcondcv4sfv4si"
+  [(set (match_operand:V4SI 0 "vint_operand" "")
+        (if_then_else:V4SI
+         (match_operator 3 "comparison_operator"
+                         [(match_operand:V4SF 4 "vfloat_operand" "")
+                          (match_operand:V4SF 5 "vfloat_operand" "")])
+         (match_operand:V4SI 1 "vint_operand" "")
+         (match_operand:V4SI 2 "vint_operand" "")))]
+  "TARGET_ALTIVEC"
+  "
+{
+  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
+                                    operands[3], operands[4], operands[5]))
+    DONE;
+  else
+    FAIL;
+}")
+
+(define_expand "vcondcuv4sfv4si"
+  [(set (match_operand:V4SI 0 "vint_operand" "")
+        (if_then_else:V4SI
+         (match_operator 3 "comparison_operator"
+                         [(match_operand:V4SF 4 "vfloat_operand" "")
+                          (match_operand:V4SF 5 "vfloat_operand" "")])
+         (match_operand:V4SI 1 "vint_operand" "")
+         (match_operand:V4SI 2 "vint_operand" "")))]
+  "TARGET_ALTIVEC"
+  "
+{
+  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
+                                    operands[3], operands[4], operands[5]))
+    DONE;
+  else
+    FAIL;
+}")
Index: tree-vect-slp.c
===================================================================
--- tree-vect-slp.c	(revision 162994)
+++ tree-vect-slp.c	(working copy)
@@ -146,6 +146,18 @@ vect_get_and_check_slp_defs (loop_vec_in
 	  return false;
 	}
 
+      if (def_stmt && vinfo_for_stmt (def_stmt)
+          && STMT_VINFO_COMPOUND_PATTERN (vinfo_for_stmt (def_stmt)))
+        {
+          if (vect_print_dump_info (REPORT_SLP))
+            {
+              fprintf (vect_dump, "Build SLP failed: compound pattern ");
+              print_gimple_stmt (vect_dump, def_stmt, 0, TDF_SLIM);
+            }
+
+          return false;
+        }
+
       /* Check if DEF_STMT is a part of a pattern in LOOP and get the def stmt
          from the pattern. Check that all the stmts of the node are in the
          pattern.  */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch] Support vectorization of min/max location pattern - take  2
  2010-08-09  7:55             ` [patch] Support vectorization of min/max location pattern - take 2 Ira Rosen
@ 2010-08-09 10:05               ` Richard Guenther
  2010-08-09 10:58                 ` Ira Rosen
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Guenther @ 2010-08-09 10:05 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Richard Henderson, gcc-patches

On Mon, Aug 9, 2010 at 8:59 AM, Ira Rosen <IRAR@il.ibm.com> wrote:
> Richard Henderson <rth@redhat.com> wrote on 08/07/2010 11:10:37 PM:
>
>> On 07/08/2010 11:19 AM, Ira Rosen wrote:
>> > It's the minloc pattern, i.e., a loop that finds the location of the minimum:
>> >
>> >   float arr[N];
>> >
>> >   for (i = 0; i < N; i++)
>> >     if (arr[i] < limit)
>> >       {
>> >         pos = i + 1;
>> >         limit = arr[i];
>> >       }
>> >
>> > Vectorizer's input code:
>> >
>> >   # pos_22 = PHI <pos_1(4), 1(2)>
>> >   # limit_24 = PHI <limit_4(4), 0(2)>
>> >   ...
>> >   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;       // location
>> >   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;  // min
>>
>> Ok, I get it now.
>>
>> So your thinking was that you needed the builtin to replace the
>> comparison portion of the VEC_COND_EXPR?  Or, looking again I see
>> that you don't actually use VEC_COND_EXPR, you use ...
>>
>> > +  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */
>>
>> ... explicit masking.  I.e. you assume that the return value of
>> the builtin is a bit mask of the full width, and that there's no
>> better way to implement the VEC_COND.
>>
>> I wonder if it wouldn't be better to extend the definition
>> of VEC_COND_EXPR so that the comparison values can be of a
>> different type than the data operands (with the caveat that the
>> number of elements should be the same -- i.e. 4-wide compare must
>> match 4-wide data movement).
>
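[For illustration, the explicit-masking select quoted above -- VEC_DEST =
(VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK) -- can be modeled per 32-bit lane
in plain C.  This is an editor's sketch, not code from the patch; the helper
name is invented.]

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane model of the masked select: the comparison yields a mask that
   is all-ones where the condition held and all-zeros elsewhere, so the
   "then" value is kept in matching lanes and the "else" value otherwise.  */
static uint32_t
masked_select (uint32_t mask, uint32_t then_val, uint32_t else_val)
{
  return (then_val & mask) | (else_val & ~mask);
}
```

[This is exactly the three-instruction and/andnot/or sequence AltiVec needs;
targets with a blend instruction can do it in one.]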
> I implemented VEC_COND_EXPR extension in the attached patch.
>
> For reduction epilogue I defined new tree codes
> REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.

Why do you need new tree codes here?  They btw need
documentation - just stating the new operand is a vector isn't
very informative.  They need documentation in generic.texi.

Likewise the new RTX codes (what are they for??) need documentation
in rtl.texi.

Btw, you still don't adjust if-conversion to fold the COND_EXPR
it generates - that would generate the MIN/MAX expressions
directly and you wouldn't have to pattern match the COND_EXPR.

Richard.
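[As a scalar model of what the proposed REDUC_MIN_FIRST_LOC_EXPR epilogue
computes (horizontal minimum extraction, equality mask, index blend with
all-ones, horizontal unsigned minimum of the indexes), here is an editor's
sketch in C; the helper name and lane handling are illustrative, not part of
the patch.]

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the minloc reduction epilogue: find the horizontal
   minimum of VALS, then take the smallest index among the lanes that hold
   it (non-matching lanes are forced to UINT32_MAX before the index min).  */
static uint32_t
min_first_loc (const float *vals, const uint32_t *idx, int n)
{
  float x = vals[0];
  for (int i = 1; i < n; i++)
    if (vals[i] < x)
      x = vals[i];

  uint32_t loc = UINT32_MAX;
  for (int i = 0; i < n; i++)
    {
      /* A lane keeps its index only if it holds the minimum.  */
      uint32_t lane = (vals[i] == x) ? idx[i] : UINT32_MAX;
      if (lane < loc)
        loc = lane;
    }
  return loc;
}
```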

> Bootstrapped and tested on powerpc64-suse-linux.
> OK for mainline?
>
> Thanks,
> Ira
>
> ChangeLog:
>
>        * tree-pretty-print.c (dump_generic_node): Handle new codes.
>        * optabs.c (optab_for_tree_code): Likewise.
>        (init_optabs): Initialize new optabs.
>        (get_vcond_icode): Handle vector condition with different types
>        of comparison and then/else operands.
>        (expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
>        (get_vec_reduc_minloc_expr_icode): New function.
>        (expand_vec_reduc_minloc_expr): New function.
>        * optabs.h (enum convert_optab_index): Add new optabs.
>        (vcondc_optab): Define.
>        (vcondcu_optab, reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>        reduc_max_last_loc_optab): Likewise.
>        (expand_vec_cond_expr_p): Add arguments.
>        (get_vec_reduc_minloc_expr_code): Declare.
>        (expand_vec_reduc_minloc_expr): Declare.
>        * genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
>        reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>        reduc_max_last_loc_optab.
>        * rtl.def (GEF): New rtx.
>        (GTF, LEF, LTF, EQF, NEQF): Likewise.
>        * jump.c (reverse_condition): Handle new rtx.
>        (swap_condition): Likewise.
>        * expr.c (expand_expr_real_2): Expand new reduction tree codes.
>        * gimple-pretty-print.c (dump_binary_rhs): Print new codes.
>        * tree-vectorizer.h (enum vect_compound_pattern): New.
>        (struct _stmt_vec_info): Add new field compound_pattern. Add macro
>        to access it.
>        (is_pattern_stmt_p): Return true for compound pattern.
>        (get_minloc_reduc_epilogue_code): New.
>        (vectorizable_condition): Add arguments.
>        (vect_recog_compound_func_ptr): New function-pointer type.
>        (NUM_COMPOUND_PATTERNS): New.
>        (vect_compound_pattern_recog): Declare.
>        * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>        for compound patterns.
>        (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>        patterns. Update comment.
>        (vect_analyze_scalar_cycles): Update comment.
>        (destroy_loop_vec_info): Update def stmt for the original pattern
>        statement.
>        (vect_is_simple_reduction_1): Skip compound pattern statements in
>        uses check. Add spaces. Skip commutativity and type checks for
>        minimum location statement. Fix printings.
>        (vect_model_reduction_cost): Add min/max location pattern cost
>        computation.
>        (vect_create_epilog_for_reduction): Don't retrieve the original
>        statement for compound pattern. Fix comment accordingly. Get tree
>        code for reduction epilogue of min/max location computation
>        according to the comparison operation. Don't expect to find an
>        exit phi node for min/max statement.
>        (vectorizable_reduction): Skip check for uses in loop for compound
>        patterns. Don't retrieve the original statement for compound pattern.
>        Call vectorizable_condition () with additional parameters. Skip
>        reduction code check for compound patterns. Prepare operands for
>        min/max location statement vectorization and pass them to
>        vectorizable_condition ().
>        (vectorizable_live_operation): Return TRUE for compound patterns.
>        * tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
>        (REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
>        REDUC_MAX_LAST_LOC_EXPR): Likewise.
>        * cfgexpand.c (expand_debug_expr): Handle new tree codes.
>        * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>        (vect_recog_compound_func_ptrs): Likewise.
>        (vect_recog_min_max_loc_pattern): New function.
>        (vect_compound_pattern_recog): Likewise.
>        * tree-vect-stmts.c (process_use): Mark compound pattern statements
>        as used by reduction.
>        (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>        to be used by reduction.
>        (vectorizable_condition): Update comment, add arguments. Skip checks
>        irrelevant for compound pattern. Check that if comparison and
>        then/else operands are of different types, the size of the types is
>        equal. Check that reduction epilogue, if needed, is supported.
>        Prepare operands using new arguments.
>        (vect_analyze_stmt): Allow nested cycle statements to be used by
>        reduction. Call vectorizable_condition () with additional arguments.
>        (vect_transform_stmt): Call vectorizable_condition () with additional
>        arguments.
>        (new_stmt_vec_info): Initialize new fields.
>        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>        * config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
>        argument. Handle new rtx.
>        (rs6000_emit_vector_compare): Handle the case of result type different
>        from the operands, update calls to rs6000_emit_vector_compare_inner ().
>        (rs6000_emit_vector_cond_expr): Use new codes in case of different
>        types.
>        * config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
>        (altivec_gefv4sf): New pattern.
>        (altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
>        reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
>        reduc_max_last_loc_v4sfv4si): Likewise.
>        * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>        patterns.
>
> testsuite/ChangeLog:
>
>        * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
>        * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>        * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>        * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
>        gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.
>
>
> (See attached file: minloc.txt)
>
>>
>> I can think of 2 portability problems with your current solution:
>>
>> (1) SSE4.1 would prefer to use BLEND instructions, which perform
>>     that entire (X & M) | (Y & ~M) operation in one insn.
>>
>> (2) The mips C.cond.PS instruction does *not* produce a bitmask
>>     like altivec or sse do.  Instead it sets multiple condition
>>     codes.  One then uses MOV[TF].PS to merge the elements based
>>     on the individual condition codes.  While there's no direct
>>     corresponding instruction that will operate on integers, I
>>     don't think it would be too difficult to use MOV[TF].G or
>>     BC1AND2[FT] instructions to emulate it.  In any case, this
>>     is again a case where you don't want to expose any part of
>>     the VEC_COND at the gimple level.
>>
>>
>> r~

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch] Support vectorization of min/max location pattern - take  2
  2010-08-09 10:05               ` Richard Guenther
@ 2010-08-09 10:58                 ` Ira Rosen
  2010-08-09 11:01                   ` Richard Guenther
  0 siblings, 1 reply; 16+ messages in thread
From: Ira Rosen @ 2010-08-09 10:58 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Richard Henderson



Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 12:50:14
PM:
> > I implemented VEC_COND_EXPR extension in the attached patch.
> >
> > For reduction epilogue I defined new tree codes
> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>
> Why do you need new tree codes here?

After the vector loop we have two vectors: one with the four minimums and a
second with the four corresponding array indexes. The extraction of the
correct index out of the four can be done differently on each platform
(including the problematic vector comparisons).

> They btw need
> documentation - just stating the new operand is a vector isn't
> very informative.  They need documentation in generic.texi.

Sorry about that, I'll add documentation for both.

>
> Likewise the new RTX codes (what are they for??)

Probably there is a better way to do that, but I needed to map new vector
comparison instructions that compare floats and return ints.

> need documentation
> in rtl.texi.
>
> Btw, you still don't adjust if-conversion to fold the COND_EXPR
> it generates - that would generate the MIN/MAX expressions
> directly and you wouldn't have to pattern match the COND_EXPR.

I don't see how it can help to avoid pattern matching. We will still need
to match MIN/MAX's arguments with the COND_EXPR arguments.

Thanks,
Ira

>
> Richard.
>
> > [quoted ChangeLog, testsuite entries, and earlier discussion snipped]


* Re: [patch] Support vectorization of min/max location pattern - take  2
  2010-08-09 10:58                 ` Ira Rosen
@ 2010-08-09 11:01                   ` Richard Guenther
  2010-08-09 11:03                     ` Richard Guenther
  2010-08-09 12:33                     ` Ira Rosen
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Guenther @ 2010-08-09 11:01 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Richard Henderson

On Mon, Aug 9, 2010 at 12:53 PM, Ira Rosen <IRAR@il.ibm.com> wrote:
>
>
> Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 12:50:14
> PM:
>> > I implemented VEC_COND_EXPR extension in the attached patch.
>> >
>> > For reduction epilogue I defined new tree codes
>> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>>
>> Why do you need new tree codes here?
>
> After the vector loop we have two vectors: one with the four minimums and a
> second with the four corresponding array indexes. The extraction of the
> correct index out of the four can be done differently on each platform
> (including the problematic vector comparisons).

So the tree code is just to tie those two operations together?

>> They btw need
>> documentation - just stating the new operand is a vector isn't
>> very informative.  They need documentation in generic.texi.
>
> Sorry about that, I'll add documentation for both.

Thanks.

>>
>> Likewise the new RTX codes (what are they for??)
>
> Probably there is a better way to do that, but I needed to map new vector
> comparison instructions that compare floats and return ints.

So you just need this at expansion time then and the RTXen
will never appear in RTL code?  Why not use a target hook for
expanding those comparisons then?  Btw, my GSoC student
implemented lowering of generic vector comparisons resulting
in a mask in tree-vect-generic.c using a target hook that eventually
uses target specific builtins.  I attached the latest patch for that.

>> need documentation
>> in rtl.texi.
>>
>> Btw, you still don't adjust if-conversion to fold the COND_EXPR
>> it generates - that would generate the MIN/MAX expressions
>> directly and you wouldn't have to pattern match the COND_EXPR.
>
> I don't see how it can help to avoid pattern matching. We will still need
> to match MIN/MAX's arguments with the COND_EXPR arguments.

True, but you need to match MIN/MAX instead.  Well, my point
is that if-convert shouldn't create a COND_EXPR in that case.

Richard.

> Thanks,
> Ira
>
>>
>> Richard.
>>
>> > [quoted ChangeLog, testsuite entries, and earlier discussion snipped]


* Re: [patch] Support vectorization of min/max location pattern - take  2
  2010-08-09 11:01                   ` Richard Guenther
@ 2010-08-09 11:03                     ` Richard Guenther
  2010-08-09 12:33                     ` Ira Rosen
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Guenther @ 2010-08-09 11:03 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 70 bytes --]

On Mon, Aug 9, 2010 at 1:00 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:

Missing attachment.

[-- Attachment #2: vec-compare.v3.diff --]
[-- Type: text/x-patch, Size: 23091 bytes --]

Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	(revision 162841)
+++ gcc/targhooks.c	(working copy)
@@ -954,6 +954,13 @@ default_builtin_vector_alignment_reachab
   return true;
 }
 
+tree 
+default_builtin_vec_compare (gimple_stmt_iterator *gsi, tree type, tree v0, 
+                             tree v1, enum tree_code code)
+{
+  return NULL_TREE;
+}
+
 /* By default, assume that a target supports any factor of misalignment
    memory access if it supports movmisalign patten.
    is_packed is true if the memory access is defined in a packed struct.  */
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 162841)
+++ gcc/target.def	(working copy)
@@ -830,6 +830,13 @@ DEFHOOK
  bool, (tree vec_type, tree mask),
  hook_bool_tree_tree_true)
 
+/* Implement hardware vector comparison or return NULL_TREE.  */
+DEFHOOK
+(builtin_vec_compare,
+ "",
+ tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code),
+ default_builtin_vec_compare)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter.  The last parameter
    is true if the access is defined in a packed struct.  */
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 162841)
+++ gcc/tree.c	(working copy)
@@ -1360,6 +1360,28 @@ build_vector_from_ctor (tree type, VEC(c
   return build_vector (type, nreverse (list));
 }
 
+/* Build a vector of type VECTYPE in which every element is SC.  */
+tree
+build_vector_from_val (const tree sc, const tree vectype) 
+{
+  tree t = NULL_TREE;
+  int i, nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (sc == error_mark_node)
+    return sc;
+
+  gcc_assert (TREE_TYPE (sc) == TREE_TYPE (vectype));
+
+  for (i = 0; i < nunits; ++i)
+    t = tree_cons (NULL_TREE, sc, t);
+
+  if (CONSTANT_CLASS_P (sc))
+    return build_vector (vectype, t);
+  else 
+    return build_constructor_from_list (vectype, t);
+}
+
+
 /* Return a new CONSTRUCTOR node whose type is TYPE and whose values
    are in the VEC pointed to by VALS.  */
 tree
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 162841)
+++ gcc/tree.h	(working copy)
@@ -4029,6 +4029,7 @@ extern tree build_int_cst_type (tree, HO
 extern tree build_int_cst_wide (tree, unsigned HOST_WIDE_INT, HOST_WIDE_INT);
 extern tree build_vector (tree, tree);
 extern tree build_vector_from_ctor (tree, VEC(constructor_elt,gc) *);
+extern tree build_vector_from_val (const tree, const tree);
 extern tree build_constructor (tree, VEC(constructor_elt,gc) *);
 extern tree build_constructor_single (tree, tree, tree);
 extern tree build_constructor_from_list (tree, tree);
Index: gcc/target.h
===================================================================
--- gcc/target.h	(revision 162841)
+++ gcc/target.h	(working copy)
@@ -51,7 +51,7 @@
 
 #include "tm.h"
 #include "insn-modes.h"
-
+#include "gimple.h"
 /* Types used by the record_gcc_switches() target function.  */
 typedef enum
 {
Index: gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c	(revision 0)
@@ -0,0 +1,109 @@
+#define vector(elcount, type)  \
+__attribute__((vector_size((elcount)*sizeof(type)))) type
+
+#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+
+#define check_compare(type, count, res, i0, i1, op) \
+do { \
+    int __i; \
+    for (__i = 0; __i < count; __i ++) { \
+        if (vidx (type, res, __i) != \
+                ((vidx (type, i0, __i) op vidx (type, i1, __i)) ? (type)-1 : 0)) { \
+            __builtin_printf ("%i != ((%i " #op " %i) ? -1 : 0) ", vidx (type, res, __i), \
+                              vidx (type, i0, __i), vidx (type, i1, __i)); \
+            __builtin_abort (); \
+        } \
+    } \
+} while (0)
+
+#define test(type, count, v0, v1, res); \
+do { \
+    res = (v0 > v1); \
+    check_compare (type, count, res, v0, v1, >); \
+    res = (v0 < v1); \
+    check_compare (type, count, res, v0, v1, <); \
+    res = (v0 >= v1); \
+    check_compare (type, count, res, v0, v1, >=); \
+    res = (v0 <= v1); \
+    check_compare (type, count, res, v0, v1, <=); \
+    res = (v0 == v1); \
+    check_compare (type, count, res, v0, v1, ==); \
+    res = (v0 != v1); \
+    check_compare (type, count, res, v0, v1, !=); \
+} while (0)
+
+
+int main (int argc, char *argv[]) {
+#define INT  int
+    vector (4, INT) i0;
+    vector (4, INT) i1;
+    vector (4, int) ires;
+
+    i0 = (vector (4, INT)){argc, 1,  2,  10};
+    i1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (INT, 4, i0, i1, ires);
+#undef INT
+
+
+#define INT unsigned int 
+    vector (4, int) ures;
+    vector (4, INT) u0;
+    vector (4, INT) u1;
+
+    u0 = (vector (4, INT)){argc, 1,  2,  10};
+    u1 = (vector (4, INT)){0, 3, 2, (INT)-23};    
+    test (INT, 4, u0, u1, ures);
+#undef INT
+
+
+#define SHORT short
+    vector (8, SHORT) s0;
+    vector (8, SHORT) s1;
+    vector (8, short) sres;
+
+    s0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    s1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (SHORT, 8, s0, s1, sres);
+#undef SHORT
+
+#define SHORT unsigned short
+    vector (8, SHORT) us0;
+    vector (8, SHORT) us1;
+    vector (8, short) usres;
+
+    us0 = (vector (8, SHORT)){argc, 1,  2,  10,  6, 87, (SHORT)-5, 2};
+    us1 = (vector (8, SHORT)){0, 3, 2, (SHORT)-23, 12, 10, (SHORT)-2, 0};    
+    test (SHORT, 8, us0, us1, usres);
+#undef SHORT
+
+
+#define CHAR signed char
+    vector (16, CHAR) c0;
+    vector (16, CHAR) c1;
+    vector (16, signed char) cres;
+
+    c0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    c1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (CHAR, 16, c0, c1, cres);
+#undef CHAR
+
+#define CHAR char
+    vector (16, CHAR) uc0;
+    vector (16, CHAR) uc1;
+    vector (16, signed char) ucres;
+
+    uc0 = (vector (16, CHAR)){argc, 1,  2,  10,  6, 87, (CHAR)-5, 2, \
+                             argc, 1,  2,  10,  6, 87, (CHAR)-5, 2 };
+
+    uc1 = (vector (16, CHAR)){0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0, \
+                             0, 3, 2, (CHAR)-23, 12, 10, (CHAR)-2, 0};
+    test (CHAR, 16, uc0, uc1, ucres);
+#undef CHAR
+
+
+    return 0;
+}
+
Index: gcc/c-typeck.c
===================================================================
--- gcc/c-typeck.c	(revision 162841)
+++ gcc/c-typeck.c	(working copy)
@@ -9606,6 +9606,29 @@ build_binary_op (location_t location, en
 
     case EQ_EXPR:
     case NE_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
 		    OPT_Wfloat_equal,
@@ -9718,6 +9741,29 @@ build_binary_op (location_t location, en
     case GE_EXPR:
     case LT_EXPR:
     case GT_EXPR:
+      if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE)
+        {
+          tree intt;
+          if (TREE_TYPE (type0) != TREE_TYPE (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "element types");
+              return error_mark_node;
+            }
+
+          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+            {
+              error_at (location, "comparing vectors with different "
+                                  "number of elements");
+              return error_mark_node;
+            }
+
+          /* Always construct signed integer vector type.  */
+          intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0);
+          result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0));
+          converted = 1;
+          break;
+        }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
 	   || code0 == FIXED_POINT_TYPE)
@@ -10113,6 +10159,10 @@ c_objc_common_truthvalue_conversion (loc
     case FUNCTION_TYPE:
       gcc_unreachable ();
 
+    case VECTOR_TYPE:
+      error_at (location, "used vector type where scalar is required");
+      return error_mark_node;
+
     default:
       break;
     }
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 162841)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  
 #include "tree-pass.h"
 #include "flags.h"
 #include "ggc.h"
+#include "target.h"
 
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
@@ -125,6 +126,21 @@ do_binop (gimple_stmt_iterator *gsi, tre
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
+
+/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0;  */
+static tree
+do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+	  tree bitpos, tree bitsize, enum tree_code code)
+{
+  tree cond;
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  cond = gimplify_build2 (gsi, code, inner_type, a, b);
+  return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, 
+                    build_int_cst (inner_type, -1),
+                    build_int_cst (inner_type, 0));
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -284,6 +300,21 @@ expand_vector_addition (gimple_stmt_iter
 				    a, b, code);
 }
 
+/* Try a hardware hook for vector comparison; otherwise
+   expand the comparison piecewise.  */
+static tree
+expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
+                          tree op1, enum tree_code code)
+{
+  tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code);
+
+  if (t == NULL_TREE)
+    t = expand_vector_piecewise (gsi, do_compare, type, 
+                    TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  return t;
+
+}
+
 static tree
 expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type,
 			 gimple assign, enum tree_code code)
@@ -326,8 +357,24 @@ expand_vector_operation (gimple_stmt_ite
       case BIT_NOT_EXPR:
         return expand_vector_parallel (gsi, do_unop, type,
 		      		       gimple_assign_rhs1 (assign),
-				       NULL_TREE, code);
-
+        			       NULL_TREE, code);
+      case EQ_EXPR:
+      case NE_EXPR:
+      case GT_EXPR:
+      case LT_EXPR:
+      case GE_EXPR:
+      case LE_EXPR:
+      case UNEQ_EXPR:
+      case UNGT_EXPR:
+      case UNLT_EXPR:
+      case UNGE_EXPR:
+      case UNLE_EXPR:
+      case LTGT_EXPR:
+      case ORDERED_EXPR:
+      case UNORDERED_EXPR:
+        return expand_vector_comparison (gsi, type,
+                                      gimple_assign_rhs1 (assign),
+                                      gimple_assign_rhs2 (assign), code);
       default:
 	break;
       }
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	(revision 162841)
+++ gcc/tree-cfg.c	(working copy)
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  
 #include "flags.h"
 #include "function.h"
 #include "ggc.h"
+#include "c-lang.h"
 #include "langhooks.h"
 #include "tree-pretty-print.h"
 #include "gimple-pretty-print.h"
@@ -3144,6 +3145,39 @@ verify_gimple_comparison (tree type, tre
       return true;
     }
 
+  if (TREE_CODE (op0_type) == VECTOR_TYPE 
+      && TREE_CODE (op1_type) == VECTOR_TYPE
+      && TREE_CODE (type) == VECTOR_TYPE)
+    {
+      tree t;
+      if (TYPE_VECTOR_SUBPARTS (op0_type) != TYPE_VECTOR_SUBPARTS (op1_type))
+        {
+          error ("invalid vector comparison, number of elements does not match");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TREE_TYPE (op0_type) != TREE_TYPE (op1_type))
+        {
+          error ("invalid vector comparison, vector element type mismatch");
+          debug_generic_expr (op0_type);
+          debug_generic_expr (op1_type);
+          return true;
+        }
+      
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
+          || TYPE_PRECISION (TREE_TYPE (op0_type)) 
+             != TYPE_PRECISION (TREE_TYPE (type)))
+        {
+          error ("invalid vector comparison resulting type");
+          debug_generic_expr (type);
+          return true;
+        }
+        
+      return false;
+    }
+
   /* For comparisons we do not have the operations type as the
      effective type the comparison is carried out in.  Instead
      we require that either the first operand is trivially
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 162841)
+++ gcc/config/i386/i386.c	(working copy)
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  
 #include "tm.h"
 #include "rtl.h"
 #include "tree.h"
+#include "tree-flow.h"
 #include "tm_p.h"
 #include "regs.h"
 #include "hard-reg-set.h"
@@ -30050,6 +30051,276 @@ ix86_vectorize_builtin_vec_perm (tree ve
   return ix86_builtins[(int) fcode];
 }
 
+/* Find a target-specific sequence for the vector comparison of 
+   real-type vectors V0 and V1.  Returns a variable containing the 
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                   enum machine_mode mode, tree v0, tree v1,
+                   enum tree_code code)
+{
+  enum ix86_builtins fcode;
+  int arg = -1;
+  tree fdef, frtype, tmp, var, t;
+  gimple new_stmt;
+  bool reverse = false;
+
+#define SWITCH_MODE(mode, fcode, code, value) \
+switch (mode) \
+  { \
+    case V2DFmode: \
+      if (!TARGET_SSE2) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PD; \
+      break; \
+    case V4DFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPD256; \
+      arg = value; \
+      break; \
+    case V4SFmode: \
+      if (!TARGET_SSE) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMP ## code ## PS; \
+      break; \
+    case V8SFmode: \
+      if (!TARGET_AVX) return NULL_TREE; \
+      fcode = IX86_BUILTIN_CMPPS256; \
+      arg = value; \
+      break; \
+    default: \
+      return NULL_TREE; \
+    /* FIXME: Similar instructions for MMX.  */ \
+  }
+
+  switch (code)
+    {
+      case EQ_EXPR:
+        SWITCH_MODE (mode, fcode, EQ, 0);
+        break;
+      
+      case NE_EXPR:
+        SWITCH_MODE (mode, fcode, NEQ, 4);
+        break;
+      
+      case GT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        reverse = true;
+        break;
+      
+      case LT_EXPR:
+        SWITCH_MODE (mode, fcode, LT, 1);
+        break;
+      
+      case LE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        break;
+
+      case GE_EXPR:
+        SWITCH_MODE (mode, fcode, LE, 2);
+        reverse = true;
+        break;
+
+      default:
+        return NULL_TREE;
+    }
+#undef SWITCH_MODE
+
+  fdef = ix86_builtins[(int)fcode];
+  frtype = TREE_TYPE (TREE_TYPE (fdef));
+ 
+  tmp = create_tmp_var (frtype, "tmp");
+  var = create_tmp_var (rettype, "tmp");
+
+  if (arg == -1)
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 2, v1, v0);
+    else
+      new_stmt = gimple_build_call (fdef, 2, v0, v1);
+  else
+    if (reverse)
+      new_stmt = gimple_build_call (fdef, 3, v1, v0, 
+                    build_int_cst (char_type_node, arg));
+    else
+      new_stmt = gimple_build_call (fdef, 3, v0, v1, 
+                    build_int_cst (char_type_node, arg));
+     
+  gimple_call_set_lhs (new_stmt, tmp); 
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  
+  return var;
+}
+
+/* Find a target-specific sequence for the vector comparison of 
+   integer-type vectors V0 and V1.  Returns a variable containing the 
+   result of the comparison, or NULL_TREE otherwise.  */
+static tree
+vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                    enum machine_mode mode, tree v0, tree v1,
+                    enum tree_code code)
+{
+  enum ix86_builtins feq, fgt;
+  tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype;
+  gimple new_stmt;
+
+  switch (mode)
+    {
+      /* SSE integer-type vectors.  */
+      case V2DImode:
+        if (!TARGET_SSE4_2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQQ;
+        fgt = IX86_BUILTIN_PCMPGTQ;
+        break;
+
+      case V4SImode:
+        if (!TARGET_SSE2) return NULL_TREE; 
+        feq = IX86_BUILTIN_PCMPEQD128;
+        fgt = IX86_BUILTIN_PCMPGTD128;
+        break;
+      
+      case V8HImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW128;
+        fgt = IX86_BUILTIN_PCMPGTW128;
+        break;
+      
+      case V16QImode:
+        if (!TARGET_SSE2) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB128;
+        fgt = IX86_BUILTIN_PCMPGTB128;
+        break;
+      
+      /* MMX integer-type vectors.  */
+      case V2SImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQD;
+        fgt = IX86_BUILTIN_PCMPGTD;
+        break;
+
+      case V4HImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQW;
+        fgt = IX86_BUILTIN_PCMPGTW;
+        break;
+
+      case V8QImode:
+        if (!TARGET_MMX) return NULL_TREE;
+        feq = IX86_BUILTIN_PCMPEQB;
+        fgt = IX86_BUILTIN_PCMPGTB;
+        break;
+      
+      /* FIXME: Similar instructions for AVX.  */
+      default:
+        return NULL_TREE;
+    }
+
+  
+  var = create_tmp_var (rettype, "ret");
+  defeq = ix86_builtins[(int)feq];
+  defgt = ix86_builtins[(int)fgt];
+  eqrtype = TREE_TYPE (TREE_TYPE (defeq));
+  gtrtype = TREE_TYPE (TREE_TYPE (defgt));
+
+#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \
+do { \
+  var = create_tmp_var (gteq ## rtype, "tmp"); \
+  stmt = gimple_build_call (def ## gteq, 2, op0, op1); \
+  gimple_call_set_lhs (stmt, var); \
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \
+} while (0)
+   
+  switch (code)
+    {
+      case EQ_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq);
+        break;
+
+      case NE_EXPR:
+        tmp = create_tmp_var (eqrtype, "tmp");
+
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq);
+
+        /* tmp2 is {-1,...}, so t = tmp1 ^ tmp2 inverts the EQ mask.  */
+        t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+
+      case GT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt);
+        break;
+
+      case LT_EXPR:
+        EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt);
+        break;
+
+      case GE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+      
+      case LE_EXPR:
+        if (eqrtype != gtrtype)
+          return NULL_TREE;
+        tmp = create_tmp_var (eqrtype, "tmp");
+        EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt);
+        EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq);
+        t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2);
+        new_stmt = gimple_build_assign (tmp, t);
+        gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+        break;
+     
+      default:
+        return NULL_TREE;
+    }
+#undef EQGT_CALL
+
+  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp);
+  new_stmt = gimple_build_assign (var, t);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  return var;
+}
+
+/* Lower a comparison of two vectors V0 and V1, returning a 
+   variable with the result of the comparison.  Returns NULL_TREE
+   when no target-specific sequence can be found.  */
+static tree 
+ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, 
+                                    tree v0, tree v1, enum tree_code code)
+{
+  tree type;
+
+  /* Make sure we are comparing the same types.  */
+  if (TREE_TYPE (v0) != TREE_TYPE (v1)
+      || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1)))
+    return NULL_TREE;
+  
+  type = TREE_TYPE (v0);
+  
+  /* Cannot compare packed unsigned integers 
+     unless the operation is EQ or NE.  */
+  if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE 
+      && TYPE_UNSIGNED (TREE_TYPE (type)))
+    if (code != EQ_EXPR && code != NE_EXPR)
+      return NULL_TREE;
+
+
+  if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE)
+    return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+    return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code);
+  else
+    return NULL_TREE;
+}
+
 /* Return a vector mode with twice as many elements as VMODE.  */
 /* ??? Consider moving this to a table generated by genmodes.c.  */
 
@@ -31541,6 +31812,11 @@ ix86_enum_va_list (int idx, const char *
 #define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \
   ix86_vectorize_builtin_vec_perm_ok
 
+#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE
+#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \
+  ix86_vectorize_builtin_vec_compare
+
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [patch] Support vectorization of min/max location pattern - take  2
  2010-08-09 11:01                   ` Richard Guenther
  2010-08-09 11:03                     ` Richard Guenther
@ 2010-08-09 12:33                     ` Ira Rosen
  1 sibling, 0 replies; 16+ messages in thread
From: Ira Rosen @ 2010-08-09 12:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Richard Henderson



Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010 02:00:59
PM:

> On Mon, Aug 9, 2010 at 12:53 PM, Ira Rosen <IRAR@il.ibm.com> wrote:
> >
> >
> > Richard Guenther <richard.guenther@gmail.com> wrote on 09/08/2010
12:50:14
> > PM:
> >> > I implemented VEC_COND_EXPR extension in the attached patch.
> >> >
> >> > For reduction epilogue I defined new tree codes
> >> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
> >>
> >> Why do you need new tree codes here?
> >
> > After vector loop we have two vectors one with four minimums and the
second
> > with four corresponding array indexes. The extraction of the correct
index
> > out of four can be done differently on each platform (including
problematic
> > vector comparisons).
>
> So the tree code is just to tie those two operations together?

It is not to tie MIN/MAX extraction and index extraction together. It is to
extract a scalar value from a vector of indexes. To do that we need both
vectors (minimums and indexes). We already have tree codes for other
reduction epilogues (like REDUC_PLUS_EXPR).

>
> >> They btw need
> >> documentation - just stating the new operand is a vector isn't
> >> very informative.  They need documentation in generic.texi.
> >
> > Sorry about that, I'll add documentation for both.
>
> Thanks.
>
> >>
> >> Likewise the new RTX codes (what are they for??)
> >
> > Probably there is a better way to do that, but I needed to map new
vector
> > comparison instructions that compare floats and return ints.
>
> So you just need this at expansion time then and the RTXen
> will never appear in RTL code?

It will appear in RTL code. AFAIU, the current RTX codes for vector
comparison require the same types for input and output, so I had to add new
codes.

> Why not use a target hook for
> expanding those comparisons then?  Btw, my GSoC student
> implemented lowering of generic vector comparisons resulting
> in a mask in tree-vect-generic.c using a target hook that eventually
> uses target specific builtins.  I attached the latest patch for that.

I used a target hook in the original patch, but I inserted calls to it in
the vectorizer.
BTW, I don't understand how your hook will work for mips:

> >> >>
> >> >> (2) The mips C.cond.PS instruction does *not* produce a bitmask
> >> >>     like altivec or sse do.  Instead it sets multiple condition
> >> >>     codes.  One then uses MOV[TF].PS to merge the elements based
> >> >>     on the individual condition codes.  While there's no direct
> >> >>     corresponding instruction that will operate on integers, I
> >> >>     don't think it would be too difficult to use MOV[TF].G or
> >> >>     BC1AND2[FT] instructions to emulate it.  In any case, this
> >> >>     is again a case where you don't want to expose any part of
> >> >>     the VEC_COND at the gimple level.
> >> >>
> >> >>
> >> >> r~
> >


If VEC_COND_EXPR and REDUC_..._LOC_EXPR are used, why do we need a target
hook? To avoid new RTX codes?

>
> >> need documentation
> >> in rtl.texi.
> >>
> >> Btw, you still don't adjust if-conversion to fold the COND_EXPR
> >> it generates - that would generate the MIN/MAX expressions
> >> directly and you wouldn't have to pattern match the COND_EXPR.
> >
> > I don't see how it can help to avoid pattern matching. We will still
need
> > to match MIN/MAX's arguments with the COND_EXPR arguments.
>
> True, but you need to match MIN/MAX instead.  Well, my point
> is that if-convert shouldn't create a COND_EXPR in that case.

OK, I'll try to fix if-convert (and my patch accordingly).

Thanks,
Ira

>
> Richard.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-07-06  7:15 ` Ira Rosen
  2010-07-07 20:43   ` Richard Henderson
@ 2010-11-19 15:53   ` H.J. Lu
  2010-12-15 20:27     ` H.J. Lu
  1 sibling, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2010-11-19 15:53 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

On Tue, Jul 6, 2010 at 12:14 AM, Ira Rosen <IRAR@il.ibm.com> wrote:
> gcc-patches-owner@gcc.gnu.org wrote on 01/07/2010 11:00:50 AM:
>
>> Hi,
>>
>> This patch adds vectorization support of min/max location pattern:
>>
>>   for (i = 0; i < N; i++)
>>     if (arr[i] < limit)
>>       {
>>         pos = i + 1;
>>         limit = arr[i];
>>       }
>>
>> The recognized pattern is compound of two statements (and is called
>> compound pattern):
>>
>>   # pos_22 = PHI <pos_1(4), 1(2)>
>>   # limit_24 = PHI <limit_4(4), 0(2)>
>>   ...
>>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
>>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;
>>
>> both statements should be reductions with cond_expr and have the same
>> condition part. The min/max statement is expected to be of the form "x op
>> y ? x : y" (where op can be >, <, >= or <=), and the location is expected
>> to be an induction.
>>
>> To vectorize min/max location pattern we use a technique described in
>> "Multimedia vectorization of floating-point MIN/MAX reductions" by
>> A.J.C.Bik, X.Tian and M.B.Girkar,
>> http://portal.acm.org/citation.cfm?id=1145765.
>>
>> Vectorized loop (maxloc, first index):
>>      vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
>>      vck[0:vl-1:1] = | k |..| k |;  - vector of positions
>>      ind[0:vl-1:1] = |vl-1|..| 0 |;
>>      inc[0:vl-1:1] = | vl |..| vl |;
>>      for (i = 0; i < N; i += vl) {
>>        msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
>>        vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
>>                        (vck[0:vl-1:1] & !msk[0:vl-1:1]);
>>        vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
>>        ind[0:vl-1:1] += inc[0:vl-1:1];
>>      }
>>      x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
>>      msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
>>      vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
>>                      (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
>>      k = HMIN(vck[0:vl-1:1]);       - first position extraction
>>
>>
>> Vectorization of minloc is supposed to help gas_dyn from Polyhedron as
>> discussed in PR 31067.
>>
>> PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
>> bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I
>> commit this patch (after I regtest it again).
>> Also the case of pos = i; instead of pos = i+1; is not supported since in
>> this case the operands are switched, i.e., we get "x op y ? y : x".
>>
>>
>> My main question is the implementation of vector comparisons. I
> understand
>> that different targets can return different types of results. So instead
> of
>> defining new tree codes, I used target builtin which also returns the
> type
>> of the result.
>>
>> Other comments are welcome too.
>>
>> Bootstrapped and tested on powerpc64-suse-linux.
>
> Since it looks like nobody objects the use of target builtins for vector
> comparison, I am resubmitting an updated patch (the code) for review of
> non-vectorizer parts.
>
> Thanks,
> Ira
>
>
> ChangeLog:
>
>      * doc/tm.texi: Regenerate.
>      * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE):
>      Document.
>      * target.def (builtin_vect_compare): Add new builtin.
>      * tree-vectorizer.h (enum vect_compound_pattern): New.
>      (struct _stmt_vec_info): Add new fields compound_pattern and
>      reduc_scalar_result_stmt. Add macros to access them.
>      (is_pattern_stmt_p): Return true for compound pattern.
>      (vectorizable_condition): Add arguments.
>      (vect_recog_compound_func_ptr): New function-pointer type.
>      (NUM_COMPOUND_PATTERNS): New.
>      (vect_compound_pattern_recog): Declare.
>      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>      for compound patterns.
>      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>      patterns. Update comment.
>      (vect_analyze_scalar_cycles): Update comment.
>      (destroy_loop_vec_info): Update def stmt for the original pattern
>      statement.
>      (vect_is_simple_reduction_1): Skip compound pattern statements in
>      uses check. Add spaces. Skip commutativity and type checks for
>      minimum location statement. Fix printings.
>      (vect_model_reduction_cost): Add min/max location pattern cost
>      computation.
>      (vect_create_epilogue_for_compound_pattern): New function.
>      (vect_create_epilog_for_reduction): Don't retrieve the original
>      statement for compound pattern. Fix comment accordingly. Store the
>      result of vector reduction computation in a variable and use it. Call
>      vect_create_epilogue_for_compound_pattern (). Check if optab exists
>      before using it. Keep the scalar result computation statement. Use
>      either exit phi node result or compound pattern result in scalar
>      extraction. Don't expect to find an exit phi node for min/max
>      statement.
>      (vectorizable_reduction): Skip check for uses in loop for compound
>      patterns. Don't retrieve the original statement for compound pattern.
>      Call vectorizable_condition () with additional parameters. Skip
>      reduction code check for compound patterns. Prepare operands for
>      min/max location statement vectorization and pass them to
>      vectorizable_condition ().
>      (vectorizable_live_operation): Return TRUE for compound patterns.
>      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>      (vect_recog_compound_func_ptrs): Likewise.
>      (vect_recog_min_max_loc_pattern): New function.
>      (vect_compound_pattern_recog): Likewise.
>      * tree-vect-stmts.c (process_use): Mark compound pattern statements
>      as used by reduction.
>      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>      to be used by reduction.
>      (vectorize_minmax_location_pattern): New function.
>      (vectorizable_condition): Update comment, add arguments. Skip checks
>      irrelevant for compound pattern. Check that vector comparisons are
>      supported by the target. Prepare operands using new arguments. Call
>      vectorize_minmax_location_pattern().
>      (vect_analyze_stmt): Allow nested cycle statements to be used by
>      reduction. Call vectorizable_condition () with additional arguments.
>      (vect_transform_stmt): Call vectorizable_condition () with additional
>      arguments.
>      (new_stmt_vec_info): Initialize new fields.
>      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
>      (ALTIVEC_BUILTIN_VCMPLEFP): New.
>      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
>      (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Redefine.
>      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
>      altivec_vcmplefp.
>      * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern.
>      (altivec_vcmplefp): Likewise.
>      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>      patterns.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46561


-- 
H.J.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] [patch] Support vectorization of min/max location pattern
  2010-11-19 15:53   ` [RFC] [patch] Support vectorization of min/max location pattern H.J. Lu
@ 2010-12-15 20:27     ` H.J. Lu
  0 siblings, 0 replies; 16+ messages in thread
From: H.J. Lu @ 2010-12-15 20:27 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

On Fri, Nov 19, 2010 at 7:02 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jul 6, 2010 at 12:14 AM, Ira Rosen <IRAR@il.ibm.com> wrote:
>> gcc-patches-owner@gcc.gnu.org wrote on 01/07/2010 11:00:50 AM:
>>
>>> Hi,
>>>
>>> This patch adds vectorization support of min/max location pattern:
>>>
>>>   for (i = 0; i < N; i++)
>>>     if (arr[i] < limit)
>>>       {
>>>         pos = i + 1;
>>>         limit = arr[i];
>>>       }
>>>
>>> The recognized pattern is compound of two statements (and is called
>>> compound pattern):
>>>
>>>   # pos_22 = PHI <pos_1(4), 1(2)>
>>>   # limit_24 = PHI <limit_4(4), 0(2)>
>>>   ...
>>>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
>>>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;
>>>
>>> both statements should be reductions with cond_expr and have the same
>>> condition part. The min/max statement is expected to be of the form "x op
>>> y ? x : y" (where op can be >, <, >= or <=), and the location is expected
>>> to be an induction.
>>>
>>> To vectorize min/max location pattern we use a technique described in
>>> "Multimedia vectorization of floating-point MIN/MAX reductions" by
>>> A.J.C.Bik, X.Tian and M.B.Girkar,
>>> http://portal.acm.org/citation.cfm?id=1145765.
>>>
>>> Vectorized loop (maxloc, first index):
>>>      vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
>>>      vck[0:vl-1:1] = | k |..| k |;  - vector of positions
>>>      ind[0:vl-1:1] = |vl-1|..| 0 |;
>>>      inc[0:vl-1:1] = | vl |..| vl |;
>>>      for (i = 0; i < N; i += vl) {
>>>        msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
>>>        vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
>>>                        (vck[0:vl-1:1] & !msk[0:vl-1:1]);
>>>        vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
>>>        ind[0:vl-1:1] += inc[0:vl-1:1];
>>>      }
>>>      x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
>>>      msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
>>>      vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
>>>                      (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
>>>      k = HMIN(vck[0:vl-1:1]);       - first position extraction
>>>
>>>
>>> Vectorization of minloc is supposed to help gas_dyn from Polyhedron as
>>> discussed in PR 31067.
>>>
>>> PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
>>> bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I
>>> commit this patch (after I regtest it again).
>>> Also the case of pos = i; instead of pos = i+1; is not supported since in
>>> this case the operands are switched, i.e., we get "x op y ? y : x".
>>>
>>>
>>> My main question is the implementation of vector comparisons. I
>> understand
>>> that different targets can return different types of results. So instead
>> of
>>> defining new tree codes, I used target builtin which also returns the
>> type
>>> of the result.
>>>
>>> Other comments are welcome too.
>>>
>>> Bootstrapped and tested on powerpc64-suse-linux.
>>
>> Since it looks like nobody objects the use of target builtins for vector
>> comparison, I am resubmitting an updated patch (the code) for review of
>> non-vectorizer parts.
>>
>> Thanks,
>> Ira
>>
>>
>> ChangeLog:
>>
>>      * doc/tm.texi: Regenerate.
>>      * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE):
>>      Document.
>>      * target.def (builtin_vect_compare): Add new builtin.
>>      * tree-vectorizer.h (enum vect_compound_pattern): New.
>>      (struct _stmt_vec_info): Add new fields compound_pattern and
>>      reduc_scalar_result_stmt. Add macros to access them.
>>      (is_pattern_stmt_p): Return true for compound pattern.
>>      (vectorizable_condition): Add arguments.
>>      (vect_recog_compound_func_ptr): New function-pointer type.
>>      (NUM_COMPOUND_PATTERNS): New.
>>      (vect_compound_pattern_recog): Declare.
>>      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>>      for compound patterns.
>>      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>>      patterns. Update comment.
>>      (vect_analyze_scalar_cycles): Update comment.
>>      (destroy_loop_vec_info): Update def stmt for the original pattern
>>      statement.
>>      (vect_is_simple_reduction_1): Skip compound pattern statements in
>>      uses check. Add spaces. Skip commutativity and type checks for
>>      minimum location statement. Fix printings.
>>      (vect_model_reduction_cost): Add min/max location pattern cost
>>      computation.
>>      (vect_create_epilogue_for_compound_pattern): New function.
>>      (vect_create_epilog_for_reduction): Don't retrieve the original
>>      statement for compound pattern. Fix comment accordingly. Store the
>>      result of vector reduction computation in a variable and use it. Call
>>      vect_create_epilogue_for_compound_pattern (). Check if optab exists
>>      before using it. Keep the scalar result computation statement. Use
>>      either exit phi node result or compound pattern result in scalar
>>      extraction. Don't expect to find an exit phi node for min/max
>>      statement.
>>      (vectorizable_reduction): Skip check for uses in loop for compound
>>      patterns. Don't retrieve the original statement for compound pattern.
>>      Call vectorizable_condition () with additional parameters. Skip
>>      reduction code check for compound patterns. Prepare operands for
>>      min/max location statement vectorization and pass them to
>>      vectorizable_condition ().
>>      (vectorizable_live_operation): Return TRUE for compound patterns.
>>      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>>      (vect_recog_compound_func_ptrs): Likewise.
>>      (vect_recog_min_max_loc_pattern): New function.
>>      (vect_compound_pattern_recog): Likewise.
>>      * tree-vect-stmts.c (process_use): Mark compound pattern statements
>>      as used by reduction.
>>      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>>      to be used by reduction.
>>      (vectorize_minmax_location_pattern): New function.
>>      (vectorizable_condition): Update comment, add arguments. Skip checks
>>      irrelevant for compound pattern. Check that vector comparisons are
>>      supported by the target. Prepare operands using new arguments. Call
>>      vectorize_minmax_location_pattern ().
>>      (vect_analyze_stmt): Allow nested cycle statements to be used by
>>      reduction. Call vectorizable_condition () with additional arguments.
>>      (vect_transform_stmt): Call vectorizable_condition () with additional
>>      arguments.
>>      (new_stmt_vec_info): Initialize new fields.
>>      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
>>      (ALTIVEC_BUILTIN_VCMPLEFP): New.
>>      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
>>      (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Redefine.
>>      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
>>      altivec_vcmplefp.
>>      * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern.
>>      (altivec_vcmplefp): Likewise.
>>      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>>      patterns.
>>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46561
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46969

-- 
H.J.



Thread overview: 16+ messages
2010-07-01  8:01 [RFC] [patch] Support vectorization of min/max location pattern Ira Rosen
2010-07-06  7:15 ` Ira Rosen
2010-07-07 20:43   ` Richard Henderson
2010-07-08  7:34     ` Ira Rosen
2010-07-08  9:21       ` Richard Guenther
2010-07-08 17:15       ` Richard Henderson
2010-07-08 18:20         ` Ira Rosen
2010-07-08 20:10           ` Richard Henderson
2010-08-09  7:55             ` [patch] Support vectorization of min/max location pattern - take 2 Ira Rosen
2010-08-09 10:05               ` Richard Guenther
2010-08-09 10:58                 ` Ira Rosen
2010-08-09 11:01                   ` Richard Guenther
2010-08-09 11:03                     ` Richard Guenther
2010-08-09 12:33                     ` Ira Rosen
2010-11-19 15:53   ` [RFC] [patch] Support vectorization of min/max location pattern H.J. Lu
2010-12-15 20:27     ` H.J. Lu
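
For reference, the scalar pattern this thread's patch teaches the vectorizer
to recognize (as described in the original posting) can be sketched in C as
below; the `minloc` wrapper and its signature are illustrative, not part of
the patch:

```c
/* Illustrative sketch (not from the patch): the scalar min/max
   location pattern the recognizer matches -- a pair of conditional
   reductions sharing one condition, tracking both the running
   minimum and the 1-based position at which it occurred.  */
static int
minloc (const float *arr, int n, float limit)
{
  int pos = 0;
  for (int i = 0; i < n; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;      /* location result: an induction value */
        limit = arr[i];   /* minimum result: a MIN cond_expr reduction */
      }
  return pos;
}
```

After pattern recognition, both assignments become cond_expr reductions with
the same comparison, which is what allows the epilogue to extract the scalar
minimum first and then select the matching position.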
